> - Cron syntax allows you to construct only absolute lookbacks (i.e.
>   "every tuesday at 3PM" not "every tuesday at the current hour")

I think Cron would work for this. I am no expert on cron expressions, but I
think the following examples would work.

- If you want "every Tuesday at 3 PM"
    - 0 0 15 ? * TUE *
- If you want "every Tuesday at current hour" then use something like the
  "?" placeholder maybe.
    - 0 0 ? ? * TUE *

> - Cron syntax allows you to specify a point in time, not a duration. We
>   could, of course, specify a duration as another argument

Yes, a separate argument would be necessary. We would have to allow the user
to specify either a "start from date/time" or the "number of intervals to
look back".

> - Cron syntax does not allow you to skip things like holidays, etc.

I agree, out-of-the-box Cron does not solve holiday calendars. But this
would be a smaller problem to solve than creating our own DSL.

There is a tradition of creating shortcuts that look something like @Daily
or @Weekdays or @Tuesdays that we could also use to make things easier for
users.

I have used Quartz with cron expressions in the past and there was some way
to handle holidays with that. I think you could create a custom calendar for
the holidays and give it a name, e.g. @USHolidays. And then you would say
"every Tuesday" except @USHolidays or something like that. I'd have to look
into this some more.

And there are also nice online Cron expression "translators" that we could
mimic in a Metron user interface. For example, https://crontab.guru.

On Tue, Jan 31, 2017 at 12:00 PM, Casey Stella <ceste...@gmail.com> wrote:

> I actually did consider cron initially but dismissed it for the following
> reasons:
>
>    - Cron syntax allows you to construct only absolute lookbacks (i.e.
>      "every tuesday at 3PM" not "every tuesday at the current hour")
>    - Cron syntax allows you to specify a point in time, not a duration. We
>      could, of course, specify a duration as another argument
>    - Cron syntax does not allow you to skip things like holidays, etc.
>
> You could use Cron syntax as part of a broader API to specify the days to
> look back and have other arguments handle the aspects that cron doesn't
> support out of the box. I share your concern at making another DSL, but
> cron seemed to not be a complete solution and its syntax, despite being
> well known by admins, may not be well known to analysts. Also, and this is
> just a personal bias, I find it inscrutable without a fair amount of
> wikipedia and man page reading.
>
> On Tue, Jan 31, 2017 at 11:47 AM, Nick Allen <n...@nickallen.org> wrote:
>
> > I do prefer the flexibility of the DSL, but would prefer not to create
> > yet another DSL for our users to learn. Couldn't we somehow use cron
> > expressions for this functionality?
> >
> > On Mon, Jan 23, 2017 at 3:01 PM, Casey Stella <ceste...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I'm planning to expand the capabilities of PROFILE_GET and wanted to
> > > run an idea past the community.
> > >
> > > *Current State*
> > >
> > > Currently, the functionality of PROFILE_GET is fairly straightforward:
> > >
> > >    - profile - The name of the profile.
> > >    - entity - The name of the entity.
> > >    - durationAgo - How long ago should values be retrieved from?
> > >    - units - The units of 'durationAgo'.
> > >    - groups_list - Optional, must correspond to the 'groupBy' list used
> > >      in profile creation - List (in square brackets) of groupBy values
> > >      used to filter the profile. Default is the empty list, meaning
> > >      groupBy was not used when creating the profile.
> > >    - config_overrides - Optional - Map (in curly braces) of name:value
> > >      pairs, each overriding the global config parameter of the same
> > >      name. Default is the empty Map, meaning no overrides.
> > >
> > > This has the advantage of providing a relatively simple mechanism to
> > > support the dominant use-case, gathering the profiles for a trailing
> > > window.
> > > The issues, however, are a couple:
> > >
> > >    - We may need more complex semantics for specifying the window
> > >      (motivated below)
> > >    - As such, this couples the gathering of the profiles with the
> > >      specification of the window.
> > >
> > > I propose to decouple these two concepts. I propose that we extract the
> > > notion of the lookback into a separate, more featureful function called
> > > PROFILE_LOOKBACK() which could be composed with an adjusted PROFILE_GET,
> > > whose arguments look like:
> > >
> > >    - profile - The name of the profile.
> > >    - entity - The name of the entity.
> > >    - timestamps - The list of timestamps to retrieve
> > >    - groups_list - Optional, must correspond to the 'groupBy' list used
> > >      in profile creation - List (in square brackets) of groupBy values
> > >      used to filter the profile. Default is the empty list, meaning
> > >      groupBy was not used when creating the profile.
> > >    - config_overrides - Optional - Map (in curly braces) of name:value
> > >      pairs, each overriding the global config parameter of the same
> > >      name. Default is the empty Map, meaning no overrides.
> > >
> > > So, PROFILE_GET would have the output of PROFILE_LOOKBACK passed to it
> > > as its 3rd argument (e.g. PROFILE_GET( 'my_profile', 'my_entity',
> > > PROFILE_LOOKBACK(...)) ).
> > >
> > > *Motivation for Change*
> > >
> > > The justification for this is that sometimes you want to compare time
> > > bins for a long duration back, but you don't want to skew the data by
> > > including periods that aren't distributionally similar (due to seasonal
> > > data, for instance). You might want to compare a value to a statistical
> > > baseline of the median of the values for the same time window on the
> > > same day for the last month (e.g. every tuesday at this time).
> > >
> > > Also, we might want a trailing window that does not start at the
> > > current time (in wall-clock), but rather starts an hour back or from
> > > the time that the data was originally ingested.
> > >
> > > *PROFILE_LOOKBACK*
> > >
> > > I propose that we support the following features:
> > >
> > >    - A starting point that is not current time
> > >    - Sparse bins (i.e. the last hour for every tuesday for the last
> > >      month)
> > >    - The ability to skip events (e.g. weekends, holidays)
> > >
> > > This would result in a new function with the following arguments:
> > >
> > >    - from - The lookback starting point (default to now)
> > >    - fromUnits - The units for the lookback starting point
> > >    - to - The ending point for the lookback window (default to from +
> > >      binSize)
> > >    - toUnits - The units for the lookback ending point
> > >    - including - A list of conditions which we would include.
> > >       - weekend
> > >       - holiday
> > >       - sunday through saturday
> > >    - excluding - A list of conditions which we would skip.
> > >       - weekend
> > >       - holiday
> > >       - sunday through saturday
> > >    - binSize - The size of the lookback bin
> > >    - binUnits - The units of the lookback bin
> > >
> > > Given the number of arguments and their complexity and the fact that
> > > many, many are optional, I propose that either
> > >
> > >    - PROFILE_LOOKBACK take a Map so that we can get essentially named
> > >      params in stellar.
> > >    - PROFILE_LOOKBACK accept a string backed by a DSL to express these
> > >      criteria
> > >
> > > Ok, so that's a lot to take in. How about we look at some motivating
> > > use-cases.
> > >
> > > *Base Case: A lookback of 1 hour ago*
> > >
> > > As a map, this would look like:
> > >
> > > PROFILE_LOOKBACK( { 'binSize' : 1, 'binUnits' : 'HOURS' } )
> > >
> > > As a DSL this would look like:
> > > PROFILE_LOOKBACK( '1 hour bins from now')
> > >
> > > *The same time window every tuesday for the last month starting one
> > > hour ago*
> > >
> > > Just to make this as clear as possible, if this is run at 3PM on Monday
> > > January 23rd, 2017, it would include the following bins:
> > >
> > >    - January 17th, 2PM - 3PM
> > >    - January 10th, 2PM - 3PM
> > >    - January 3rd, 2PM - 3PM
> > >    - December 27th, 2PM - 3PM
> > >
> > > As a map, this would look like:
> > >
> > > PROFILE_LOOKBACK( { 'from' : 1, 'fromUnits' : 'HOURS',
> > >                     'to' : 1, 'toUnits' : 'MONTH',
> > >                     'including' : [ 'tuesday' ],
> > >                     'binSize' : 1, 'binUnits' : 'HOURS' } )
> > >
> > > As a DSL this would look like:
> > > PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays')
> > >
> > > *The same time window every sunday for the last month starting one hour
> > > ago skipping holidays*
> > >
> > > Just to make this as clear as possible, if this is run at 3PM on Monday
> > > January 22nd, 2017, it would include the following bins:
> > >
> > >    - January 16th, 2PM - 3PM
> > >    - January 9th, 2PM - 3PM
> > >    - January 2nd, 2PM - 3PM
> > >    - NOT December 25th
> > >
> > > As a map, this would look like:
> > >
> > > PROFILE_LOOKBACK( { 'from' : 1, 'fromUnits' : 'HOURS',
> > >                     'to' : 1, 'toUnits' : 'MONTH',
> > >                     'including' : [ 'tuesday' ],
> > >                     'excluding' : [ 'holidays' ],
> > >                     'binSize' : 1, 'binUnits' : 'HOURS' } )
> > >
> > > As a DSL this would look like:
> > > PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays
> > > excluding holidays')
> > >
> > > *DSL vs API*
> > >
> > > So, here's my personal rundown of the two approaches:
> > >
> > > DSL:
> > >
> > >    - PRO
> > >       - Clear.
> > >         As you can see, it reads like a sentence
> > >       - Concise
> > >    - CON:
> > >       - More complex to implement
> > >       - Another DSL to learn
> > >
> > > API:
> > >
> > >    - PRO
> > >       - Simpler to implement (though marginally so, IMO)
> > >    - CON
> > >       - A bit more complex to understand (also, IMO)
> > >
> > > I'd like to solicit feedback from the community at this point:
> > >
> > >    - What do you think of this change?
> > >    - Would you prefer the DSL, API or other approach?
> > >
> > > Thanks,
> > >
> > > Casey
> >
> > --
> > Nick Allen <n...@nickallen.org>

--
Nick Allen <n...@nickallen.org>
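
For reference, the "every tuesday for the last month starting one hour ago"
use case from the quoted proposal could be sketched roughly as below. This is
only an illustration in Python, not Metron or Stellar code. The function
lookback_bins and its parameter names are hypothetical. It assumes each bin
starts at (now - from) and runs forward by the bin size, which matches the
2PM - 3PM bins in the worked example; it approximates "1 month" as 30 days;
and it takes holidays as an explicit set of dates rather than a named
calendar.

```python
from datetime import date, datetime, timedelta

# Weekday names indexed by datetime.weekday() (Monday == 0).
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def lookback_bins(now, from_ago, to_ago, bin_size,
                  including=None, excluding_dates=frozenset()):
    """Return (start, end) bins, newest first (hypothetical helper).

    Assumption: a bin starts at `now - from_ago` and is repeated once per
    day going back, until its start would fall before `now - to_ago`.
    """
    earliest = now - to_ago
    start = now - from_ago
    bins = []
    while start >= earliest:
        day_ok = including is None or DAYS[start.weekday()] in including
        if day_ok and start.date() not in excluding_dates:
            bins.append((start, start + bin_size))
        start -= timedelta(days=1)
    return bins


# Run at 3PM on Monday January 23rd, 2017, as in the proposal's example.
now = datetime(2017, 1, 23, 15, 0)
tuesdays = lookback_bins(
    now,
    from_ago=timedelta(hours=1),
    to_ago=timedelta(days=30),   # assumption: "1 month" treated as 30 days
    bin_size=timedelta(hours=1),
    including={"tuesday"},
)
for start, end in tuesdays:
    print(start.strftime("%B %d, %I%p"), "-", end.strftime("%I%p"))
```

Passing excluding_dates={date(2017, 1, 3)} would drop the January 3rd bin,
which is roughly what an 'excluding' : [ 'holidays' ] argument would need to
do once a holiday calendar is available.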