[
https://issues.apache.org/jira/browse/METRON-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877083#comment-15877083
]
ASF GitHub Bot commented on METRON-690:
---------------------------------------
Github user nickwallen commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/450#discussion_r102355664
--- Diff: metron-analytics/metron-profiler-client/README.md ---
@@ -91,37 +60,268 @@ want to change the global Client configuration so as
not to disrupt the work of
| profiler.client.salt.divisor | The salt divisor used to store
profile data.
| Optional | 1000 |
| hbase.provider.impl | The name of the
HBaseTableProvider implementation class.
| Optional | |
+
+### Profile Selectors
+
+You will notice that the third argument for `PROFILE_GET` is a list of
`ProfilePeriod` objects. This list is expected to
+be produced by another Stellar function. There are a couple of options
available.
+
+#### `PROFILE_FIXED`
+
+The profiler periods associated with a fixed lookback starting from now.
These are ProfilePeriod objects.
+```
+REQUIRED:
+ durationAgo - How long ago should values be retrieved from?
+ units - The units of 'durationAgo'.
+OPTIONAL:
+ config_overrides - Optional - Map (in curly braces) of name:value
pairs, each overriding the global config parameter
+ of the same name. Default is the empty Map, meaning no
overrides.
+
+e.g. To retrieve all the profiles for the last 5 hours.
PROFILE_GET('profile', 'entity', PROFILE_FIXED(5, 'HOURS'))
+```
+
+Note that the `config_overrides` parameter operates exactly as the
`config_overrides` argument in `PROFILE_GET`.
+The only available parameters for override are:
+* `profiler.client.period.duration`
+* `profiler.client.period.duration.units`
+
+#### `PROFILE_WINDOW`
+
+`PROFILE_WINDOW` is intended to provide a finer level of control over
selecting windows for profiles:
+* Specify windows relative to the data timestamp (see the optional `now`
parameter below)
+* Specify non-contiguous windows to better handle seasonal data (e.g. the
last hour for every day for the last month)
+* Specify profile output excluding holidays
+* Specify only profile output on a specific day of the week
+
+It does this through a domain-specific language, mimicking natural language, that
defines which windows are selected.
+
+```
+REQUIRED:
+ windowSelector - The statement specifying the window to select.
+OPTIONAL:
+ now - Optional - The timestamp to use for now.
+ config_overrides - Optional - Map (in curly braces) of name:value
pairs, each overriding the global config parameter
+ of the same name. Default is the empty Map, meaning no
overrides.
+
+e.g. To retrieve all the measurements written for 'profile' and 'entity'
for the last hour
+on the same weekday excluding weekends and US holidays across the last 14
days:
+PROFILE_GET('profile', 'entity', PROFILE_WINDOW('1 hour window every 24
hours starting from 14 days ago including the current day of the week excluding
weekends, holidays:us'))
+```
+
+Note that the `config_overrides` parameter operates exactly as the
`config_overrides` argument in `PROFILE_GET`.
+The only available parameters for override are:
+* `profiler.client.period.duration`
+* `profiler.client.period.duration.units`
+
+##### The Profile Selector Language
+
+The domain specific language can be broken into a series of clauses, some
optional
+* <span style="color:blue">Total Temporal Duration</span> - The total
range of time in which windows may be specified
+* <span style="color:red">Temporal Window Width</span> - How large each
temporal window is
+* <span style="color:green">Skip distance</span> (optional) - How far to
skip between when one window starts and when the next begins
+* <span style="color:purple">Inclusion/Exclusion specifiers</span>
(optional) - The set of specifiers to further filter the window
+
+One *must* specify either a total temporal duration or a temporal window
width.
+The remaining clauses are optional.
+During the course of the following discussion, we will color code the
clauses in the examples.
+
+From a high level, the language fits the following three forms:
+
+* <span style="color:red">`time_interval WINDOW?`</span><span
style="color:purple">`(INCLUDING specifier_list)? (EXCLUDING
specifier_list)?`</span>
+* <span style="color:red">`time_interval WINDOW?`</span><span
style="color:green">`EVERY time_interval`</span><span style="color:blue">`FROM
time_interval (TO time_interval)?`</span><span style="color:purple">`(INCLUDING
specifier_list)? (EXCLUDING specifier_list)?`</span>
+* <span style="color:blue">`FROM time_interval (TO time_interval)?`</span>
+
+with
+* `time_interval` representing a time amount followed by a unit (e.g. "1
hour")
+* `specifier_list` representing a comma separated list of inclusion or
exclusion specifiers (e.g. "holidays:us, tuesdays")
+
+
+###### <span style="color:blue">Total Temporal Duration</span>
+
+Total temporal duration is specified by a phrase: `FROM time_interval AGO
TO time_interval AGO`
+This indicates the beginning and ending of a time interval.
+* `FROM` - Can be the words "from" or "starting from"
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The
unit may be "minute", "day", "hour" with any pluralization.
+* `TO` - Can be the words "until" or "to"
+* `AGO` - Optionally the word "ago"
+
+The `TO time_interval AGO` portion is optional. If unspecified then it is
expected that the time interval ends now.
+
+Due to the vagaries of the English language, the from and the to portions,
if both specified, are interchangeable
+with regard to which one specifies the start and which specifies the end.
+
+In other words <span style="color:blue">`starting from 1 hour ago to 30
minutes ago`</span> and
+<span style="color:blue">`starting from 30 minutes ago to 1 hour
ago`</span> specify the same
+temporal duration.
+
+**Examples**
+
+* A duration starting 1 hour ago and ending now
+ * <span style="color:blue">`from 1 hour ago`</span>
+ * <span style="color:blue">`from 1 hour`</span>
+ * <span style="color:blue">`starting from 1 hour ago`</span>
+ * <span style="color:blue">`starting from 1 hour`</span>
+* A duration starting 1 hour ago and ending 30 minutes ago:
+ * <span style="color:blue">`from 1 hour ago until 30 minutes ago`</span>
+ * <span style="color:blue">`from 30 minutes ago until 1 hour ago`</span>
+ * <span style="color:blue">`starting from 1 hour ago to 30 minutes
ago`</span>
+ * <span style="color:blue">`starting from 1 hour to 30 minutes`</span>
+
+###### <span style="color:red">Temporal Window Width</span>
+
+Temporal window width is the specification of a window.
+A window may either repeat within the total temporal duration or fill
the total temporal duration.
+A window is specified by the phrase: `time_interval WINDOW`
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The
unit may be "minute", "day", "hour" with any pluralization.
+* `WINDOW` - Optionally the word "window"
+
+**Examples**
+
+* A fixed window starting 2 hours ago and going until now
+ * <span style="color:red">`2 hour`</span>
+ * <span style="color:red">`2 hours`</span>
+ * <span style="color:red">`2 hours window`</span>
+* A repeating 30 minute window starting 2 hours ago and repeating every
hour until now.
+This would result in two 30-minute-wide windows: one starting 2 hours ago and one starting 1 hour ago
+ * <span style="color:red">`30 minute window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`starting
from 2 hours ago`</span>
+ * <span style="color:red">`30 minutes window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`from 2 hours
ago`</span>
+* A repeating 30 minute window starting 2 hours ago and repeating every
hour until 30 minutes ago.
+This would result in two 30-minute-wide windows: one starting 2 hours ago and one starting 1 hour ago
+ * <span style="color:red">`30 minute window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`starting
from 2 hours ago until 30 minutes ago`</span>
+ * <span style="color:red">`30 minutes window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`from 2 hours
ago to 30 minutes ago`</span>
+ * <span style="color:red">`30 minutes window`</span><span
style="color:green">`for every 1 hour`</span><span style="color:blue">`from 30
minutes ago to 2 hours ago`</span>
+
+###### <span style="color:green">Skip distance</span>
+
+Skip distance is the amount of time between the start of one temporal window
and the start of the next.
+It is, in effect, the window period.
+
+It is specified by the phrase `EVERY time_interval`
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The
unit may be "minute", "day", "hour" with any pluralization.
+* `EVERY` - The word/phrase "every" or "for every"
+
+**Examples**
+
+* A repeating 30 minute window starting 2 hours ago and repeating every
hour until now.
--- End diff --
Very nice. If there is not an example like this in the README (I probably
missed it) it would be good to add. This is closer to the type of use case
that you were trying to solve initially with this PR.
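To make the repeating-window arithmetic in the diff concrete, here is a minimal Python sketch. It is not the Metron implementation: `enumerate_windows` and its parameters are hypothetical names, the reference time is arbitrary, and it assumes a window is kept only when it fits entirely inside the total temporal duration, which is consistent with the two README examples of a 30 minute window every 1 hour.

```python
from datetime import datetime, timedelta


def enumerate_windows(now, width, every, from_ago, to_ago=timedelta(0)):
    """Hypothetical helper: list (start, end) pairs for a repeating window.

    width    - temporal window width, e.g. 30 minutes
    every    - skip distance between window starts, e.g. 1 hour
    from_ago - FROM offset before `now`
    to_ago   - TO offset before `now` (defaults to zero, i.e. ending now)
    """
    if every <= timedelta(0):
        raise ValueError("skip distance must be positive")
    # The README says FROM and TO are interchangeable, so order them here.
    earliest = now - max(from_ago, to_ago)
    latest = now - min(from_ago, to_ago)
    windows = []
    start = earliest
    # Assumption: keep a window only if it ends within the total duration.
    while start + width <= latest:
        windows.append((start, start + width))
        start += every
    return windows


# '30 minute window every 1 hour from 2 hours ago', with an arbitrary 'now'.
now = datetime(2017, 2, 21, 12, 0)
for start, end in enumerate_windows(
    now, timedelta(minutes=30), timedelta(hours=1), timedelta(hours=2)
):
    print(start.time(), "to", end.time())
```

Passing `timedelta(minutes=30)` as either offset, with `timedelta(hours=2)` as the other, models the "until 30 minutes ago" variants and yields the same two windows.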
> Create a DSL-based timestamp lookup for profiler to enable sparse windows
> -------------------------------------------------------------------------
>
> Key: METRON-690
> URL: https://issues.apache.org/jira/browse/METRON-690
> Project: Metron
> Issue Type: New Feature
> Reporter: Casey Stella
>
> I propose that we support the following features:
> * A starting point that is not current time
> * Sparse bins (i.e. the last hour for every tuesday for the last month)
> * The ability to skip events (e.g. weekends, holidays)
> This would result in a new function with the following arguments:
> from - The lookback starting point (default to now)
> fromUnits - The units for the lookback starting point
> to - The ending point for the lookback window (default to from + binSize)
> toUnits - The units for the lookback ending point
> including - A list of conditions which we would include.
> weekend
> holiday
> sunday through saturday
> excluding - A list of conditions which we would skip.
> weekend
> holiday
> sunday through saturday
> binSize - The size of the lookback bin
> binUnits - The units of the lookback bin
> Given the number of arguments and their complexity and the fact that many,
> many are optional,
> I propose that PROFILE_LOOKBACK accept a string backed by a DSL to express these criteria
> Base Case: A lookback of 1 hour ago
> PROFILE_LOOKBACK( '1 hour bins from now')
> Example 1: The same time window every tuesday for the last month starting one
> hour ago
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 23rd, 2017, it would include the following bins:
> January 17th, 2PM - 3PM
> January 10th, 2PM - 3PM
> January 3rd, 2PM - 3PM
> December 27th, 2PM - 3PM
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays')
> Example 2: The same time window every sunday for the last month starting one
> hour ago skipping holidays
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 22nd, 2017, it would include the following bins:
> January 16th, 2PM - 3PM
> January 9th, 2PM - 3PM
> January 2nd, 2PM - 3PM
> NOT December 25th
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays
> excluding holidays')
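The bins listed for Example 1 in the issue description can be reproduced with a short Python sketch. This is only an illustration: `tuesday_bins` is a hypothetical name, "1 month" is assumed to mean 30 days, and candidate bins are assumed to step back one day at a time from the bin that starts one hour ago.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday() numbers Monday as 0


def tuesday_bins(now, lookback=timedelta(days=30)):
    """Hypothetical helper: 1 hour bins on Tuesdays within the lookback."""
    first_start = now - timedelta(hours=1)  # the newest bin starts 1 hour ago
    bins = []
    days_back = 0
    # Step back one day at a time until the bin start leaves the lookback.
    while first_start - timedelta(days=days_back) >= now - lookback:
        start = first_start - timedelta(days=days_back)
        if start.weekday() == TUESDAY:
            bins.append((start, start + timedelta(hours=1)))
        days_back += 1
    return bins


# Run "at 3PM on Monday January 23rd, 2017", as in Example 1.
for start, end in tuesday_bins(datetime(2017, 1, 23, 15, 0)):
    print(start.strftime("%B %d, %I%p"), "-", end.strftime("%I%p"))
```

Under these assumptions the sketch yields the four Tuesday bins named in Example 1 (January 17th, 10th, 3rd and December 27th, each 2PM - 3PM).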