[
https://issues.apache.org/jira/browse/METRON-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877085#comment-15877085
]
ASF GitHub Bot commented on METRON-690:
---------------------------------------
Github user cestella commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/450#discussion_r102355810
--- Diff: metron-analytics/metron-profiler-client/README.md ---
@@ -91,37 +60,268 @@ want to change the global Client configuration so as
not to disrupt the work of
| profiler.client.salt.divisor | The salt divisor used to store
profile data.
| Optional | 1000 |
| hbase.provider.impl | The name of the
HBaseTableProvider implementation class.
| Optional | |
+
+### Profile Selectors
+
+You will notice that the third argument for `PROFILE_GET` is a list of
`ProfilePeriod` objects. This list is expected to
+be produced by another Stellar function. There are a couple of options
available.
+
+#### `PROFILE_FIXED`
+
+The profiler periods associated with a fixed lookback starting from now.
These are ProfilePeriod objects.
+```
+REQUIRED:
+ durationAgo - How long ago should values be retrieved from?
+ units - The units of 'durationAgo'.
+OPTIONAL:
+ config_overrides - Optional - Map (in curly braces) of name:value
pairs, each overriding the global config parameter
+ of the same name. Default is the empty Map, meaning no
overrides.
+
+e.g. To retrieve all the profiles for the last 5 hours.
PROFILE_GET('profile', 'entity', PROFILE_FIXED(5, 'HOURS'))
+```
+
+Note that the `config_overrides` parameter operates exactly as the
`config_overrides` argument in `PROFILE_GET`.
+The only available parameters for override are:
+* `profiler.client.period.duration`
+* `profiler.client.period.duration.units`
+
+#### `PROFILE_WINDOW`
+
+`PROFILE_WINDOW` is intended to provide a finer level of control over
selecting windows for profiles:
+* Specify windows relative to the data timestamp (see the optional `now`
parameter below)
+* Specify non-contiguous windows to better handle seasonal data (e.g. the
last hour for every day for the last month)
+* Specify profile output excluding holidays
+* Specify only profile output on a specific day of the week
+
+It does this through a domain-specific language, mimicking natural language, that
defines which windows are included and which are excluded.
+
+```
+REQUIRED:
+ windowSelector - The statement specifying the window to select.
+OPTIONAL:
+ now - Optional - The timestamp to use for now.
+ config_overrides - Optional - Map (in curly braces) of name:value
pairs, each overriding the global config parameter
+ of the same name. Default is the empty Map, meaning no
overrides.
+
+e.g. To retrieve all the measurements written for 'profile' and 'entity'
for the last hour
+on the same weekday excluding weekends and US holidays across the last 14
days:
+PROFILE_GET('profile', 'entity', PROFILE_WINDOW('1 hour window every 24
hours starting from 14 days ago including the current day of the week excluding
weekends, holidays:us'))
+```
+
+Note that the `config_overrides` parameter operates exactly as the
`config_overrides` argument in `PROFILE_GET`.
+The only available parameters for override are:
+* `profiler.client.period.duration`
+* `profiler.client.period.duration.units`
+
+##### The Profile Selector Language
+
+The domain specific language can be broken into a series of clauses, some
optional
+* <span style="color:blue">Total Temporal Duration</span> - The total
range of time in which windows may be specified
+* <span style="color:red">Temporal Window Width</span> - How large each
temporal window is
+* <span style="color:green">Skip distance</span> (optional) - How far to
skip between when one window starts and when the next begins
+* <span style="color:purple">Inclusion/Exclusion specifiers</span>
(optional) - The set of specifiers to further filter the window
+
+One *must* specify either a total temporal duration or a temporal window
width.
+The remaining clauses are optional.
+During the course of the following discussion, we will color code the
clauses in the examples.
+
+From a high level, the language fits the following three forms:
+
+* <span style="color:red">`time_interval WINDOW?`</span><span
style="color:purple">`(INCLUDING specifier_list)? (EXCLUDING
specifier_list)?`</span>
+* <span style="color:red">`time_interval WINDOW?`</span><span
style="color:green">`EVERY time_interval`</span><span style="color:blue">`FROM
time_interval (TO time_interval)?`</span><span style="color:purple">`(INCLUDING
specifier_list)? (EXCLUDING specifier_list)?`</span>
+* <span style="color:blue">`FROM time_interval (TO time_interval)?`</span>
+
+with
+* `time_interval` representing a time amount followed by a unit (e.g. "1
hour")
+* `specifier_list` representing a comma separated list of inclusion or
exclusion specifiers (e.g. "holidays:us, tuesdays")
+
+
+###### <span style="color:blue">Total Temporal Duration</span>
+
+Total temporal duration is specified by a phrase: `FROM time_interval AGO
TO time_interval AGO`
+This indicates the beginning and ending of a time interval.
+* `FROM` - Can be the words "from" or "starting from"
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The
unit may be "minute", "day", "hour" with any pluralization.
+* `TO` - Can be the words "until" or "to"
+* `AGO` - Optionally the word "ago"
+
+The `TO time_interval AGO` portion is optional. If unspecified then it is
expected that the time interval ends now.
+
+Due to the vagaries of the English language, the from and the to portions,
if both specified, are interchangeable
+with regard to which one specifies the start and which specifies the end.
+
+In other words <span style="color:blue">`starting from 1 hour ago to 30
minutes ago`</span> and
+<span style="color:blue">`starting from 30 minutes ago to 1 hour
ago`</span> specify the same
+temporal duration.
+
+**Examples**
+
+* A duration starting 1 hour ago and ending now
+ * <span style="color:blue">`from 1 hour ago`</span>
+ * <span style="color:blue">`from 1 hour`</span>
+ * <span style="color:blue">`starting from 1 hour ago`</span>
+ * <span style="color:blue">`starting from 1 hour`</span>
+* A duration starting 1 hour ago and ending 30 minutes ago:
+ * <span style="color:blue">`from 1 hour ago until 30 minutes ago`</span>
+ * <span style="color:blue">`from 30 minutes ago until 1 hour ago`</span>
+ * <span style="color:blue">`starting from 1 hour ago to 30 minutes
ago`</span>
+ * <span style="color:blue">`starting from 1 hour to 30 minutes`</span>
+
+###### <span style="color:red">Temporal Window Width</span>
+
+Temporal window width is the specification of a window.
+A window may either repeat within the total temporal duration or may fill
the total temporal duration.
+A window is specified by the phrase: `time_interval WINDOW`
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The
unit may be "minute", "day", "hour" with any pluralization.
+* `WINDOW` - Optionally the word "window"
+
+**Examples**
+
+* A fixed window starting 2 hours ago and going until now
+ * <span style="color:red">`2 hour`</span>
+ * <span style="color:red">`2 hours`</span>
+ * <span style="color:red">`2 hours window`</span>
+* A repeating 30 minute window starting 2 hours ago and repeating every
hour until now.
+This would result in two 30-minute-wide windows: one starting 2 hours ago and one starting 1 hour ago
+ * <span style="color:red">`30 minute window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`starting
from 2 hours ago`</span>
+ * <span style="color:red">`30 minutes window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`from 2 hours
ago`</span>
+* A repeating 30 minute window starting 2 hours ago and repeating every
hour until 30 minutes ago.
+This would result in two 30-minute-wide windows: one starting 2 hours ago and one starting 1 hour ago
+ * <span style="color:red">`30 minute window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`starting
from 2 hours ago until 30 minutes ago`</span>
+ * <span style="color:red">`30 minutes window`</span><span
style="color:green">`every 1 hour`</span><span style="color:blue">`from 2 hours
ago to 30 minutes ago`</span>
+ * <span style="color:red">`30 minutes window`</span><span
style="color:green">`for every 1 hour`</span><span style="color:blue">`from 30
minutes ago to 2 hours ago`</span>
+
+###### <span style="color:green">Skip distance</span>
+
+Skip distance is the amount of time between the start of one temporal window
and the start of the next.
+It is, in effect, the window period.
+
+It is specified by the phrase `EVERY time_interval`
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The
unit may be "minute", "day", "hour" with any pluralization.
+* `EVERY` - The word/phrase "every" or "for every"
+
+**Examples**
+
+* A repeating 30 minute window starting 2 hours ago and repeating every
hour until now.
--- End diff --
good feedback, I'll add that.
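
For reference, the window semantics described in the README diff above can be sketched in Python. This is only an illustration under stated assumptions, not the Metron implementation: `select_windows` is a hypothetical helper, the "fits before the end bound" rule is inferred from the README examples (two windows for a 30 minute window every hour from 2 hours ago), and the value of `now` is arbitrary.

```python
from datetime import datetime, timedelta


def select_windows(now, width, skip, from_ago, to_ago=timedelta(0)):
    """Return (start, end) pairs for repeating windows.

    Hypothetical helper, not part of Metron. Assumes windows of `width`
    begin every `skip`, starting at the earlier bound, and are kept only
    if they end at or before the later bound.
    """
    if skip <= timedelta(0):
        raise ValueError("skip must be positive")
    # FROM and TO are interchangeable: the larger "ago" marks the start.
    earliest = now - max(from_ago, to_ago)
    latest = now - min(from_ago, to_ago)
    windows = []
    start = earliest
    while start + width <= latest:
        windows.append((start, start + width))
        start += skip
    return windows


# '30 minute window every 1 hour from 2 hours ago', with an arbitrary "now".
now = datetime(2017, 2, 21, 12, 0)
for start, end in select_windows(
    now, timedelta(minutes=30), timedelta(hours=1), timedelta(hours=2)
):
    print(start.time(), "-", end.time())
```

Under these assumptions the loop prints two windows, one starting 2 hours before `now` and one starting 1 hour before `now`, and swapping the from and to bounds yields the same result.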
> Create a DSL-based timestamp lookup for profiler to enable sparse windows
> -------------------------------------------------------------------------
>
> Key: METRON-690
> URL: https://issues.apache.org/jira/browse/METRON-690
> Project: Metron
> Issue Type: New Feature
> Reporter: Casey Stella
>
> I propose that we support the following features:
> * A starting point that is not current time
> * Sparse bins (i.e. the last hour for every tuesday for the last month)
> * The ability to skip events (e.g. weekends, holidays)
> This would result in a new function with the following arguments:
> from - The lookback starting point (default to now)
> fromUnits - The units for the lookback starting point
> to - The ending point for the lookback window (default to from + binSize)
> toUnits - The units for the lookback ending point
> including - A list of conditions which we would include.
> weekend
> holiday
> sunday through saturday
> excluding - A list of conditions which we would skip.
> weekend
> holiday
> sunday through saturday
> binSize - The size of the lookback bin
> binUnits - The units of the lookback bin
> Given the number of arguments and their complexity and the fact that many,
> many are optional,
> I propose that PROFILE_LOOKBACK accept a string backed by a DSL to express these criteria
> Base Case: A lookback of 1 hour ago
> PROFILE_LOOKBACK( '1 hour bins from now')
> Example 1: The same time window every tuesday for the last month starting one
> hour ago
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 23rd, 2017, it would include the following bins:
> January 17th, 2PM - 3PM
> January 10th, 2PM - 3PM
> January 3rd, 2PM - 3PM
> December 27th, 2PM - 3PM
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays')
> Example 2: The same time window every sunday for the last month starting one
> hour ago skipping holidays
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 22nd, 2017, it would include the following bins:
> January 16th, 2PM - 3PM
> January 9th, 2PM - 3PM
> January 2nd, 2PM - 3PM
> NOT December 25th
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays
> excluding holidays')
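
The bins listed for Example 1 in the description above can be reproduced with a short Python sketch. The issue does not define the evaluation rules, so everything here is an assumption: `tuesday_bins` is a hypothetical helper, the one-hour bin ending at the run time is repeated once per day, and "1 month" is approximated as 30 days.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday() numbers Monday as 0


def tuesday_bins(now, lookback_days=30):
    """Enumerate one-hour bins ending at the time of day of `now`.

    Hypothetical helper: steps back one day at a time over the lookback
    and keeps only the bins that fall on a Tuesday.
    """
    bins = []
    for days_back in range(lookback_days + 1):
        start = now - timedelta(hours=1) - timedelta(days=days_back)
        if start.weekday() == TUESDAY:
            bins.append((start, start + timedelta(hours=1)))
    return bins


# Run at 3PM on Monday January 23rd, 2017, as in Example 1.
for start, end in tuesday_bins(datetime(2017, 1, 23, 15, 0)):
    print(start.strftime("%B %d, %I%p"), "-", end.strftime("%I%p"))
```

With these assumptions the sketch yields the four Tuesday bins listed in Example 1, most recent first.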