[https://issues.apache.org/jira/browse/METRON-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876915#comment-15876915]
ASF GitHub Bot commented on METRON-690:
---------------------------------------
Github user nickwallen commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/450#discussion_r102338396
--- Diff: metron-analytics/metron-profiler-client/README.md ---
@@ -91,37 +60,268 @@ want to change the global Client configuration so as not to disrupt the work of
| profiler.client.salt.divisor | The salt divisor used to store profile data. | Optional | 1000 |
| hbase.provider.impl | The name of the HBaseTableProvider implementation class. | Optional | |
+
+### Profile Selectors
+
+You will notice that the third argument for `PROFILE_GET` is a list of `ProfilePeriod` objects. This list is expected to
+be produced by another Stellar function. There are a couple of options available.
+
+#### `PROFILE_FIXED`
+
+Returns the profile periods associated with a fixed lookback starting from now. These are `ProfilePeriod` objects.
+```
+REQUIRED:
+ durationAgo - How long ago should values be retrieved from?
+ units - The units of 'durationAgo'.
+OPTIONAL:
+ config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter
+ of the same name. Default is the empty Map, meaning no overrides.
+
+e.g. To retrieve all the profiles for the last 5 hours.  PROFILE_GET('profile', 'entity', PROFILE_FIXED(5, 'HOURS'))
+```
+
+Note that the `config_overrides` parameter operates exactly as the `config_overrides` argument in `PROFILE_GET`.
+The only available parameters for override are:
+* `profiler.client.period.duration`
+* `profiler.client.period.duration.units`
+
+#### `PROFILE_WINDOW`
+
+`PROFILE_WINDOW` is intended to provide finer-grained control over selecting windows for profiles:
+* Specify windows relative to the data timestamp (see the optional `now` parameter below)
+* Specify non-contiguous windows to better handle seasonal data (e.g. the last hour for every day for the last month)
+* Specify profile output excluding holidays
+* Specify profile output only for a specific day of the week
+
+It does this with a domain-specific language, mimicking natural language, that defines the windows to select.
+
+```
+REQUIRED:
+ windowSelector - The statement specifying the window to select.
+OPTIONAL:
+ now - Optional - The timestamp to use for now.
+ config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter
+ of the same name. Default is the empty Map, meaning no overrides.
+
+e.g. To retrieve all the measurements written for 'profile' and 'entity' for the last hour
+on the same weekday excluding weekends and US holidays across the last 14 days:
+PROFILE_GET('profile', 'entity', PROFILE_WINDOW('1 hour window every 24 hours starting from 14 days ago including the current day of the week excluding weekends, holidays:us'))
+```
+
+Note that the `config_overrides` parameter operates exactly as the `config_overrides` argument in `PROFILE_GET`.
+The only available parameters for override are:
+* `profiler.client.period.duration`
+* `profiler.client.period.duration.units`
+
+##### The Profile Selector Language
+
+The domain-specific language can be broken into a series of clauses, some of which are optional:
+* <span style="color:blue">Total Temporal Duration</span> - The total range of time in which windows may be specified
+* <span style="color:red">Temporal Window Width</span> - How large each temporal window is
+* <span style="color:green">Skip distance</span> (optional) - How far to skip between when one window starts and when the next begins
+* <span style="color:purple">Inclusion/Exclusion specifiers</span> (optional) - The set of specifiers used to further filter the windows
+
+One *must* specify either a total temporal duration or a temporal window width.
+The remaining clauses are optional.
+During the course of the following discussion, we will color code the clauses in the examples.
+
+At a high level, the language fits one of the following three forms:
+
+* <span style="color:red">`time_interval WINDOW?`</span><span style="color:purple">`(INCLUDING specifier_list)? (EXCLUDING specifier_list)?`</span>
+* <span style="color:red">`time_interval WINDOW?`</span><span style="color:green">`EVERY time_interval`</span><span style="color:blue">`FROM time_interval (TO time_interval)?`</span><span style="color:purple">`(INCLUDING specifier_list)? (EXCLUDING specifier_list)?`</span>
+* <span style="color:blue">`FROM time_interval (TO time_interval)?`</span>
+
+with
+* `time_interval` representing a time amount followed by a unit (e.g. "1 hour")
+* `specifier_list` representing a comma-separated list of inclusion or exclusion specifiers (e.g. "holidays:us, tuesdays")
+
+
+###### <span style="color:blue">Total Temporal Duration</span>
+
+Total temporal duration is specified by a phrase: `FROM time_interval AGO TO time_interval AGO`
+This indicates the beginning and end of a time interval.
+* `FROM` - Can be the words "from" or "starting from"
+* `time_interval` - A time amount followed by a unit (e.g. 1 hour). The unit may be "minute", "day", or "hour", singular or plural.
+* `TO` - Can be the words "until" or "to"
+* `AGO` - Optionally the word "ago"
+
+The `TO time_interval AGO` portion is optional. If it is unspecified, the time interval is expected to end now.
--- End diff --
One thing to consider is that we seem to be assuming "processing time" with the grammar. For example, in the testing notes that you provided, we are enriching the message with an expression like this.
```
STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))
```
When we grab the profile from "2 minutes ago", this will be 2 minutes ago from processing time (aka system time) rather than the event time of the message.
Is there any way to support event time now? I'm not sure that this is needed right now, but I thought we should at the very least have a discussion around this.
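The processing-time point can be illustrated with a small Python sketch. `expand_windows` is hypothetical and deliberately simplified: it takes an already-parsed width, skip, and FROM/TO offsets instead of parsing the DSL, ignores inclusion/exclusion specifiers such as `holidays:us`, treats the two offsets as unordered bounds (the expression above says "from 2 minutes ago until 32 minutes ago"), and lays windows out from the older bound in steps of the skip distance. Those layout rules are assumptions, not a description of the PR's implementation. Since the README excerpt lists an optional `now` argument for `PROFILE_WINDOW`, the sketch takes `now` explicitly so the same selector can be evaluated against system time and against a message's event time.

```python
from datetime import datetime, timedelta, timezone


def expand_windows(now, width, skip, from_ago, to_ago):
    """Hypothetical evaluator for '<width> window every <skip> from <from_ago> ago until <to_ago> ago'.

    Returns (start, end) datetime pairs. Inclusion/exclusion specifiers are not handled.
    """
    # Assumption: the two offsets are unordered bounds of the total temporal duration.
    begin = now - max(from_ago, to_ago)
    end = now - min(from_ago, to_ago)
    windows = []
    left = begin
    # Step forward from the oldest point, keeping windows that fit entirely inside [begin, end].
    while left + width <= end:
        windows.append((left, left + width))
        left += skip
    return windows


if __name__ == "__main__":
    selector = dict(width=timedelta(minutes=5), skip=timedelta(minutes=10),
                    from_ago=timedelta(minutes=2), to_ago=timedelta(minutes=32))
    # Processing time: the clock of the machine evaluating the expression (arbitrary value).
    processing_now = datetime(2017, 2, 21, 12, 0, tzinfo=timezone.utc)
    # Event time: a timestamp carried by a message that arrived 45 minutes late (illustrative).
    event_now = processing_now - timedelta(minutes=45)
    for label, now in (("processing", processing_now), ("event", event_now)):
        print(label, [(s.strftime("%H:%M"), e.strftime("%H:%M"))
                      for s, e in expand_windows(now, **selector)])
    # processing [('11:28', '11:33'), ('11:38', '11:43'), ('11:48', '11:53')]
    # event [('10:43', '10:48'), ('10:53', '10:58'), ('11:03', '11:08')]
```

Under these assumptions, evaluating against the message's event time shifts every selected window back by the same 45 minutes that the message lagged behind system time.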
> Create a DSL-based timestamp lookup for profiler to enable sparse windows
> -------------------------------------------------------------------------
>
> Key: METRON-690
> URL: https://issues.apache.org/jira/browse/METRON-690
> Project: Metron
> Issue Type: New Feature
> Reporter: Casey Stella
>
> I propose that we support the following features:
> * A starting point that is not current time
> * Sparse bins (e.g. the last hour for every Tuesday for the last month)
> * The ability to skip events (e.g. weekends, holidays)
> This would result in a new function with the following arguments:
> from - The lookback starting point (defaults to now)
> fromUnits - The units for the lookback starting point
> to - The ending point for the lookback window (defaults to from + binSize)
> toUnits - The units for the lookback ending point
> including - A list of conditions which we would include.
> weekend
> holiday
> sunday through saturday
> excluding - A list of conditions which we would skip.
> weekend
> holiday
> sunday through saturday
> binSize - The size of the lookback bin
> binUnits - The units of the lookback bin
> Given the number of arguments, their complexity, and the fact that many,
> many are optional,
> PROFILE_LOOKBACK should accept a string backed by a DSL to express these criteria.
> Base Case: A lookback of 1 hour ago
> PROFILE_LOOKBACK( '1 hour bins from now')
> Example 1: The same time window every Tuesday for the last month starting one
> hour ago
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 23rd, 2017, it would include the following bins:
> January 17th, 2PM - 3PM
> January 10th, 2PM - 3PM
> January 3rd, 2PM - 3PM
> December 27th, 2PM - 3PM
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays')
> Example 2: The same time window every Sunday for the last month starting one
> hour ago skipping holidays
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 22nd, 2017, it would include the following bins:
> January 16th, 2PM - 3PM
> January 9th, 2PM - 3PM
> January 2nd, 2PM - 3PM
> NOT December 25th
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays
> excluding holidays')
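Example 1 can be checked with a short Python sketch. `weekday_bins` is a hypothetical helper, not part of `PROFILE_LOOKBACK` or any Metron API. It assumes "the last month" means the 28 days before the run time (a calendar month would behave differently), places one candidate bin per day starting one hour before the run's time of day, keeps only bins that start on the requested weekday, and uses naive datetimes, so time zones and holidays are ignored.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday() numbers Monday as 0


def weekday_bins(now, weekday, bin_width=timedelta(hours=1),
                 start_ago=timedelta(hours=1), lookback=timedelta(days=28)):
    """Hypothetical reading of '1 hour bins from 1 hour to 1 month including <weekday>s'.

    Walks back one day at a time from `now - start_ago` and returns (start, end) pairs,
    newest first, for bins that start on `weekday` and no earlier than `now - lookback`.
    """
    oldest_allowed = now - lookback
    bins = []
    start = now - start_ago
    while start >= oldest_allowed:
        if start.weekday() == weekday:
            bins.append((start, start + bin_width))
        start -= timedelta(days=1)
    return bins


if __name__ == "__main__":
    run_time = datetime(2017, 1, 23, 15, 0)  # 3PM on Monday, January 23rd, 2017
    for start, end in weekday_bins(run_time, TUESDAY):
        print(start.isoformat(sep=" "), "to", end.time().isoformat())
```

Run as written, this yields the four bins listed in Example 1: January 17th, January 10th, January 3rd, and December 27th, each 2PM - 3PM.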
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)