GitHub user nickwallen opened a pull request:
https://github.com/apache/incubator-metron/pull/230
METRON-392 Allow User to Define Custom 'Group By' for a Profile
### [METRON-392](https://issues.apache.org/jira/browse/METRON-392)
Allows a user to optionally define a custom set of 'groupBy' expressions
that determine how profile data is grouped, and therefore sorted, when it is
persisted. This is intended to allow contiguous scans when training on
subsets of the data.
The 'groupBy' expressions can refer to any field within a
`ProfileMeasurement`. This includes the following fields:
* `profileName`: The name of the profile.
* `entity`: The name of the entity being profiled.
* `start`: The window start time in milliseconds from the epoch.
* `end`: The window end time in milliseconds from the epoch.
* `value`: The summary value calculated over the window period.
* `groupBy`: The set of 'groupBy' expressions; not the result of those
expressions.
A common use case would be grouping the data by day of week. This would
allow a contiguous scan to access all profile data for Mondays only. The
Stellar expression `DAY_OF_WEEK(start)` would achieve this.
*NOTE*: A series of date functions will be added to Stellar in a follow-on
PR to enhance the types of groups that can be created.
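As a rough illustration of the grouping idea, here is a minimal Python sketch. It is not Metron code. The `ProfileMeasurement` dataclass only mirrors the fields listed above, and `day_of_week` is a hypothetical stand-in for the Stellar function; it assumes UTC and ISO numbering (Monday = 1), which may differ from what Stellar ends up using.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List


@dataclass
class ProfileMeasurement:
    """Hypothetical Python mirror of the fields a 'groupBy' expression can see."""
    profile_name: str
    entity: str
    start: int                      # window start, milliseconds from the epoch
    end: int                        # window end, milliseconds from the epoch
    value: Any = None               # summary value calculated over the window
    group_by: List[str] = field(default_factory=list)  # the expressions, not their results


def day_of_week(epoch_millis: int) -> int:
    """Stand-in for DAY_OF_WEEK: ISO weekday in UTC (assumption), Monday = 1."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc).isoweekday()


# A 15 minute window starting at 2016-08-22T00:00:00Z, which is a Monday.
m = ProfileMeasurement(
    profile_name="example3",
    entity="10.0.0.1",
    start=1471824000000,
    end=1471824900000,
    value=512.0,
    group_by=["DAY_OF_WEEK(start)"],
)

# The group is derived from the measurement's own 'start' field.
group = day_of_week(m.start)
```

Every measurement whose window starts on a Monday evaluates to the same group value, which is what lets those measurements be stored next to each other.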
### Example
```
{
"inputTopic": "indexing",
"profiles": [
{
"profile": "example3",
"foreach": "ip_src_addr",
"onlyif": "protocol == 'HTTP'",
"groupBy": "DAY_OF_WEEK(start)",
"update": { "s": "STATS_ADD(s, length)" },
"result": "STATS_MEAN(s)"
}
]
}
```
### Testing
To test this change, do the following:
* Create a profile and do not define a 'groupBy' expression. Prior to this
change, the row key included the day of week, week of month, etc., which
altered how the data was sorted on disk. After this change, these fields are
not included in the row key.
* Create a profile and define a 'groupBy' expression. The result of this
expression will be embedded in the row key.
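The effect of both test cases on sort order can be sketched in Python. This is only an illustration: the `row_key` layout below (profile, entity, group values, window start) is a hypothetical simplification and not the actual row key format, and `day_of_week` is again a stand-in that assumes UTC with Monday = 1.

```python
from datetime import datetime, timezone
from typing import List, Sequence, Tuple

DAY_MS = 86_400_000
FIRST_MONDAY = 1471824000000  # 2016-08-22T00:00:00Z


def day_of_week(epoch_millis: int) -> int:
    """Stand-in for DAY_OF_WEEK: ISO weekday in UTC (assumption), Monday = 1."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc).isoweekday()


def row_key(profile: str, entity: str, groups: Sequence[int], start: int) -> Tuple:
    """Hypothetical key layout: group values sit between the entity and the time."""
    return (profile, entity, *groups, start)


def monday_positions(sorted_keys: List[Tuple]) -> List[int]:
    """Indexes of rows whose window start (last key element) falls on a Monday."""
    return [i for i, key in enumerate(sorted_keys) if day_of_week(key[-1]) == 1]


# Two weeks of daily windows for one entity.
starts = [FIRST_MONDAY + i * DAY_MS for i in range(14)]

# No 'groupBy': rows sort purely by time, so the two Mondays are a week apart.
ungrouped = sorted(row_key("example3", "10.0.0.1", [], s) for s in starts)

# 'groupBy' of DAY_OF_WEEK(start): rows sort by group first, then by time.
grouped = sorted(row_key("example3", "10.0.0.1", [day_of_week(s)], s) for s in starts)

print(monday_positions(ungrouped))  # [0, 7]
print(monday_positions(grouped))    # [0, 1]
```

With the group value embedded in the key, the Monday rows become adjacent, so a single contiguous scan can read them without touching the other days.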
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nickwallen/incubator-metron METRON-392
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/230.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #230
----
commit f256fe7461a0a8273b0a9f9e2d10a01c7c53c473
Author: Nick Allen <[email protected]>
Date: 2016-08-23T17:03:15Z
METRON-372 Enhance Statistical Operations Available for Use with the
Profiler
commit fc38cb8faf8970d6c8563e43cdf48158cc03cbda
Author: Nick Allen <[email protected]>
Date: 2016-08-23T17:18:26Z
METRON-377 Enable Profiles that Use Non-Single Pass Summary Functions
commit 9ee905ea7b03d13ac512a557a270d12be332a4b8
Author: Nick Allen <[email protected]>
Date: 2016-08-22T14:52:34Z
METRON-392 Allow User to Define Custom 'Group By' for a Profile
commit bfc01e17820894541803255a617a4b7a7804e04e
Author: Nick Allen <[email protected]>
Date: 2016-08-25T11:47:49Z
METRON-392 Merged with master
----