GitHub user nickwallen opened a pull request:

    https://github.com/apache/incubator-metron/pull/230

    METRON-392 Allow User to Define Custom 'Group By' for a Profile

    ### [METRON-392](https://issues.apache.org/jira/browse/METRON-392)
    
    Allows a user to optionally define a custom set of 'groupBy' expressions 
that control how the data is persisted.  This is intended to allow 
contiguous scans when training on subsets of the data. 
    
    The 'groupBy' expressions can refer to any field within a 
`ProfileMeasurement`.  This includes the following fields: 
      * `profileName`: The name of the profile.
      * `entity`: The name of the entity being profiled.
      * `start`: The window start time in milliseconds from the epoch.
      * `end`: The window end time in milliseconds from the epoch.
      * `value`: The summary value calculated over the window period.
      * `groupBy`: The set of 'groupBy' expressions themselves, not the results 
of evaluating those expressions.
    
    A common use case would be grouping the data by day of week.  This would 
allow a contiguous scan to access all profile data for Mondays only.  The 
Stellar expression `DAY_OF_WEEK(start)` would achieve this. 
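
As a rough illustration of the grouping idea (not the Profiler's implementation), the Python sketch below buckets window start times by day of week. The helper `day_of_week` is a hypothetical stand-in for `DAY_OF_WEEK(start)`; its numbering (1 = Monday through 7 = Sunday, evaluated in UTC) is an assumption, since the Stellar date functions are not defined in this PR.

```python
from datetime import datetime, timezone

def day_of_week(epoch_millis: int) -> int:
    """Hypothetical stand-in for DAY_OF_WEEK(start).

    Assumed numbering: 1 = Monday ... 7 = Sunday, evaluated in UTC.
    """
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return moment.isoweekday()

# Window start times in milliseconds from the epoch (illustrative values).
starts = [
    1471824000000,  # 2016-08-22T00:00:00Z, a Monday
    1471910400000,  # 2016-08-23T00:00:00Z, a Tuesday
    1472428800000,  # 2016-08-29T00:00:00Z, a Monday
]

# Bucket each window under the value of its 'groupBy' expression.
groups = {}
for start in starts:
    groups.setdefault(day_of_week(start), []).append(start)

print(groups)
```

Both Monday windows land in the same bucket, which is the property a day-of-week group is meant to provide.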
    
    *NOTE*: A series of date functions will be added to Stellar in a follow-on 
PR to enhance the types of groups that can be created.
    
    ### Example
    ```
    {
      "inputTopic": "indexing",
      "profiles": [
        {
          "profile": "example3",
          "foreach": "ip_src_addr",
          "onlyif": "protocol == 'HTTP'",
          "groupBy": "DAY_OF_WEEK(start)",
          "update": { "s": "STATS_ADD(s, length)" },
          "result": "STATS_MEAN(s)"
        }
      ]
    }
    ```
    
    ### Testing
    To test this change, do the following:
    * Create a profile and do not define a 'groupBy' expression.  Prior to this 
change, the row key included the day of week, week of month, etc., which 
altered how the data was sorted on disk.  After this change, these fields are 
not included in the row key.
    * Create a profile and define a 'groupBy' expression.  The result of this 
expression will be embedded in the row key.
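
To make the expected difference concrete, here is a minimal Python sketch of why embedding the group value in the row key yields a contiguous scan. The key layout (`profile|entity|group|start`), the `row_key` and `prefix_scan` helpers, and the entity `10.0.0.1` are all hypothetical and chosen for illustration; the Profiler's actual row key format is not described here. The group values assume 1 = Monday and 2 = Tuesday.

```python
from bisect import bisect_left

def row_key(profile, entity, start_millis, groups=()):
    """Build a HYPOTHETICAL row key: profile|entity|[group values...]|start."""
    # Zero-pad the timestamp so that string order matches numeric order.
    parts = [profile, entity, *[str(g) for g in groups], f"{start_millis:013d}"]
    return "|".join(parts)

def prefix_scan(sorted_keys, prefix):
    """Return the contiguous run of sorted keys that start with the prefix."""
    i = bisect_left(sorted_keys, prefix)
    run = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        run.append(sorted_keys[i])
        i += 1
    return run

# (window start in epoch millis, assumed group value: 1 = Monday, 2 = Tuesday)
windows = [(1471824000000, 1), (1471910400000, 2), (1472428800000, 1)]

# No 'groupBy': keys sort purely by time, so the Tuesday row sits between Mondays.
ungrouped = sorted(row_key("example3", "10.0.0.1", s) for s, _ in windows)

# With 'groupBy': the group value precedes the time, so Mondays are adjacent.
grouped = sorted(row_key("example3", "10.0.0.1", s, [g]) for s, g in windows)
mondays = prefix_scan(grouped, "example3|10.0.0.1|1|")

print(ungrouped)
print(mondays)
```

In the ungrouped ordering the two Monday rows are separated by the Tuesday row, while in the grouped ordering a single prefix scan returns both Monday rows and nothing else.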


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/incubator-metron METRON-392

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #230
    
----
commit f256fe7461a0a8273b0a9f9e2d10a01c7c53c473
Author: Nick Allen <[email protected]>
Date:   2016-08-23T17:03:15Z

    METRON-372 Enhance Statistical Operations Available for Use with the 
Profiler

commit fc38cb8faf8970d6c8563e43cdf48158cc03cbda
Author: Nick Allen <[email protected]>
Date:   2016-08-23T17:18:26Z

    METRON-377 Enable Profiles that Use Non-Single Pass Summary Functions

commit 9ee905ea7b03d13ac512a557a270d12be332a4b8
Author: Nick Allen <[email protected]>
Date:   2016-08-22T14:52:34Z

    METRON-392 Allow User to Define Custom 'Group By' for a Profile

commit bfc01e17820894541803255a617a4b7a7804e04e
Author: Nick Allen <[email protected]>
Date:   2016-08-25T11:47:49Z

    METRON-392 Merged with master

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
