GitHub user nickwallen opened a pull request:
https://github.com/apache/incubator-metron/pull/215
METRON-377 Enable Profiles that Use Non-Single Pass Summary Functions
Note: This change depends on #208, #212, #213, and #214. The diff will be
easier to grok once those PRs are merged.
### [METRON-377](https://issues.apache.org/jira/browse/METRON-377)
As of METRON-372 and #214, Profiles can be built using many statistical
summaries that require only a single pass over the data. These use less memory
and scale better under high-volume loads.
Unfortunately, not all functions can be calculated in a single pass. In
particular, the skewness, kurtosis, and percentiles require the data to be
stored in memory for the calculation to occur. This change enhances the
platform so that a user can also leverage skewness, kurtosis, and percentiles.
### Changes
The `STATS_INIT` function was enhanced to accept a `window_size`. This
defines the number of input data elements that are maintained in memory.
If the `window_size` is greater than 0, a rolling window of the most recent
`window_size` elements is maintained in memory. The skewness, kurtosis, and
percentiles are calculated over this rolling window. The `window_size` must be
greater than 0; otherwise, these values cannot be calculated.
If a user supplies a `window_size` equal to 0, the more efficient
implementation that does not maintain a rolling window in memory is used. In
this case, the skewness, kurtosis, and percentiles cannot be calculated.
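To illustrate the trade-off described above, here is a minimal Python sketch. It is not the code in this PR: the `RollingStats` class and its method names are hypothetical, the percentile uses the nearest-rank method, and skewness and kurtosis use plain population moments, so the values may differ from what the Stellar functions return. It only shows how a `window_size` of 0 keeps single-pass accumulators, while a positive `window_size` also keeps the most recent elements in memory.

```python
import math
from collections import deque


class RollingStats:
    """Hypothetical illustration of the window_size behavior; not Metron code."""

    def __init__(self, window_size=0):
        if window_size < 0:
            raise ValueError("window_size must be >= 0")
        # Single-pass accumulators are always maintained.
        self.count = 0
        self.total = 0.0
        # The rolling window is only kept when window_size > 0.
        self.window = deque(maxlen=window_size) if window_size > 0 else None

    def add(self, value):
        self.count += 1
        self.total += value
        if self.window is not None:
            # deque(maxlen=n) silently drops the oldest element when full.
            self.window.append(value)

    def mean(self):
        # Computed over ALL values seen, without storing them.
        return self.total / self.count if self.count else math.nan

    def _windowed(self):
        if self.window is None:
            raise ValueError("window_size must be > 0 for this statistic")
        return list(self.window)

    def _central_moment(self, data, k):
        m = sum(data) / len(data)
        return sum((x - m) ** k for x in data) / len(data)

    def percentile(self, p):
        # Nearest-rank percentile over the rolling window (an assumption).
        data = sorted(self._windowed())
        if not data:
            return math.nan
        rank = max(1, math.ceil(p / 100.0 * len(data)))
        return data[rank - 1]

    def skewness(self):
        # Population skewness over the rolling window.
        data = self._windowed()
        if not data:
            return math.nan
        m2 = self._central_moment(data, 2)
        return self._central_moment(data, 3) / m2 ** 1.5 if m2 > 0 else math.nan

    def kurtosis(self):
        # Population excess kurtosis over the rolling window.
        data = self._windowed()
        if not data:
            return math.nan
        m2 = self._central_moment(data, 2)
        return self._central_moment(data, 4) / m2 ** 2 - 3 if m2 > 0 else math.nan
```

With `RollingStats(window_size=3)`, the mean still reflects every value added, but the percentile, skewness, and kurtosis reflect only the last three. With `RollingStats(window_size=0)`, those three calls raise an error because no elements were retained.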
The following functions were also added:
* STATS_KURTOSIS
* STATS_SKEWNESS
* STATS_PERCENTILES
An additional integration test was added to validate this end-to-end.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nickwallen/incubator-metron METRON-377
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/215.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #215
----
commit 11ef29bb73c8363b1c905f597be496929b77888f
Author: Nick Allen <[email protected]>
Date: 2016-07-29T20:53:49Z
METRON-309 Create a normalcy profiler
commit 28560b36273271c438c3309df610a4c4306ffdde
Author: Nick Allen <[email protected]>
Date: 2016-08-11T20:46:00Z
METRON-309 Corrected a typo in the 'Getting Started' instructions
commit d55d32efeea38f52bd9d1095aa6f8817bc15f0a5
Author: Nick Allen <[email protected]>
Date: 2016-08-12T13:16:19Z
METRON-309 Altered based on Stellar Unification changes
commit ca81339e0f326dc04110eafd74fd0879dfe1a029
Author: Nick Allen <[email protected]>
Date: 2016-08-12T13:23:06Z
METRON-309 Need to set the kafka broker in the Profiler topology properties
commit 712a1083c21eb8f7c0d81511d432c054792ebe9b
Author: Nick Allen <[email protected]>
Date: 2016-08-12T15:17:47Z
METRON-309 Updated examples to use latest Stellar binary functions
commit 01aa1f972324f97b231b7d9a26f122891a1b6d50
Author: Nick Allen <[email protected]>
Date: 2016-08-15T18:03:49Z
METRON-309 Fixed the README examples and added each as an integration test.
commit d9e2c5e568292c4a08c6c6c314d3c2e8a9ea38be
Author: Nick Allen <[email protected]>
Date: 2016-08-15T20:08:26Z
METRON-367 Enhance Profiler to Support Multiple Numeric Types
commit b9366ff97dbc8968ac44a2ba8dfe7bca43e7adc6
Author: Nick Allen <[email protected]>
Date: 2016-08-15T20:32:51Z
METRON-368 Simplify Profile Configuration with Sensible Defaults
commit 486e8138efbc64342b0bcf405adae521c55d4e33
Author: Nick Allen <[email protected]>
Date: 2016-08-16T16:51:07Z
METRON-309 Removed legacy classes from Stellar Unification that are no
longer needed
commit c3ce806606c0903fd62819e4a29ea91adb1979be
Author: Nick Allen <[email protected]>
Date: 2016-08-16T16:34:26Z
METRON-372 Refactored the Stellar functions for clarity
commit eaf5255143d7c09ee8735d3c7d4fa34a58c25831
Author: Nick Allen <[email protected]>
Date: 2016-08-16T19:49:13Z
METRON-372 Added summary statistics functions to Stellar
commit 2e4844cf7e4a98c3594fac542bcf0f32d938b0b2
Author: Nick Allen <[email protected]>
Date: 2016-08-16T20:01:13Z
METRON-372 Updated example to show use of new STATS_x functions
commit 8025994f5858a2e332f4a2ee325ab9cffd82b092
Author: Nick Allen <[email protected]>
Date: 2016-08-17T19:39:53Z
METRON-377 Enable Profiles that Use Non-Single Pass Summary Functions
commit 62a3a0f48d5c792c85486014cfa45e2c4620362c
Author: Nick Allen <[email protected]>
Date: 2016-08-17T20:18:24Z
METRON-377 Added another integration test
----