GitHub user cestella opened a pull request:
https://github.com/apache/incubator-metron/pull/250
METRON-416: Provide the ability to store mergeable data structures for
summarizing data on-line
With the profiler currently under development, it will be advantageous to
be able to store more than just numeric data as a profile. In particular,
adding some common data structures that can be merged will allow us to store
data at a fixed tick-rate, but merge the results from HBase across multiple
ticks.
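
To illustrate what "mergeable" means here, the following is a minimal Python
sketch, not the code in this pull request (which uses distributional sketches).
`RunningStats` and `summarize` are hypothetical names chosen for illustration.
The sketch assumes Welford's online update for each tick and the standard
pairwise formula for combining two summaries, so that per-tick summaries can
be stored separately and merged later:

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical mergeable summary: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's online update: one pass, constant memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two summaries without revisiting the raw data.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def variance(self) -> float:
        # Unbiased sample variance; defined as 0.0 for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(values):
    """Build one summary per tick from that tick's values."""
    stats = RunningStats()
    for v in values:
        stats.add(v)
    return stats


# Two ticks summarized independently, then merged at query time.
tick1 = summarize([1.0, 2.0, 3.0])
tick2 = summarize([4.0, 5.0])
merged = tick1.merge(tick2)
```

The merged summary matches what a single pass over all five values would
produce, which is the property that lets results be stored per tick and
combined across ticks.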
The following structures should be supported:
* An implementation of the `STATS` Stellar functions which is backed by an
online and mergeable class using distributional sketches for querying
the distribution.
* An implementation of Bloom filters so that probabilistic existence
queries can be made
This should facilitate simple statistical outlier analysis.
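
As a rough illustration of why Bloom filters fit the same store-then-merge
pattern, here is a minimal Python sketch. It is an assumption-laden toy, not
the implementation in this pull request: the `BloomFilter` class, its
parameters, and the SHA-256-based hashing scheme are all hypothetical choices
made for the example. Two filters built with the same size and hash count
merge by a bitwise OR of their bit sets:

```python
import hashlib


class BloomFilter:
    """Hypothetical toy Bloom filter; a Python int serves as the bit set."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        # Only filters with identical parameters can be combined.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("Bloom filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# One filter per tick, merged when a query spans both ticks.
tick1 = BloomFilter()
tick1.add("10.0.0.1")
tick2 = BloomFilter()
tick2.add("10.0.0.2")
merged = tick1.merge(tick2)
```

Anything added to either per-tick filter is reported as possibly present by
the merged filter. False positives remain possible, as with any Bloom filter.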
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron probabalistic_functions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/250.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #250
----
commit 1fa3950097668602f2f6e98f523b37c995139ea9
Author: cstella <[email protected]>
Date: 2016-09-12T22:21:27Z
Refactoring to allow online statistical calculations
commit ce8e175979b9f7df023b47534b7b29970d737e1f
Author: cstella <[email protected]>
Date: 2016-09-13T18:03:50Z
fixed kurtosis and skewness to be unbiased.
commit 17615e96f488710a7c1dc2269ab0cea8356a744b
Author: cstella <[email protected]>
Date: 2016-09-13T18:12:00Z
Licensing.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---