[ 
https://issues.apache.org/jira/browse/METRON-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828577#comment-15828577
 ] 

ASF GitHub Bot commented on METRON-627:
---------------------------------------

GitHub user mmiklavc reopened a pull request:

    https://github.com/apache/incubator-metron/pull/397

    METRON-627: Add HyperLogLogPlus implementation to Stellar

    This PR addresses https://issues.apache.org/jira/browse/METRON-627
    
    Leverages the HLLP implementation from 
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
    
    Four new Stellar functions allow a user to initialize a cardinality 
estimator, add items, merge estimators, and calculate cardinality estimates.
    
    ### `HLLP_CARDINALITY`
      * Description: Returns HyperLogLogPlus-estimated cardinality for this set
      * Input:
        * hyperLogLogPlus - the hllp set
      * Returns: Long value representing the cardinality for this set
    
    ### `HLLP_INIT`
      * Description: Initializes the set
      * Input:
        * p (required) - the precision value for the normal set
        * sp (optional) - the precision value for the sparse set. If sp is not 
specified, the sparse set is disabled.
      * Returns: A new HyperLogLogPlus set
    
    ### `HLLP_MERGE`
      * Description: Merge hllp sets together
      * Input:
        * hllp1 - first hllp set
        * hllp2 - second hllp set
        * hllpn - additional sets to merge
      * Returns: A new merged HyperLogLogPlus estimator set
    
    ### `HLLP_OFFER`
      * Description: Add value to the set
      * Input:
        * hyperLogLogPlus - the hllp set
        * o - Object to add to the set
      * Returns: The HyperLogLogPlus set with a new object added
    
    **Note:** Added new library to metron-common pom and added 3 new items to 
dependencies_with_url.csv.
    
    **Testing**
    
    Spun up the Stellar REPL in quick-dev and verified that function 
composition works as expected and returns correct cardinality estimates for 
simple sparse set cases. For example:
    ```
    [Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE( 
HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "runnings"), "cool"), 
HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "bobsled"), "team")))
    4
    ```
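
For reviewers less familiar with the estimator, the composition above can be illustrated with a minimal, self-contained Java sketch. This is not the stream-lib `HyperLogLogPlus` class and not the code in this PR: `SimpleHll`, its method names, and the SHA-256-based hash are hypothetical choices made for illustration. It implements plain HyperLogLog with a linear-counting correction for small sets, not HyperLogLogPlus (no sparse representation, no bias tables). The methods loosely mirror `HLLP_INIT`, `HLLP_OFFER`, `HLLP_MERGE`, and `HLLP_CARDINALITY`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Simplified HyperLogLog for illustration only (hypothetical; NOT HyperLogLogPlus). */
final class SimpleHll {
    private final int p;            // precision: number of hash bits used as register index
    private final byte[] registers; // m = 2^p registers, each holding a max "rank"

    /** Roughly analogous to HLLP_INIT(p); there is no sparse precision here. */
    SimpleHll(int p) {
        if (p < 4 || p > 16) throw new IllegalArgumentException("p must be in [4, 16]");
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** 64-bit hash from the first 8 bytes of SHA-256 (an assumption; any good 64-bit hash works). */
    private static long hash64(Object o) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(String.valueOf(o).getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Roughly analogous to HLLP_OFFER: record the item and return this sketch. */
    SimpleHll offer(Object o) {
        long h = hash64(o);
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // remaining bits, left-aligned
        // rank = position of the first 1-bit in the remaining 64 - p bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
        return this;
    }

    /** Roughly analogous to HLLP_MERGE: register-wise maximum, i.e. the sketch of the union. */
    static SimpleHll merge(SimpleHll a, SimpleHll b) {
        if (a.p != b.p) throw new IllegalArgumentException("precisions must match");
        SimpleHll out = new SimpleHll(a.p);
        for (int i = 0; i < out.registers.length; i++) {
            out.registers[i] = (byte) Math.max(a.registers[i], b.registers[i]);
        }
        return out;
    }

    /** Roughly analogous to HLLP_CARDINALITY: estimate the number of distinct items offered. */
    long cardinality() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = (m == 16) ? 0.673 : (m == 32) ? 0.697 : (m == 64) ? 0.709
                : 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting while registers are mostly empty.
        if (raw <= 2.5 * m && zeros > 0) {
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    /** Approximate relative standard error of plain HyperLogLog with m = 2^p registers. */
    double standardError() {
        return 1.04 / Math.sqrt(registers.length);
    }
}
```

Because merging takes the register-wise maximum, `SimpleHll.merge(a, b)` produces the same registers, and therefore the same estimate, as offering every item to a single sketch of the same precision. The REPL composition above relies on that same union property. Larger values of `p` use more memory and lower the expected relative error.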

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/incubator-metron hyperloglog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397
    
----
commit afce30539f6996a607e85d3fd35aac5fcb5c19aa
Author: Michael Miklavcic <[email protected]>
Date:   2016-12-15T20:55:39Z

    METRON-627: Add HyperLogLogPlus implementation to Stellar

commit 414a3a98976b98a253ab9921720f02c8a7431da2
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-09T17:00:08Z

    work in progress commit

commit c7f57a4acbb0ef357c1af9eaa263afea7bc83d9a
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-11T16:58:58Z

    Merge with master

commit 90d9659f415404c6c4682289c7bde669c352f517
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-12T20:33:10Z

    Refactor, fix statistics output

commit 261e69651d4ae0b99e88e0e4a2c4e7568aa23fcb
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-12T23:17:13Z

    METRON-627: Updated with sensible default precision values

commit 9078094dd720d89f64ecf45506ab0c5077aa58a7
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-13T19:17:37Z

    METRON-627: Add default init for HLLP_ADD(null, 'val')

commit 9e1ff937fe51841ac2fa3235bf87964cba8a1ae8
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-17T20:09:26Z

    Merge branch 'master' into hyperloglog

commit d392f044e330fe273cb0f0b4ff820b4ef1a3595d
Author: Michael Miklavcic <[email protected]>
Date:   2017-01-17T20:33:11Z

    METRON-627: Fix Stellar lexer to handle newline at end

----


> Add HyperLogLogPlus implementation to Stellar
> ---------------------------------------------
>
>                 Key: METRON-627
>                 URL: https://issues.apache.org/jira/browse/METRON-627
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Michael Miklavcic
>            Assignee: Michael Miklavcic
>
> Calculating set cardinality can be a useful tool for a security analyst. For 
> instance, a large number of distinct source IP addresses hitting your network 
> may be an indication that you are currently under attack. There have been 
> many advancements in distinct value (DV) estimation over the years. We have 
> seen implementations evolve from K-Minimum-Values (KMV), to LogLog, to 
> HyperLogLog, and now to Google's much-improved HyperLogLogPlus algorithm. The 
> key improvements in this latest manifestation of the algorithm are:
> * moves to a 64-bit hash
> * handles sparse sets
> * is more accurate with small cardinality
> This Jira tracks the effort to add a HyperLogLogPlus implementation to Metron.
> References:
> https://research.neustar.biz/2013/01/24/hyperloglog-googles-take-on-engineering-hll/
> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
