GitHub user mmiklavc reopened a pull request:

    https://github.com/apache/incubator-metron/pull/397

    METRON-627: Add HyperLogLogPlus implementation to Stellar

    This PR addresses https://issues.apache.org/jira/browse/METRON-627
    
    Leverages the HLLP implementation from 
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
    
    4 new Stellar functions have been added that allow a user to initialize a
    cardinality estimator, add items, merge estimators, and calculate
    cardinality estimates.
    
    ### `HLLP_CARDINALITY`
      * Description: Returns HyperLogLogPlus-estimated cardinality for this set
      * Input:
        * hyperLogLogPlus - the hllp set
      * Returns: Long value representing the cardinality for this set
    
    ### `HLLP_INIT`
      * Description: Initializes the set
      * Input:
        * p (required) - the precision value for the normal set
        * sp - the precision value for the sparse set. If sp is not specified,
          the sparse set will be disabled.
      * Returns: A new HyperLogLogPlus set
    
    ### `HLLP_MERGE`
      * Description: Merge hllp sets together
      * Input:
        * hllp1 - first hllp set
        * hllp2 - second hllp set
        * hllpn - additional sets to merge
      * Returns: A new merged HyperLogLogPlus estimator set
    
    ### `HLLP_OFFER`
      * Description: Add value to the set
      * Input:
        * hyperLogLogPlus - the hllp set
        * o - Object to add to the set
      * Returns: The HyperLogLogPlus set with a new object added
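    
    The semantics of the four functions can be sketched in plain Java. This is
    only an illustration under stated assumptions: `HllSketch` and `hash64` are
    hypothetical names, the algorithm is a basic HyperLogLog with a
    linear-counting correction for small sets (there is no sparse
    representation, so no `sp`), and it is not the stream-lib `HyperLogLogPlus`
    class that this PR exposes through Stellar.
    
    ```java
    import java.nio.charset.StandardCharsets;

    /** Illustrative plain HyperLogLog; NOT the stream-lib HyperLogLogPlus used by this PR. */
    final class HllSketch {
        private final int p;            // precision: number of index bits
        private final byte[] registers; // 2^p registers, each holding the max rank seen

        /** Analogue of HLLP_INIT(p); this sketch has no sparse mode. */
        HllSketch(int p) {
            if (p < 4 || p > 16) {
                throw new IllegalArgumentException("p must be in [4, 16]");
            }
            this.p = p;
            this.registers = new byte[1 << p];
        }

        /** Hypothetical hash: FNV-1a over the UTF-8 string form, then a MurmurHash3-style finalizer. */
        private static long hash64(Object o) {
            long h = 0xcbf29ce484222325L;
            for (byte b : String.valueOf(o).getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xff);
                h *= 0x100000001b3L;
            }
            h ^= h >>> 33;
            h *= 0xff51afd7ed558ccdL;
            h ^= h >>> 33;
            h *= 0xc4ceb9fe1a85ec53L;
            h ^= h >>> 33;
            return h;
        }

        /** Analogue of HLLP_OFFER: record one value and return this sketch. */
        HllSketch offer(Object o) {
            long h = hash64(o);
            int idx = (int) (h >>> (64 - p)); // top p bits select the register
            // Rank = 1-based position of the first 1-bit in the remaining 64 - p bits.
            int rank = Math.min(Long.numberOfLeadingZeros(h << p) + 1, 64 - p + 1);
            if (rank > registers[idx]) {
                registers[idx] = (byte) rank;
            }
            return this;
        }

        /** Analogue of HLLP_MERGE: returns a new sketch and leaves the inputs unchanged. */
        static HllSketch merge(HllSketch first, HllSketch... rest) {
            HllSketch out = new HllSketch(first.p);
            System.arraycopy(first.registers, 0, out.registers, 0, first.registers.length);
            for (HllSketch s : rest) {
                if (s.p != out.p) {
                    throw new IllegalArgumentException("precision mismatch");
                }
                for (int i = 0; i < out.registers.length; i++) {
                    if (s.registers[i] > out.registers[i]) {
                        out.registers[i] = s.registers[i];
                    }
                }
            }
            return out;
        }

        /** Analogue of HLLP_CARDINALITY: estimated number of distinct values offered. */
        long cardinality() {
            int m = registers.length;
            double sum = 0.0;
            int zeros = 0;
            for (byte r : registers) {
                sum += Math.pow(2.0, -r);
                if (r == 0) {
                    zeros++;
                }
            }
            double alpha = (m == 16) ? 0.673 : (m == 32) ? 0.697 : (m == 64) ? 0.709
                    : 0.7213 / (1.0 + 1.079 / m);
            double estimate = alpha * m * m / sum;
            if (estimate <= 2.5 * m && zeros > 0) {
                // Small-range correction: fall back to linear counting.
                estimate = m * Math.log((double) m / zeros);
            }
            return Math.round(estimate);
        }

        public static void main(String[] args) {
            // Same shape as HLLP_CARDINALITY(HLLP_MERGE(HLLP_OFFER(...), HLLP_OFFER(...))).
            HllSketch left = new HllSketch(14).offer("runnings").offer("cool");
            HllSketch right = new HllSketch(14).offer("bobsled").offer("team");
            System.out.println(HllSketch.merge(left, right).cardinality());
        }
    }
    ```
    
    Merging takes the element-wise maximum of the registers, which is why two
    estimators can only be merged when they share the same precision, and why
    merging never double counts a value that was offered to both inputs.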
    
    **Note:** Added a new library to the metron-common pom and added 3 new
    items to dependencies_with_url.csv.
    
    **Testing**
    
    Spun up the Stellar REPL in quick-dev and verified that function
    composition works as expected and returns correct cardinality estimates
    for simple sparse set cases. For example:
    ```
    [Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE(HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "runnings"), "cool"), HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "bobsled"), "team")))
    4
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/incubator-metron hyperloglog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397
    
----
commit afce30539f6996a607e85d3fd35aac5fcb5c19aa
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2016-12-15T20:55:39Z

    METRON-627: Add HyperLogLogPlus implementation to Stellar

commit 414a3a98976b98a253ab9921720f02c8a7431da2
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-09T17:00:08Z

    work in progress commit

commit c7f57a4acbb0ef357c1af9eaa263afea7bc83d9a
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-11T16:58:58Z

    Merge with master

commit 90d9659f415404c6c4682289c7bde669c352f517
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-12T20:33:10Z

    Refactor, fix statistics output

commit 261e69651d4ae0b99e88e0e4a2c4e7568aa23fcb
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-12T23:17:13Z

    METRON-627: Updated with sensible default precision values

commit 9078094dd720d89f64ecf45506ab0c5077aa58a7
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-13T19:17:37Z

    METRON-627: Add default init for HLLP_ADD(null, 'val')

commit 9e1ff937fe51841ac2fa3235bf87964cba8a1ae8
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-17T20:09:26Z

    Merge branch 'master' into hyperloglog

commit d392f044e330fe273cb0f0b4ff820b4ef1a3595d
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-17T20:33:11Z

    METRON-627: Fix Stellar lexer to handle newline at end

----


