GitHub user mmiklavc reopened a pull request: https://github.com/apache/incubator-metron/pull/397
    METRON-627: Add HyperLogLogPlus implementation to Stellar

    This PR addresses https://issues.apache.org/jira/browse/METRON-627

    Leverages the HLLP implementation from https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java

    Four new Stellar functions have been added that allow a user to initialize a cardinality estimator, add items, merge estimators, and calculate cardinality estimates.

    ### `HLLP_CARDINALITY`
    * Description: Returns HyperLogLogPlus-estimated cardinality for this set
    * Input:
      * hyperLogLogPlus - the hllp set
    * Returns: Long value representing the cardinality for this set

    ### `HLLP_INIT`
    * Description: Initializes the set
    * Input:
      * p (required) - the precision value for the normal set
      * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    * Returns: A new HyperLogLogPlus set

    ### `HLLP_MERGE`
    * Description: Merge hllp sets together
    * Input:
      * hllp1 - first hllp set
      * hllp2 - second hllp set
      * hllpn - additional sets to merge
    * Returns: A new merged HyperLogLogPlus estimator set

    ### `HLLP_OFFER`
    * Description: Add value to the set
    * Input:
      * hyperLogLogPlus - the hllp set
      * o - Object to add to the set
    * Returns: The HyperLogLogPlus set with a new object added

    **Note:** Added a new library to the metron-common pom and added 3 new items to dependencies_with_url.csv.

    **Testing**

    Spun up the Stellar REPL in quick-dev and verified that function composition works as expected and returns correct cardinality estimates for simple sparse set cases.
    For example:
    ```
    [Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE( HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "runnings"), "cool"), HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "bobsled"), "team")))
    4
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/incubator-metron hyperloglog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397

----
commit afce30539f6996a607e85d3fd35aac5fcb5c19aa
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2016-12-15T20:55:39Z

    METRON-627: Add HyperLogLogPlus implementation to Stellar

commit 414a3a98976b98a253ab9921720f02c8a7431da2
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-09T17:00:08Z

    work in progress commit

commit c7f57a4acbb0ef357c1af9eaa263afea7bc83d9a
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-11T16:58:58Z

    Merge with master

commit 90d9659f415404c6c4682289c7bde669c352f517
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-12T20:33:10Z

    Refactor, fix statistics output

commit 261e69651d4ae0b99e88e0e4a2c4e7568aa23fcb
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-12T23:17:13Z

    METRON-627: Updated with sensible default precision values

commit 9078094dd720d89f64ecf45506ab0c5077aa58a7
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-13T19:17:37Z

    METRON-627: Add default init for HLLP_ADD(null, 'val')

commit 9e1ff937fe51841ac2fa3235bf87964cba8a1ae8
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-17T20:09:26Z

    Merge branch 'master' into hyperloglog

commit d392f044e330fe273cb0f0b4ff820b4ef1a3595d
Author: Michael Miklavcic <michael.miklav...@gmail.com>
Date:   2017-01-17T20:33:11Z

    METRON-627: Fix Stellar lexer to handle newline at end

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
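
For readers unfamiliar with the estimator behind the four functions described in the pull request (init, offer, merge, cardinality), the following is a minimal Python sketch of a classic HyperLogLog. It is an illustration only, not the stream-lib HyperLogLogPlus code and not the Metron implementation: it has no sparse representation and no bias correction, so there is no equivalent of the `sp` argument. The function names (`hll_init`, `hll_offer`, `hll_merge`, `hll_cardinality`), the default precision of 14, and the use of SHA-1 as the hash are all assumptions made for this example.

```python
import hashlib
import math


def hll_init(p=14):
    """Create an empty sketch with 2**p registers (p is the precision)."""
    if not 4 <= p <= 16:
        raise ValueError("p must be between 4 and 16 in this sketch")
    return {"p": p, "registers": [0] * (1 << p)}


def _hash64(value):
    """Map a value to a 64-bit integer. SHA-1 is an arbitrary choice here."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_offer(sketch, value):
    """Add a value to the sketch and return the same sketch."""
    p = sketch["p"]
    h = _hash64(value)
    index = h >> (64 - p)                 # top p bits select a register
    rest = h & ((1 << (64 - p)) - 1)      # remaining 64 - p bits
    # 1-based position of the leftmost 1-bit in the remaining bits;
    # equals 64 - p + 1 when the remaining bits are all zero.
    rank = (64 - p) - rest.bit_length() + 1
    if rank > sketch["registers"][index]:
        sketch["registers"][index] = rank
    return sketch


def hll_merge(*sketches):
    """Return a new sketch whose registers are the element-wise maximum."""
    if not sketches:
        raise ValueError("at least one sketch is required")
    p = sketches[0]["p"]
    if any(s["p"] != p for s in sketches):
        raise ValueError("all sketches must use the same precision")
    merged = hll_init(p)
    merged["registers"] = [
        max(column) for column in zip(*(s["registers"] for s in sketches))
    ]
    return merged


def hll_cardinality(sketch):
    """Estimate the number of distinct values offered to the sketch."""
    registers = sketch["registers"]
    m = len(registers)
    if m == 16:
        alpha = 0.673
    elif m == 32:
        alpha = 0.697
    elif m == 64:
        alpha = 0.709
    else:
        alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    # Small-range correction: fall back to linear counting.
    if raw <= 2.5 * m and zeros > 0:
        return round(m * math.log(m / zeros))
    return round(raw)


# Same composition as the Stellar REPL example in the PR description.
left = hll_offer(hll_offer(hll_init(), "runnings"), "cool")
right = hll_offer(hll_offer(hll_init(), "bobsled"), "team")
print(hll_cardinality(hll_merge(left, right)))
```

Merging as an element-wise maximum over registers is what makes it possible to build estimators independently and combine them later, which is the property `HLLP_MERGE` exposes. Both inputs must share the same precision for that to be meaningful, so the sketch rejects mismatched values of `p`.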