[ https://issues.apache.org/jira/browse/METRON-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754933#comment-15754933 ]
ASF GitHub Bot commented on METRON-627:
---------------------------------------

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92846304

    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }

    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +          )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    --- End diff --

    There are some diagrams. Think it's sufficient to link in the description, or should we consider embedding these images right into the README?

> Add HyperLogLogPlus implementation to Stellar
> ---------------------------------------------
>
>                 Key: METRON-627
>                 URL: https://issues.apache.org/jira/browse/METRON-627
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Michael Miklavcic
>
> Calculating set cardinality can be a useful tool for a security analyst. For
> instance, a large volume of non-unique src ip addresses hitting your network
> may be an indication that you are currently under attack. There have been
> many advancements in distinct value (DV) estimation over the years. We have
> seen implementations evolve from K-Minimum-Values (KMV), to LogLog, to
> HyperLogLog, and now to Google's much-improved HyperLogLogPlus algorithm. The
> key improvements in this latest manifestation of the algorithm are:
> * moves to a 64-bit hash
> * handles sparse sets
> * is more accurate with small cardinality
>
> This Jira tracks the effort to add a HyperLogLogPlus implementation to Metron.
>
> References:
> * https://research.neustar.biz/2013/01/24/hyperloglog-googles-take-on-engineering-hll/
> * http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)