[
https://issues.apache.org/jira/browse/METRON-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754946#comment-15754946
]
ASF GitHub Bot commented on METRON-627:
---------------------------------------
Github user mmiklavc commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/397#discussion_r92847280
--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
@@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
}
}
+ @Stellar( namespace="HLLP"
+ , name="CARDINALITY"
+ , description="Returns HyperLogLogPlus-estimated cardinality for this set"
+ , params = { "hyperLogLogPlus - the hllp set" }
+ , returns = "Long value representing the cardinality for this set"
+ )
+ public static class HLLPCardinality extends BaseStellarFunction {
+
+ @Override
+ public Object apply(List<Object> args) {
+ if (args.size() < 1) {
+ throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
+ }
+ return ((HyperLogLogPlus) args.get(0)).cardinality();
+ }
+ }
+
+ @Stellar( namespace="HLLP"
+ , name="INIT"
+ , description="Initializes the set"
+ , params = {
+ "p (required) - the precision value for the normal set"
+ ,"sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled."
+ }
+ , returns = "A new HyperLogLogPlus set"
+ )
+ public static class HLLPInit extends BaseStellarFunction {
+
+ @Override
+ public Object apply(List<Object> args) {
+ if (args.size() == 0) {
+ throw new IllegalArgumentException("Normal set precision is required");
+ } else if (args.size() == 1) {
+ int p = ConversionUtils.convert(args.get(0), Integer.class);
+ return new HyperLogLogPlus(p);
+ } else {
+ int p = ConversionUtils.convert(args.get(0), Integer.class);
+ int sp = ConversionUtils.convert(args.get(1), Integer.class);
+ return new HyperLogLogPlus(p, sp);
+ }
+ }
+ }
+
+ @Stellar( namespace="HLLP"
+ , name="MERGE"
+ , description="Merge hllp sets together"
+ , params = {
+ "hllp1 - first hllp set"
--- End diff --
Agreed.
> Add HyperLogLogPlus implementation to Stellar
> ---------------------------------------------
>
> Key: METRON-627
> URL: https://issues.apache.org/jira/browse/METRON-627
> Project: Metron
> Issue Type: Improvement
> Reporter: Michael Miklavcic
>
> Calculating set cardinality can be a useful tool for a security analyst. For
> instance, a large volume of non-unique source IP addresses hitting your network
> may be an indication that you are currently under attack. There have been
> many advancements in distinct value (DV) estimation over the years. We have
> seen implementations evolve from K-Minimum-Values (KMV), to LogLog, to
> HyperLogLog, and now to Google's much-improved HyperLogLogPlus algorithm. The
> key improvements in this latest manifestation of the algorithm are:
> * moves to a 64-bit hash
> * handles sparse sets
> * is more accurate with small cardinality
> This Jira tracks the effort to add a HyperLogLogPlus implementation to Metron.
> References:
> https://research.neustar.biz/2013/01/24/hyperloglog-googles-take-on-engineering-hll/
> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf
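
For readers unfamiliar with the estimator family the issue describes, the following is a minimal sketch of a plain HyperLogLog counter with a 64-bit hash and a linear-counting correction for small cardinalities. It is not the HyperLogLogPlus class used in the pull request and it has no sparse representation. The class name HllSketch, the method names, and the SplitMix64-style mixer standing in for a real 64-bit hash are all assumptions made for illustration.

```java
/**
 * Illustrative HyperLogLog counter. Hypothetical class, NOT the
 * HyperLogLogPlus implementation referenced in the pull request.
 */
class HllSketch {
    private final int p;            // precision: number of index bits
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // max observed rank per register

    HllSketch(int p) {
        // Restricted to p >= 7 so the single alpha formula below applies.
        if (p < 7 || p > 18) {
            throw new IllegalArgumentException("p must be in [7, 18]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // SplitMix64-style finalizer; an assumed stand-in for a real 64-bit
    // hash (such as MurmurHash) applied to the full value.
    static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void offer(Object value) {
        long h = mix64(value.hashCode());
        int idx = (int) (h >>> (64 - p));   // top p bits select a register
        long rest = h << p;                 // remaining bits, left-aligned
        // Rank = position of the first 1-bit in the remaining 64 - p bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    long cardinality() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / m); // bias constant for m >= 128
        double estimate = alpha * m * (double) m / sum;
        // Small-range correction: fall back to linear counting.
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);
        }
        return Math.round(estimate);
    }

    // Merging keeps the per-register maximum, so the result estimates
    // the cardinality of the union of both input sets.
    void merge(HllSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }
}
```

With p = 14 the counter holds 16384 one-byte registers regardless of how many values are offered, and duplicates never change the estimate. The merge step is the property that makes it practical to combine counters built over separate time windows or data partitions.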