[jira] [Commented] (METRON-627) Add HyperLogLogPlus implementation to Stellar

ASF GitHub Bot (JIRA) Fri, 16 Dec 2016 08:04:06 -0800

    [ https://issues.apache.org/jira/browse/METRON-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754809#comment-15754809 ]


ASF GitHub Bot commented on METRON-627:
---------------------------------------

Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92835008
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }
     
    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +  )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    --- End diff --
    
    Are there any documents we can link to that describe the tradeoffs for
    these values?  I'm thinking of something like
    [this](https://en.wikipedia.org/wiki/File:Bloom_filter_fp_probability.svg),
    discussing the tradeoff between accuracy and size.
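
    To make the question concrete, here is a rough sketch of how the tradeoff
    could be tabulated. It is not taken from the pull request. It assumes the
    commonly cited HyperLogLog error approximation of 1.04 / sqrt(m) with
    m = 2^p registers, and it assumes a dense representation with 6 bits per
    register; the actual footprint of the library used in the PR may differ.
    The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not part of Metron or of the pull request under review.
public class HllPrecisionTradeoff {

    // Approximate relative standard error for precision p,
    // ASSUMING the usual HyperLogLog estimate 1.04 / sqrt(m), m = 2^p.
    static double relativeStandardError(int p) {
        long registers = 1L << p;
        return 1.04 / Math.sqrt((double) registers);
    }

    // Approximate dense-mode size in bytes,
    // ASSUMING 6 bits per register (rounded up to whole bytes).
    static long denseBytes(int p) {
        long registers = 1L << p;
        return (registers * 6 + 7) / 8;
    }

    public static void main(String[] args) {
        // Print one row per precision value to show error shrinking as size grows.
        for (int p = 4; p <= 18; p += 2) {
            System.out.printf(Locale.ROOT,
                "p=%2d  registers=%7d  stderr~%.4f  dense~%d bytes%n",
                p, 1L << p, relativeStandardError(p), denseBytes(p));
        }
    }
}
```

    Under these assumptions each increment of p doubles the register count
    and shrinks the expected error by a factor of about sqrt(2), which is the
    kind of curve a linked document would ideally show.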


> Add HyperLogLogPlus implementation to Stellar
> ---------------------------------------------
>
>                 Key: METRON-627
>                 URL: https://issues.apache.org/jira/browse/METRON-627
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Michael Miklavcic
>
> Calculating set cardinality can be a useful tool for a security analyst. For 
> instance, a large volume of non-unique source IP addresses hitting your 
> network may be an indication that you are currently under attack. There have 
> been many advancements in distinct value (DV) estimation over the years. We 
> have seen implementations evolve from K-Minimum-Values (KMV), to LogLog, to 
> HyperLogLog, and now to Google's much-improved HyperLogLogPlus algorithm. The 
> key improvements in this latest version of the algorithm are:
> * moves to a 64-bit hash
> * handles sparse sets
> * is more accurate with small cardinality
> This Jira tracks the effort to add a HyperLogLogPlus implementation to Metron.
> References:
> https://research.neustar.biz/2013/01/24/hyperloglog-googles-take-on-engineering-hll/
> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
