[ https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769391#comment-15769391 ]

ASF GitHub Bot commented on METRON-637:
---------------------------------------

Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r93574334
  
    --- Diff: metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/StellarStatisticsFunctions.java ---
    @@ -425,4 +428,61 @@ public Object apply(List<Object> args) {
           return result;
         }
       }
    +
    +  /**
    +   * Calculates the statistical bin that a value falls in.
    +   */
    +  @Stellar(namespace = "STATS", name = "BIN"
    +          , description = "Computes the bin that the value is in based on the statistical distribution."
    +          , params = {
    +          "stats - The Stellar statistics object"
    +          , "value - The value to bin"
    +          , "bounds? - A list of percentile bin bounds (excluding min and max) or a string representing a known and common set of bins.  " +
    +          "For convenience, we have provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg." +
    +          " If this argument is omitted, then we assume a Quartile bin split."
    +                    }
    +          ,returns = "Which bin N the value falls in such that bound(N-1) < value <= bound(N). " +
    +          "No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, " +
    +          "and values greater than the last bound go in the M'th bin."
    +  )
    +  public static class StatsBin extends BaseStellarFunction {
    +    public enum BinSplits {
    +      QUARTILE(ImmutableList.of(25.0, 50.0, 75.0)),
    +      QUINTILE(ImmutableList.of(20.0, 40.0, 60.0, 80.0)),
    +      DECILE(ImmutableList.of(10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0))
    +      ;
    +      public final List<Double> split;
    +      BinSplits(List<Double> split) {
    +        this.split = split;
    +      }
    +
    +      public static List<Double> getSplit(Object o) {
    +        if(o instanceof String) {
    +          return BinSplits.valueOf((String)o).split;
    +        }
    +        else if(o instanceof List) {
    +          List<Double> ret = new ArrayList<>();
    +          for(Object valO : (List<Object>)o) {
    +            ret.add(ConversionUtils.convert(valO, Double.class));
    --- End diff --
    
    During this step we need to validate that the bounds list is strictly increasing.
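A minimal sketch of that check, assuming the bounds arrive as a list of numbers. `BinBounds` and `toStrictlyIncreasing` are hypothetical names, and `Number.doubleValue()` stands in for Metron's `ConversionUtils.convert(valO, Double.class)`; this is an illustration of the suggestion, not the code in the pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the review suggestion; not taken from PR #401.
final class BinBounds {
  private BinBounds() {}

  /**
   * Converts each element to a double and rejects the list unless it is
   * strictly increasing. Number.doubleValue() is used here as a stand-in
   * for Metron's ConversionUtils.convert(...).
   */
  static List<Double> toStrictlyIncreasing(List<?> raw) {
    List<Double> ret = new ArrayList<>(raw.size());
    for (Object valO : raw) {
      if (!(valO instanceof Number)) {
        throw new IllegalArgumentException("Bin bound is not numeric: " + valO);
      }
      double val = ((Number) valO).doubleValue();
      // NaN must be rejected explicitly: every comparison with NaN is false,
      // so it would otherwise slip past the ordering check below.
      if (Double.isNaN(val)) {
        throw new IllegalArgumentException("Bin bound is NaN");
      }
      if (!ret.isEmpty() && val <= ret.get(ret.size() - 1)) {
        throw new IllegalArgumentException(
            "Bin bounds must be strictly increasing, but " + val
                + " follows " + ret.get(ret.size() - 1));
      }
      ret.add(val);
    }
    return ret;
  }
}
```

Validating once, at the point where the list is converted, means the binning code downstream can assume sorted, duplicate-free bounds.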


> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
>                 Key: METRON-637
>                 URL: https://issues.apache.org/jira/browse/METRON-637
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned
> representation of a variable, based on an empirical statistical distribution,
> rather than the variable itself. This function should accept a statistical
> sketch, a value, and a set of percentile bin bounds, and it should return the
> index of the bin that the value's percentile falls in.
> For instance, consider the value 17, which is at the 27th percentile. If we use
> 25, 75, and 95 as the bin bounds, this function would return 1, because 27 lies
> between 25 and 75.
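The rule from the ticket and from the `returns` text in the diff (bin N satisfies bound(N-1) < percentile <= bound(N)) can be sketched as below. This assumes the value's percentile has already been computed; the real function would obtain it from the statistics sketch, which is omitted here. `PercentileBin` and `bin` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of the binning rule; the percentile is passed in directly
// instead of being looked up in a statistics sketch.
final class PercentileBin {
  private PercentileBin() {}

  /**
   * Returns the bin N such that bounds[N-1] < percentile <= bounds[N], treating
   * the bound below index 0 as -infinity and the bound at index M = bounds.size()
   * as +infinity. Assumes bounds is strictly increasing.
   */
  static int bin(double percentile, List<Double> bounds) {
    for (int i = 0; i < bounds.size(); i++) {
      // The first bound that is >= the percentile closes the value's bin.
      if (percentile <= bounds.get(i)) {
        return i;
      }
    }
    // Larger than every bound: the value goes in the M'th (last) bin.
    return bounds.size();
  }
}
```

With the ticket's bounds 25, 75, 95, a 27th-percentile value lands in bin 1, a value at exactly the 25th percentile lands in bin 0 (the upper bound is inclusive), and anything above the 95th percentile lands in bin 3. A linear scan is enough for the handful of bounds in QUARTILE, QUINTILE, or DECILE; a binary search would only matter for very long bound lists.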



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)