[ 
https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768214#comment-15768214
 ] 

ASF GitHub Bot commented on METRON-637:
---------------------------------------

Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r93513473
  
    --- Diff: metron-analytics/metron-statistics/README.md ---
    @@ -112,6 +112,13 @@ functions can be used from everywhere where Stellar is used.
       * Input:
         * stats - The Stellar statistics object
       * Returns: The variance of the values in the window or NaN if the statistics object is null.
    +* `STATS_BIN`
    +  * Description: Computes the bin that the value is in based on the statistical distribution.
    +  * Input:
    +    * stats - The Stellar statistics object
    +    * value - The value to bin
    +    * range? - A list of percentile bin ranges (excluding min and max) or a string representing a known and common set of bins.  For convenience, we have provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg. If this argument is omitted, then we assume a Quartile bin split.
    --- End diff --
    
    Would it make sense to also allow, as an option, binning by raw value
instead of percentile?  The argument would use the same format: a list of bin
boundaries, again excluding min and max.  Binning could then apply to any
Comparable field.
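
A minimal sketch of what raw-value binning could look like, written in Python
rather than Stellar purely for illustration. The function name `bin_by_value`
and the rule that a value equal to a boundary falls into the upper bin are
assumptions, not part of the pull request.

```python
from bisect import bisect_right


def bin_by_value(value, boundaries):
    """Return the index of the bin that `value` falls into.

    `boundaries` is a sorted list of interior bin edges (min and max are
    excluded), so N boundaries define N + 1 bins numbered 0..N.
    Assumption: a value equal to a boundary lands in the upper bin.
    """
    # bisect_right counts how many boundaries are <= value,
    # which is exactly the bin index under the assumption above.
    return bisect_right(boundaries, value)


if __name__ == "__main__":
    print(bin_by_value(17, [10, 50, 100]))   # numeric boundaries
    print(bin_by_value("m", ["g", "p"]))     # any mutually comparable values
```

Because the lookup relies only on ordering, the same function works for
numbers, strings, or any other values that can be compared with each other.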


> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
>                 Key: METRON-637
>                 URL: https://issues.apache.org/jira/browse/METRON-637
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned 
> representation of a variable, based on an empirical statistical distribution, 
> rather than the actual variable.  This function should accept a set of 
> percentile bins, a statistical sketch, and a value.  It should return the 
> index of the bin in which the percentile of the value falls.
> For instance, consider the value 17, which is at the 27th percentile.  If we 
> use 25, 75, 95 to define our bins, this function would return 1, because its 
> percentile, 27, is between 25 and 75.
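
The percentile-based behaviour described in the issue can be sketched as
follows. This is an illustrative Python sketch, not Metron's implementation:
the names `stats_bin`, `empirical_percentile`, and `NAMED_SPLITS` are
hypothetical, a plain list of samples stands in for the statistical sketch,
the boundary values assigned to QUARTILE, QUINTILE, and DECILE are assumed,
and so is the choice to place a percentile equal to a boundary in the upper
bin.

```python
from bisect import bisect_right

# Assumed interior boundaries for the named splits (min and max excluded).
NAMED_SPLITS = {
    "QUARTILE": [25, 50, 75],
    "QUINTILE": [20, 40, 60, 80],
    "DECILE": [10, 20, 30, 40, 50, 60, 70, 80, 90],
}


def bin_index(percentile, boundaries):
    """Index of the bin containing `percentile`, given sorted interior edges."""
    return bisect_right(boundaries, percentile)


def empirical_percentile(samples, value):
    """Percentage of samples <= value; a stand-in for a real sketch."""
    if not samples:
        raise ValueError("no samples to estimate a percentile from")
    ordered = sorted(samples)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def stats_bin(samples, value, bins="QUARTILE"):
    """Bin `value` by its percentile; `bins` is a split name or a list."""
    boundaries = NAMED_SPLITS[bins] if isinstance(bins, str) else list(bins)
    return bin_index(empirical_percentile(samples, value), boundaries)


if __name__ == "__main__":
    # The issue's example: percentile 27 with bins 25, 75, 95.
    print(bin_index(27, [25, 75, 95]))
    # Samples 1..100, so the value 27 sits at the 27th percentile.
    print(stats_bin(list(range(1, 101)), 27, [25, 75, 95]))
```

With N boundaries there are N + 1 bins, so the returned index ranges from 0
(below the first boundary) to N (at or above the last one).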



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
