[ https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768242#comment-15768242 ]

ASF GitHub Bot commented on METRON-637:
---------------------------------------

Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r93524927
  
    --- Diff: metron-analytics/metron-statistics/README.md ---
    @@ -112,6 +112,13 @@ functions can be used from everywhere where Stellar is used.
       * Input:
         * stats - The Stellar statistics object
      * Returns: The variance of the values in the window or NaN if the statistics object is null.
    +* `STATS_BIN`
    +  * Description: Computes the bin that the value is in based on the statistical distribution.
    +  * Input:
    +    * stats - The Stellar statistics object
    +    * value - The value to bin
    +    * range? - A list of percentile bin ranges (excluding min and max) or a string representing a known and common set of bins.  For convenience, we have provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg.  If this argument is omitted, then we assume a Quartile bin split.
    --- End diff --
    
    It seems like it would be barely more effort than writing another StellarFunction wrapper, but maybe I'm being optimistic again.  I would support including both here, but if you prefer to separate them I'm fine with that.
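
The optional `range?` argument in the diff accepts either an explicit list of interior percentile boundaries or one of the names QUARTILE, QUINTILE, or DECILE, defaulting to a quartile split. A minimal Java sketch of how that argument might be resolved follows. `RangeArgumentSketch`, `BinSplit`, and `resolve` are hypothetical names, not Metron's API; the cut points (25/50/75, 20/40/60/80, and 10 through 90) are the conventional interior boundaries and are assumed here, as is the exact upper-case name match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RangeArgumentSketch {

  // Hypothetical named splits. Boundaries are interior percentiles only
  // (min and max excluded, as the README text says); the exact cut points
  // are the conventional ones and are an assumption, not Metron's code.
  enum BinSplit {
    QUARTILE(25, 50, 75),
    QUINTILE(20, 40, 60, 80),
    DECILE(10, 20, 30, 40, 50, 60, 70, 80, 90);

    private final List<Double> boundaries;

    BinSplit(double... cutPoints) {
      List<Double> boxed = new ArrayList<>();
      for (double p : cutPoints) {
        boxed.add(p);
      }
      this.boundaries = Collections.unmodifiableList(boxed);
    }

    List<Double> boundaries() {
      return boundaries;
    }
  }

  /**
   * Resolves the optional range argument: null selects the default quartile
   * split, a String selects a named split (exact, upper-case match assumed),
   * and a List of numbers is used as explicit percentile boundaries.
   */
  static List<Double> resolve(Object range) {
    if (range == null) {
      return BinSplit.QUARTILE.boundaries();
    }
    if (range instanceof String) {
      // Enum.valueOf throws IllegalArgumentException for an unknown name.
      return BinSplit.valueOf((String) range).boundaries();
    }
    if (range instanceof List) {
      List<Double> boundaries = new ArrayList<>();
      for (Object element : (List<?>) range) {
        // Accept any numeric type (e.g. Integer) and widen it to double.
        boundaries.add(((Number) element).doubleValue());
      }
      return boundaries;
    }
    throw new IllegalArgumentException("Unsupported range argument: " + range);
  }

  public static void main(String[] args) {
    System.out.println(resolve(null));                      // [25.0, 50.0, 75.0]
    System.out.println(resolve("QUINTILE"));                // [20.0, 40.0, 60.0, 80.0]
    System.out.println(resolve(Arrays.asList(25, 75, 95))); // [25.0, 75.0, 95.0]
  }
}
```

Normalizing every form of the argument to one `List<Double>` keeps the later binning step independent of how the caller spelled the range.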


> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
>                 Key: METRON-637
>                 URL: https://issues.apache.org/jira/browse/METRON-637
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned
> representation of a variable based on an empirical statistical distribution,
> rather than the actual variable.  This function should accept a statistical
> sketch, a value, and a set of percentile bins.  It should return the index of
> the bin into which the value's percentile falls.
> For instance, consider the value 17, which is at percentile 27.  If we use 25,
> 75, and 95 to define our bins, this function would return 1, because its
> percentile, 27, is between 25 and 75.
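
The bin-index rule in the issue description can be sketched in Java as below. It assumes the value's percentile (27 in the example) has already been read from the statistical sketch, which is not shown, and it places a percentile exactly equal to a boundary in the higher bin, a choice the issue leaves open. `PercentileBinSketch` and `binIndex` are hypothetical names, not Metron's implementation.

```java
import java.util.Arrays;
import java.util.List;

class PercentileBinSketch {

  /**
   * Returns the index of the bin that a percentile falls into, given
   * ascending interior bin boundaries (min and max excluded). The result is
   * the number of boundaries at or below the percentile, so with k boundaries
   * the index ranges from 0 to k. Assumption: a percentile exactly equal to a
   * boundary is placed in the higher bin.
   */
  static int binIndex(double percentile, List<Double> boundaries) {
    int index = 0;
    for (double boundary : boundaries) {
      if (percentile < boundary) {
        break; // boundaries are ascending, so no later boundary can be passed
      }
      index++;
    }
    return index;
  }

  public static void main(String[] args) {
    List<Double> bins = Arrays.asList(25.0, 75.0, 95.0);
    System.out.println(binIndex(27.0, bins)); // 1: the issue's example, 25 <= 27 < 75
    System.out.println(binIndex(10.0, bins)); // 0: below the first boundary
    System.out.println(binIndex(99.0, bins)); // 3: above the last boundary
  }
}
```

With three boundaries there are four bins, indexed 0 through 3, which matches the issue's example returning 1 for a percentile between 25 and 75.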



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
