[
https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768226#comment-15768226
]
ASF GitHub Bot commented on METRON-637:
---------------------------------------
Github user cestella commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/401#discussion_r93523813
--- Diff: metron-analytics/metron-statistics/README.md ---
@@ -112,6 +112,13 @@ functions can be used from everywhere where Stellar is
used.
* Input:
* stats - The Stellar statistics object
* Returns: The variance of the values in the window or NaN if the
statistics object is null.
+* `STATS_BIN`
+ * Description: Computes the bin that the value is in based on the
statistical distribution.
+ * Input:
+ * stats - The Stellar statistics object
+ * value - The value to bin
+ * range? - A list of percentile bin ranges (excluding min and max) or
a string representing a known and common set of bins. For convenience, we have
provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg.
If this argument is omitted, then we assume a Quartile bin split.
--- End diff --
It would, for sure, but probably not called `STATS_BIN`. I was thinking of
adding a proper `BIN` function as a follow-on and refactoring this one to use
it. Or do you think we should just bite the bullet and create it all here?
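
To make the proposed split concrete, here is a rough Python sketch (not Stellar, and not the code in this pull request) of a generic bin function with a stats-specific wrapper layered on top of it. The names `bin_index`, `stats_bin`, `percentile_value`, and `NAMED_SPLITS` are hypothetical, a plain list of samples stands in for the Stellar statistics object, and the nearest-rank percentile rule and the treatment of values that sit exactly on a boundary are assumptions.

```python
import bisect
import math

# Hypothetical named splits; percentiles exclude min (0) and max (100).
NAMED_SPLITS = {
    "QUARTILE": [25, 50, 75],
    "QUINTILE": [20, 40, 60, 80],
    "DECILE": [10, 20, 30, 40, 50, 60, 70, 80, 90],
}


def bin_index(value, split_points):
    """Generic bin: index of the interval of ascending split_points holding value."""
    # Assumption: a value equal to a split point goes into the upper bin.
    return bisect.bisect_right(split_points, value)


def percentile_value(sorted_samples, pct):
    """Nearest-rank percentile of a non-empty sorted sample (an assumed rule)."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100.0))
    return sorted_samples[rank - 1]


def stats_bin(samples, value, ranges="QUARTILE"):
    """Stats-specific wrapper: resolve percentile ranges, then defer to bin_index."""
    pcts = NAMED_SPLITS[ranges] if isinstance(ranges, str) else list(ranges)
    ordered = sorted(samples)
    split_points = [percentile_value(ordered, p) for p in pcts]
    return bin_index(value, split_points)
```

With this layering, the generic function knows nothing about distributions, and the wrapper only translates percentile ranges into split points before delegating.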
> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
> Key: METRON-637
> URL: https://issues.apache.org/jira/browse/METRON-637
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned
> representation of a variable based on an empirical statistical distribution,
> rather than the actual variable. This function should accept a set of
> percentile bins, a statistical sketch, and a value. It should return the
> index of the bin in which the value's percentile falls.
> For instance, consider the value 17, which is at the 27th percentile. If we
> use 25, 75, and 95 to define our bins, this function would return 1, because
> its percentile, 27, is between 25 and 75.
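
The worked example in the issue description can be checked with a few lines of Python. This is only an illustration of the index lookup, assuming zero-based bin indices and that the percentile of the value is already known; `percentile_bin` is a hypothetical name, and placing a percentile that equals a bound into the upper bin is an assumption the issue does not address.

```python
import bisect


def percentile_bin(percentile, bin_bounds):
    """Zero-based index of the bin a percentile falls in, given ascending bounds."""
    # Bounds [25, 75, 95] define four bins: below 25, 25 to 75, 75 to 95, 95 and up.
    return bisect.bisect_right(bin_bounds, percentile)


# The value 17 is at percentile 27 in the issue's example.
print(percentile_bin(27, [25, 75, 95]))  # prints 1
```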