[ https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771189#comment-15771189 ]
ASF GitHub Bot commented on METRON-637:
---------------------------------------
Github user cestella commented on a diff in the pull request:
https://github.com/apache/incubator-metron/pull/401#discussion_r93702246
--- Diff:
metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/MathFunctions.java
---
@@ -59,4 +60,45 @@ public boolean isInitialized() {
return true;
}
}
+
+ /**
+ * Calculates the statistical bin that a value falls in.
+ */
+ @Stellar(name = "BIN"
+ , description = "Computes the bin that the value is in given a set of bounds."
+ , params = {
+ "value - The value to bin"
+ , "bounds - A list of value bounds (excluding min and max) in sorted order."
+ }
+ ,returns = "Which bin N the value falls in such that bound(N-1) < value <= bound(N). " +
+ "No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, " +
+ "and values greater than the last bound go in the M'th bin."
+ )
+ public static class Bin extends BaseStellarFunction {
+
+ public static int getBin(double value, int numBins, Function<Integer, Double> boundFunc) {
+ double lastBound = Long.MIN_VALUE;
+ for(int bin = 0; bin < numBins;++bin) {
+ double bound = boundFunc.apply(bin);
+ if(bound < lastBound) {
+ throw new IllegalStateException("Your bins must be monotonically increasing");
--- End diff --
You're right, strictly increasing is correct.
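For illustration only, here is a minimal sketch of what a strictly increasing check could look like. It is not the code in the pull request: the class name `StrictBinSketch`, the `Double.NEGATIVE_INFINITY` sentinel, and the choice to validate every bound before returning are assumptions made for this example. The bin semantics follow the `returns` text in the diff above (bound(N-1) < value <= bound(N)).

```java
import java.util.function.Function;

// Hypothetical sketch, not the Metron implementation.
class StrictBinSketch {

  // Returns the bin index for value, given numBounds bounds supplied by boundFunc.
  // Bin N satisfies bound(N-1) < value <= bound(N); values above the last
  // bound land in bin numBounds.
  static int getBin(double value, int numBounds, Function<Integer, Double> boundFunc) {
    // Assumption: negative infinity as the sentinel, so any finite first bound is accepted.
    double lastBound = Double.NEGATIVE_INFINITY;
    int result = numBounds; // default: value is greater than every bound
    for (int bin = 0; bin < numBounds; ++bin) {
      double bound = boundFunc.apply(bin);
      // "<=" rejects equal adjacent bounds as well as decreasing ones.
      if (bound <= lastBound) {
        throw new IllegalStateException("Your bounds must be strictly increasing");
      }
      // Record only the first bin whose bound is >= value, but keep
      // scanning so that every bound is validated.
      if (result == numBounds && value <= bound) {
        result = bin;
      }
      lastBound = bound;
    }
    return result;
  }
}
```

With bounds 25, 75, 95 this sketch puts 27 in bin 1, 10 in bin 0, and 99 in bin 3; a bounds list such as 25, 25 is rejected.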
> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
> Key: METRON-637
> URL: https://issues.apache.org/jira/browse/METRON-637
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned
> representation of a variable based on an empirical statistical distribution,
> rather than the actual variable. This function should accept a set of
> percentile bins and a statistical sketch and a value. It should return the
> index where the percentile of the value falls.
> For instance, consider the value 17, which is at percentile 27. If we use 25, 75,
> 95 to define our bins, this function would return 1, because its percentile,
> 27, is between 25 and 75.
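
The behaviour described in the issue can be sketched roughly as follows. This is a hypothetical illustration, not the Metron implementation: a plain array of samples stands in for the statistical sketch, and the names `PercentileBinSketch`, `percentileOf`, `binOf`, and `statsBin` are invented for this example. It assumes a non-empty sample array and percentile bounds in increasing order.

```java
// Hypothetical sketch, not the Metron implementation.
class PercentileBinSketch {

  // Empirical percentile of value: the share of samples <= value, on a 0..100 scale.
  // Assumption: samples is non-empty.
  static double percentileOf(double value, double[] samples) {
    int count = 0;
    for (double s : samples) {
      if (s <= value) {
        count++;
      }
    }
    return 100.0 * count / samples.length;
  }

  // Index of the first bound that is >= percentile, or bounds.length
  // if the percentile is above every bound.
  static int binOf(double percentile, double[] bounds) {
    for (int i = 0; i < bounds.length; i++) {
      if (percentile <= bounds[i]) {
        return i;
      }
    }
    return bounds.length;
  }

  // Combines the two steps: value -> percentile -> bin index.
  static int statsBin(double value, double[] samples, double[] percentileBounds) {
    return binOf(percentileOf(value, samples), percentileBounds);
  }
}
```

For the example in the issue, `binOf(27, new double[] {25, 75, 95})` returns 1, since 27 lies between 25 and 75.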
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)