[ https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770923#comment-15770923 ]

ASF GitHub Bot commented on METRON-637:
---------------------------------------

Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r93683645
  
    --- Diff: metron-analytics/metron-statistics/src/test/java/org/apache/metron/statistics/StatisticalBinningPerformanceDriver.java ---
    @@ -0,0 +1,76 @@
    +/*
    + *
    + *  Licensed to the Apache Software Foundation (ASF) under one
    + *  or more contributor license agreements.  See the NOTICE file
    + *  distributed with this work for additional information
    + *  regarding copyright ownership.  The ASF licenses this file
    + *  to you under the Apache License, Version 2.0 (the
    + *  "License"); you may not use this file except in compliance
    + *  with the License.  You may obtain a copy of the License at
    + *
    + *      http://www.apache.org/licenses/LICENSE-2.0
    + *
    + *  Unless required by applicable law or agreed to in writing, software
    + *  distributed under the License is distributed on an "AS IS" BASIS,
    + *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + *  See the License for the specific language governing permissions and
    + *  limitations under the License.
    + *
    + */
    +package org.apache.metron.statistics;
    +
    +import com.google.common.collect.ImmutableList;
    +import org.apache.commons.math3.random.GaussianRandomGenerator;
    +import org.apache.commons.math3.random.MersenneTwister;
    +import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Random;
    +
    +/**
    + * This is a driver to drive evaluation of the performance characteristics of the STATS_BIN stellar function.
    + * It gets the distribution of the time it takes to calculate the bin of a million random numbers against the quintile bins
    + * of a statistical distribution of 10000 normally distributed reals between [-1000, 1000].
    + *
    + * On my 4 year old macbook pro, the values came out to be
    + *
    + * Min/25th/50th/75th/Max Milliseconds: 2687.0 / 2700.5 / 2716.0 / 2733.5 / 3730.0
    + */
    --- End diff --
    
    Great to have this.


> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
>                 Key: METRON-637
>                 URL: https://issues.apache.org/jira/browse/METRON-637
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned 
> representation of a variable based on an empirical statistical distribution, 
> rather than the actual variable.  This function should accept a set of 
> percentile bins, a statistical sketch, and a value.  It should return the 
> index of the bin in which the value's percentile falls.
> For instance, consider the value 17, whose percentile is 27.  If we use 25, 
> 75, 95 to define our bins, this function would return 1, because its 
> percentile, 27, is between 25 and 75.
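
The binning rule in the issue description can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the class name StatsBinSketch and the methods binIndex and percentileOf are hypothetical, a plain array stands in for the statistical sketch, and treating each boundary as the inclusive lower edge of the next bin is an assumption the issue text does not settle.

```java
public class StatsBinSketch {

    /**
     * Naive stand-in for a statistical sketch (assumption): the percentile
     * rank of value is the share of sample points less than or equal to it.
     */
    static double percentileOf(double value, double[] sample) {
        int count = 0;
        for (double s : sample) {
            if (s <= value) {
                count++;
            }
        }
        return 100.0 * count / sample.length;
    }

    /**
     * Returns the index of the bin that contains the given percentile.
     * bounds must be ascending, e.g. {25, 75, 95}. A percentile below the
     * first bound maps to 0; one at or above the last bound maps to
     * bounds.length.
     */
    static int binIndex(double percentile, double[] bounds) {
        int bin = 0;
        for (double bound : bounds) {
            if (percentile < bound) {
                break;
            }
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        double[] bounds = {25, 75, 95};
        // The example from the issue: percentile 27 lies between 25 and 75.
        System.out.println(binIndex(27, bounds)); // prints 1
    }
}
```

With bounds 25, 75, 95 this yields four bins, indexed 0 through 3. A real implementation would query the percentile from the sketch rather than scanning a sample.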



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
