[ 
https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772553#comment-17772553
 ] 

Alex Herbert commented on STATISTICS-71:
----------------------------------------

Remaining statistics from Commons Math:
 * Geometric Mean
 * Sum of Logs
 * Sum of squares
 * Product

These are all trivial to implement.
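
As a rough illustration of why these are simple, the sketch below keeps a geometric mean as a running sum of logs plus a count. The class name GeometricMeanSketch, the NaN result for an empty statistic, and the combine method returning this are assumptions made for the example, not the Commons Statistics API.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch only; not the Commons Statistics implementation. */
final class GeometricMeanSketch implements DoubleSupplier {
    /** Running sum of the natural logs of the added values. */
    private double sumOfLogs;
    /** Number of values added. */
    private long n;

    /** Adds a value and returns this instance to allow chaining. */
    GeometricMeanSketch add(double v) {
        sumOfLogs += Math.log(v);
        n++;
        return this;
    }

    long getCount() {
        return n;
    }

    /** Merges another instance: both the sum of logs and the count are additive. */
    GeometricMeanSketch combine(GeometricMeanSketch other) {
        sumOfLogs += other.sumOfLogs;
        n += other.n;
        return this;
    }

    /** Returns exp(mean of logs); NaN when empty (an assumption of this sketch). */
    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : Math.exp(sumOfLogs / n);
    }
}
```

Sum of logs, sum of squares and product follow the same pattern with a single accumulator, which is why combine reduces to one addition or multiplication.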

There is also PSquarePercentile. It may not be possible to implement a combine 
algorithm for this statistic.

Before adding further statistics I intend to redesign the test classes. All of 
the current tests follow the same general boilerplate:
 * Test an empty statistic
 * Test adding values to a statistic
 * Test creating a statistic from an array of values
 * Test merging two statistics created from two sets of values
 * Test merging two statistics created from two arrays
 * Test a statistic with non-finite values
 * Test the same values in a randomized order

Each test class differs only in the method used to compute the expected 
result and in the test tolerance.
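
One possible shape for such a redesign is sketched below: the shared checks live in one helper, and each statistic supplies only a factory, an expected-value function and a tolerance. All names here (SketchStatistic, SumSketch, StatisticChecks) are hypothetical, the contract is simplified, and plain exceptions are used instead of a test framework to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

/** Hypothetical minimal contract used only by this sketch. */
interface SketchStatistic<T> extends DoubleSupplier {
    void add(double v);
    void combine(T other);
}

/** Example statistic under test: a plain sum. */
final class SumSketch implements SketchStatistic<SumSketch> {
    private double sum;
    @Override public void add(double v) { sum += v; }
    @Override public void combine(SumSketch other) { sum += other.sum; }
    @Override public double getAsDouble() { return sum; }
}

/** Shared checks, written once and reused for every statistic. */
final class StatisticChecks {
    private StatisticChecks() {}

    static <T extends SketchStatistic<T>> void checkAddAndCombine(
            Supplier<T> factory, ToDoubleFunction<double[]> expected,
            double relTol, double[] a, double[] b) {
        // Adding values one at a time must match the expected result.
        T single = fill(factory.get(), a);
        assertClose(expected.applyAsDouble(a), single.getAsDouble(), relTol);

        // Merging two statistics must match the result for the joined data.
        T merged = fill(factory.get(), a);
        merged.combine(fill(factory.get(), b));
        double[] all = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        assertClose(expected.applyAsDouble(all), merged.getAsDouble(), relTol);
    }

    private static <T extends SketchStatistic<T>> T fill(T s, double[] values) {
        for (double v : values) {
            s.add(v);
        }
        return s;
    }

    /** Relative comparison; NaN on either side fails the check. */
    private static void assertClose(double expected, double actual, double relTol) {
        double scale = Math.max(Math.abs(expected), Math.abs(actual));
        if (!(Math.abs(expected - actual) <= relTol * scale)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```

The empty, non-finite and randomized-order cases from the list above would be further methods on the same helper, taking the same three arguments.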

The tests rely on being able to compute the expected value. For simple 
statistics this is trivial. For the kurtosis and skewness it is more 
involved, so the expected value is effectively computed by a second 
implementation in the same codebase, which risks errors going undetected. 
Currently there is no support for supplying example data together with an 
expected result generated by an external reference implementation. This is 
critical to add before the code is released.
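
A minimal sketch of what such support could look like is a test case that carries its own data, expected result and tolerance, so the expected value never comes from code in the same codebase. The class name ReferenceCase and its fields are hypothetical. The usage example uses a hand-computable value as a stand-in; a real case would hold a value produced by an external reference implementation.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical holder for one externally generated reference case. */
final class ReferenceCase {
    final String name;
    final double[] data;
    final double expected;
    final double relTol;

    ReferenceCase(String name, double[] data, double expected, double relTol) {
        this.name = name;
        this.data = data.clone();
        this.expected = expected;
        this.relTol = relTol;
    }

    /** Runs the implementation on a copy of the data and compares to the stored result. */
    void verify(ToDoubleFunction<double[]> implementation) {
        double actual = implementation.applyAsDouble(data.clone());
        double error = Math.abs(actual - expected);
        // Negated comparison so that a NaN result also fails.
        if (!(error <= relTol * Math.abs(expected))) {
            throw new AssertionError(name + ": expected " + expected + " but was " + actual);
        }
    }
}
```

For example, the sum of squares of {1, 2, 3} is 14 and can be checked exactly:

```java
new ReferenceCase("sum of squares", new double[] {1, 2, 3}, 14.0, 0.0)
    .verify(x -> { double s = 0; for (double d : x) { s += d * d; } return s; });
```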

 

> Implementation of Univariate Statistics
> ---------------------------------------
>
>                 Key: STATISTICS-71
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-71
>             Project: Commons Statistics
>          Issue Type: Task
>          Components: descriptive
>            Reporter: Anirudh Joshi
>            Assignee: Anirudh Joshi
>            Priority: Minor
>              Labels: gsoc, gsoc2023
>
> Jira ticket to track the implementation of the Univariate statistics required 
> for the updated SummaryStatistics API. 
> The implementation would be "storeless". It should be used for calculating 
> statistics that can be computed in one pass through the data without storing 
> the sample values.
> Currently I have the definition of API as (this might evolve as I continue 
> working)
> {code:java}
> public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
>     DoubleStorelessUnivariateStatistic add(double v);
>     long getCount();
>     void combine(DoubleStorelessUnivariateStatistic other);
> } {code}
>  



