[jira] [Commented] (STATISTICS-71) Implementation of Univariate Statistics

Alex Herbert (Jira) Mon, 03 Jul 2023 00:30:08 -0700


    [ 
https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739479#comment-17739479
 ]


Alex Herbert commented on STATISTICS-71:
----------------------------------------

I thought we dropped the requirement for:
{code:java}
long getCount();
{code}
I do not see the need for:
{code:java}
boolean isStoreless();
{code}
Why does the end-user care? What is the function of identifying a statistic as 
storeless? The method is trivial so adding it later would be simple given it 
could be a default method in the interface (return true). Perhaps we leave this 
until implementations are added that are not storeless.
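
As a rough sketch of why this can wait: a default method can be added to an interface later without touching existing implementations. The interface shape and the {{Sum}} class below are assumptions for illustration, not the code under review.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

// Assumed shape of the interface; only isStoreless() comes from the discussion.
interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
    // Added later as a default method: existing implementations inherit "true".
    default boolean isStoreless() {
        return true;
    }
}

// Hypothetical storeless statistic written before isStoreless() existed.
// It compiles unchanged once the default method is introduced.
final class Sum implements DoubleStatistic {
    private double sum;

    @Override
    public void accept(double value) {
        sum += value;
    }

    @Override
    public double getAsDouble() {
        return sum;
    }
}
```

Only a future stored implementation would need to override the method to return false.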

The difference for a stored statistic is that it maintains an array (or other 
collection) of all observed values, because the computation requires all input 
(e.g. median). If this does get implemented then a stored statistic could 
provide access to the values via a child interface, e.g.
{code:java}
public interface StoredDoubleStatistic extends DoubleStatistic {
    DoubleStream streamValues();
}
{code}
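
To make the idea concrete, here is a minimal sketch of one possible stored statistic. The parent interface shape, the {{StoredMedian}} class, and its growth policy are all assumptions for illustration; returning NaN for no input is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

// Assumed shape of the parent interface; the real DoubleStatistic may differ.
interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
}

// Child interface as proposed in the comment.
interface StoredDoubleStatistic extends DoubleStatistic {
    DoubleStream streamValues();
}

// Hypothetical stored statistic: the median needs every observed value.
final class StoredMedian implements StoredDoubleStatistic {
    private double[] values = new double[8];
    private int size;

    @Override
    public void accept(double value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        values[size++] = value;
    }

    @Override
    public double getAsDouble() {
        if (size == 0) {
            return Double.NaN;
        }
        // Sort a copy so the stored insertion order is preserved.
        double[] sorted = Arrays.copyOf(values, size);
        Arrays.sort(sorted);
        int mid = size >>> 1;
        return (size & 1) == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    @Override
    public DoubleStream streamValues() {
        // Expose only the filled part of the backing array.
        return Arrays.stream(values, 0, size);
    }
}
```

A caller can then reuse the stored values for another computation, e.g. {{median.streamValues().max()}}, without holding a second copy of the data.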
One issue with the Accumulator is that if you do implement stored statistics 
then you can combine any stored statistic with any other. The type will not 
matter, which opens the possibility that the implementation can merge the 
underlying storage and share it between the two. This may require merge methods 
specifically for stored statistics.

I am wondering if this is required:
{code:java}
Statistic getStatistic();
{code}
I do not think there is a use case for having an instance of a DoubleStatistic, 
not knowing what it is and having to query it. If it is to help build a 
combiner of statistics dynamically then this is an implementation detail and 
not part of the public API.

If you remove the count and storeless flag (for now) you are closer to a 
minimal API. If you remove the getStatistic method then you are left with 
nothing in the DoubleStatistic interface and it becomes a combiner of JDK APIs. 
This is truly minimal and a good starting point for an implementation: the 
methods are fixed by the JDK, so rewrites will not be necessary as 
development progresses and reveals additional requirements.
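
A minimal sketch of what "a combiner of JDK APIs" could look like, assuming the two JDK interfaces meant are {{DoubleConsumer}} and {{DoubleSupplier}}. The {{Max}} class is hypothetical, and returning negative infinity for no input is an arbitrary choice for this sketch.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

// Assumed reading of the minimal API: no methods of its own.
interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
}

// Hypothetical storeless implementation used only for illustration.
final class Max implements DoubleStatistic {
    private double max = Double.NEGATIVE_INFINITY;

    @Override
    public void accept(double value) {
        max = Math.max(max, value);
    }

    @Override
    public double getAsDouble() {
        return max;
    }
}

final class MinimalApiDemo {
    public static void main(String[] args) {
        Max max = new Max();
        // The statistic is a DoubleConsumer, so it plugs straight into a stream.
        DoubleStream.of(1.5, -2.0, 4.25).forEach(max);
        System.out.println(max.getAsDouble()); // prints 4.25
    }
}
```

Because the statistic is itself a {{DoubleConsumer}}, it can be passed to any JDK method that accepts one, and the result can be read wherever a {{DoubleSupplier}} is expected.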

Notes:
 * The {{of}} method is a factory constructor and should return a new instance. 
Your implementation is more like {{add}}.
 * This code does not distinguish -0.0 and 0.0: {{if (d < min)}}. As such 
you have the possibility for multiple implementations of Min, e.g. one using 
the less than operator and one using {{Double.min}}.
 * Naming conventions: {{Statistic.Min => Statistic.MIN}}.
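
To illustrate the signed-zero note, the sketch below compares the two update rules on the same input. The class and method names are hypothetical; only the {{d < min}} comparison and {{Double.min}} come from the discussion.

```java
final class ZeroSignDemo {
    // Update rule using the less-than operator: -0.0 < 0.0 is false,
    // so the first zero seen is kept regardless of its sign.
    static double minLessThan(double... values) {
        double min = Double.POSITIVE_INFINITY;
        for (double d : values) {
            if (d < min) {
                min = d;
            }
        }
        return min;
    }

    // Update rule using Double.min (equivalent to Math.min):
    // -0.0 is treated as strictly smaller than 0.0.
    static double minDoubleMin(double... values) {
        double min = Double.POSITIVE_INFINITY;
        for (double d : values) {
            min = Double.min(min, d);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(minLessThan(0.0, -0.0));  // prints 0.0
        System.out.println(minDoubleMin(0.0, -0.0)); // prints -0.0
    }
}
```

The two rules also differ for NaN input: {{d < min}} is false for NaN so it is ignored, whereas {{Double.min}} returns NaN.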

> Implementation of Univariate Statistics
> ---------------------------------------
>
>                 Key: STATISTICS-71
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-71
>             Project: Commons Statistics
>          Issue Type: Task
>          Components: descriptive
>            Reporter: Anirudh Joshi
>            Priority: Minor
>              Labels: gsoc, gsoc2023
>
> Jira ticket to track the implementation of the Univariate statistics required 
> for the updated SummaryStatistics API. 
> The implementation would be "storeless". It should be used for calculating 
> statistics that can be computed in one pass through the data without storing 
> the sample values.
> Currently I have the definition of API as (this might evolve as I continue 
> working)
> {code:java}
> public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
>     DoubleStorelessUnivariateStatistic add(double v);
>     long getCount();
>     void combine(DoubleStorelessUnivariateStatistic other);
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
