[jira] [Commented] (STATISTICS-71) Implementation of Univariate Statistics

Anirudh Joshi (Jira) Sun, 02 Jul 2023 23:26:07 -0700


    [ 
https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739452#comment-17739452
 ]


Anirudh Joshi commented on STATISTICS-71:
-----------------------------------------

Based on the feedback, would the following be a better design?

Define two interfaces, DoubleStatistic and
StatisticAccumulator<T extends DoubleStatistic>, as follows:
{code:java}
// Base interface for all univariate statistics impl
public interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
    Statistic getStatistic();
    boolean isStoreless();
    long getCount();
}{code}
 

 
{code:java}
// Contract for any univariate statistic to be used as a target for
// Stream::collect, i.e. combined with another instance of itself
public interface StatisticAccumulator<T extends DoubleStatistic> {
    // To enforce that the parameter to the combine function is bound to an
    // accumulator impl of the same statistic type T
    <U extends StatisticAccumulator<T>> void combine(U other);

    // Get the statistic we are trying to accumulate
    T get();
}{code}
Have each statistic implement both interfaces, e.g.:
{code:java}
public abstract class Min implements DoubleStatistic, StatisticAccumulator<Min> {

    public static Min storeless() {
        return new StorelessMin();
    }

    public abstract Min of(double... values);

    @Override
    public Statistic getStatistic() {
        return Statistic.Min;
    }

    private static final class StorelessMin extends Min {
        private double min;
        private final Count count;

        StorelessMin() {
            this.min = Double.NaN;
            this.count = Count.newCount();
        }

        @Override
        public boolean isStoreless() {
            return true;
        }

        @Override
        public Min of(double... values) {
            Arrays.stream(values).forEach(this::accept);
            return this;
        }

        @Override
        public void accept(double d) {
            if (d < min || Double.isNaN(min)) {
                min = d;
            }
            count.increment();
        }

        @Override
        public long getCount() {
            return count.getCount();
        }

        @Override
        public double getAsDouble() {
            return min;
        }

        @Override
        public <U extends StatisticAccumulator<Min>> void combine(U other) {
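            Min otherMin = other.get();
            double otherValue = otherMin.getAsDouble();
            // Double.min returns NaN if either argument is NaN, so an empty
            // (NaN) accumulator on either side must not be passed to it.
            if (otherValue < this.min || Double.isNaN(this.min)) {
                this.min = otherValue;
            }
            this.count.setCount(this.getCount() + otherMin.getCount());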
        }

        @Override
        public StorelessMin get() {
            return this;
        }
    }
}{code}

Possible usages would look like:
{code:java}
double min = Min.storeless().of(1.0, 2.0, 3.0).getAsDouble();

// Distinct name so both usages can live in the same scope
double streamMin = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
        .collect(Min::storeless, Min::accept, Min::combine)
        .getAsDouble();{code}
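
To make the collect usage easy to try in isolation, here is a minimal self-contained sketch. It rests on assumptions: SimpleMin and SimpleMinDemo are hypothetical stand-ins for the proposed Min, and a plain long field replaces Count, whose API is not shown in this comment. The stream is run in parallel because a sequential DoubleStream.collect normally never invokes the combiner.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

// Hypothetical, simplified stand-in for the proposed storeless Min.
final class SimpleMin implements DoubleConsumer, DoubleSupplier {
    private double min = Double.NaN; // NaN marks "no value recorded yet"
    private long count;              // plain field instead of the Count helper

    @Override
    public void accept(double d) {
        if (d < min || Double.isNaN(min)) {
            min = d;
        }
        count++;
    }

    // Merge another partial result into this one.
    void combine(SimpleMin other) {
        // A NaN on either side is handled by the comparison itself:
        // "other.min < min" is false whenever one operand is NaN.
        if (other.min < min || Double.isNaN(min)) {
            min = other.min;
        }
        count += other.count;
    }

    long getCount() {
        return count;
    }

    @Override
    public double getAsDouble() {
        return min;
    }
}

final class SimpleMinDemo {
    // Collect the values into a SimpleMin using a parallel stream.
    static SimpleMin minOf(double... values) {
        return DoubleStream.of(values)
                .parallel()
                .collect(SimpleMin::new, SimpleMin::accept, SimpleMin::combine);
    }

    public static void main(String[] args) {
        SimpleMin m = minOf(1.0, 2.0, 3.0, 4.0, -1.0);
        System.out.println(m.getAsDouble() + " from " + m.getCount() + " values");
    }
}
```

The three method references map onto the supplier, ObjDoubleConsumer and BiConsumer parameters of DoubleStream.collect, which is the same shape as the Min::storeless, Min::accept, Min::combine call above.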

I would like to know whether this aligns with our common goals:
- minimal interface for univariate statistics
- reuse JDK interfaces and stick to JDK naming as much as possible
- restrict attempts of meaningless combinations of individual statistics
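
As an illustration of the last point, the sketch below shows how the bound on combine rejects mixing statistics at compile time. All names here (Accumulator, MinAcc, MaxAcc, BoundDemo) are hypothetical reductions of the proposed types; only the generic bound is carried over.

```java
// Hypothetical minimal types; only the generic bound from the proposal is kept.
interface Accumulator<T> {
    <U extends Accumulator<T>> void combine(U other);

    T get();
}

final class MinAcc implements Accumulator<MinAcc> {
    double value = Double.POSITIVE_INFINITY;

    @Override
    public <U extends Accumulator<MinAcc>> void combine(U other) {
        value = Math.min(value, other.get().value);
    }

    @Override
    public MinAcc get() {
        return this;
    }
}

final class MaxAcc implements Accumulator<MaxAcc> {
    double value = Double.NEGATIVE_INFINITY;

    @Override
    public <U extends Accumulator<MaxAcc>> void combine(U other) {
        value = Math.max(value, other.get().value);
    }

    @Override
    public MaxAcc get() {
        return this;
    }
}

final class BoundDemo {
    static double combinedMin(double a, double b) {
        MinAcc x = new MinAcc();
        x.value = a;
        MinAcc y = new MinAcc();
        y.value = b;
        x.combine(y); // accepted: MinAcc is an Accumulator<MinAcc>
        // x.combine(new MaxAcc()); // rejected by the compiler:
        //                          // MaxAcc is an Accumulator<MaxAcc>
        return x.value;
    }
}
```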

 

> Implementation of Univariate Statistics
> ---------------------------------------
>
>                 Key: STATISTICS-71
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-71
>             Project: Commons Statistics
>          Issue Type: Task
>          Components: descriptive
>            Reporter: Anirudh Joshi
>            Priority: Minor
>              Labels: gsoc, gsoc2023
>
> Jira ticket to track the implementation of the Univariate statistics required 
> for the updated SummaryStatistics API. 
> The implementation would be "storeless". It should be used for calculating 
> statistics that can be computed in one pass through the data without storing 
> the sample values.
> Currently I have the definition of API as (this might evolve as I continue 
> working)
> {code:java}
> public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
>     DoubleStorelessUnivariateStatistic add(double v);
>     long getCount();
>     void combine(DoubleStorelessUnivariateStatistic other);
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
