[
https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739452#comment-17739452
]
Anirudh Joshi commented on STATISTICS-71:
-----------------------------------------
Based on the feedback, would the following be a better design?
Define two interfaces, DoubleStatistic and
StatisticAccumulator<T extends DoubleStatistic>, as follows:
{code:java}
// Base interface for all univariate statistics impl
public interface DoubleStatistic extends DoubleConsumer, DoubleSupplier {
    Statistic getStatistic();
    boolean isStoreless();
    long getCount();
}{code}
{code:java}
// Contract for any univariate statistic to be used as a target for
// Stream::collect, i.e. combined with another instance of itself
public interface StatisticAccumulator<T extends DoubleStatistic> {
    // To enforce that the parameter to the combine function is bound to an
    // accumulator impl of the same statistic type T
    <U extends StatisticAccumulator<T>> void combine(U other);

    // Get the statistic we are trying to accumulate
    T get();
}{code}
Have each statistic implement both interfaces, e.g.:
{code:java}
public abstract class Min implements DoubleStatistic, StatisticAccumulator<Min> {
    public static Min storeless() {
        return new StorelessMin();
    }

    public abstract Min of(double... values);

    @Override
    public Statistic getStatistic() {
        return Statistic.Min;
    }

    private static final class StorelessMin extends Min {
        private double min;
        private final Count count;

        StorelessMin() {
            this.min = Double.NaN;
            this.count = Count.newCount();
        }

        @Override
        public boolean isStoreless() {
            return true;
        }

        @Override
        public Min of(double... values) {
            Arrays.stream(values).forEach(this::accept);
            return this;
        }

        @Override
        public void accept(double d) {
            if (d < min || Double.isNaN(min)) {
                min = d;
            }
            count.increment();
        }

        @Override
        public long getCount() {
            return count.getCount();
        }

        @Override
        public double getAsDouble() {
            return min;
        }

        @Override
        public <U extends StatisticAccumulator<Min>> void combine(U other) {
            Min otherMin = other.get();
            double otherValue = otherMin.getAsDouble();
            // Mirror accept(): Double.min would return NaN if either side
            // were still empty (NaN), so compare explicitly instead
            if (otherValue < min || Double.isNaN(min)) {
                min = otherValue;
            }
            this.count.setCount(this.getCount() + otherMin.getCount());
        }

        @Override
        public StorelessMin get() {
            return this;
        }
    }
}{code}
Possible usages would look like
{code:java}
double min1 = Min.storeless().of(1.0, 2.0, 3.0).getAsDouble();
double min2 = Arrays.stream(new double[]{1.0, 2.0, 3.0, 4.0, -1.0})
    .collect(Min::storeless, Min::accept, Min::combine)
    .getAsDouble();{code}
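To make the Stream::collect usage concrete, here is a minimal, self-contained sketch of the same supplier / accumulator / combiner pattern. SimpleMin and CollectSketch are hypothetical stand-ins written only for this illustration; they are not the proposed Min class or any existing Commons Statistics API, and the count is a plain long rather than the Count helper used above.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

public class CollectSketch {
    /** Hypothetical stand-in for the proposed Min (assumption, not the real API). */
    static final class SimpleMin implements DoubleConsumer, DoubleSupplier {
        private double min = Double.NaN; // NaN marks "no values seen yet"
        private long count;

        @Override
        public void accept(double d) {
            if (d < min || Double.isNaN(min)) {
                min = d;
            }
            count++;
        }

        /** Merges another partial result; an empty one leaves min unchanged. */
        void combine(SimpleMin other) {
            if (other.min < min || Double.isNaN(min)) {
                min = other.min;
            }
            count += other.count;
        }

        @Override
        public double getAsDouble() {
            return min;
        }

        long getCount() {
            return count;
        }
    }

    /** Collects the values in parallel so that combine() may be exercised. */
    static SimpleMin minOf(double... values) {
        return DoubleStream.of(values).parallel()
            .collect(SimpleMin::new, SimpleMin::accept, SimpleMin::combine);
    }

    public static void main(String[] args) {
        SimpleMin result = minOf(1.0, 2.0, 3.0, 4.0, -1.0);
        // prints "-1.0 from 5 values"
        System.out.println(result.getAsDouble() + " from " + result.getCount() + " values");
    }
}
```

The three method references map onto DoubleStream.collect(Supplier, ObjDoubleConsumer, BiConsumer). The combiner is only invoked when the stream is split, which is why the sketch requests a parallel stream.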
I would like to know whether this aligns with our common goals:
- a minimal interface for univariate statistics
- reusing JDK interfaces and sticking to JDK naming as much as possible
- restricting attempts at meaningless combinations of individual statistics
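
The third goal can be checked at compile time. Below is a rough sketch, assuming trimmed-down copies of the two interfaces (without getStatistic, isStoreless and getCount) and simplified Min and Max classes that start from infinity instead of NaN. All names inside CombineBoundSketch are illustrative assumptions, not the final API.

```java
import java.util.function.DoubleConsumer;
import java.util.function.DoubleSupplier;

public class CombineBoundSketch {
    // Trimmed-down copies of the proposed interfaces (assumption: extra methods omitted)
    interface DoubleStatistic extends DoubleConsumer, DoubleSupplier { }

    interface StatisticAccumulator<T extends DoubleStatistic> {
        <U extends StatisticAccumulator<T>> void combine(U other);
        T get();
    }

    // Simplified stand-in: starts at +infinity rather than NaN
    static final class Min implements DoubleStatistic, StatisticAccumulator<Min> {
        private double value = Double.POSITIVE_INFINITY;

        @Override
        public void accept(double d) {
            value = Math.min(value, d);
        }

        @Override
        public double getAsDouble() {
            return value;
        }

        @Override
        public <U extends StatisticAccumulator<Min>> void combine(U other) {
            value = Math.min(value, other.get().getAsDouble());
        }

        @Override
        public Min get() {
            return this;
        }
    }

    // Simplified stand-in: starts at -infinity rather than NaN
    static final class Max implements DoubleStatistic, StatisticAccumulator<Max> {
        private double value = Double.NEGATIVE_INFINITY;

        @Override
        public void accept(double d) {
            value = Math.max(value, d);
        }

        @Override
        public double getAsDouble() {
            return value;
        }

        @Override
        public <U extends StatisticAccumulator<Max>> void combine(U other) {
            value = Math.max(value, other.get().getAsDouble());
        }

        @Override
        public Max get() {
            return this;
        }
    }

    public static void main(String[] args) {
        Min a = new Min();
        a.accept(3.0);
        Min b = new Min();
        b.accept(1.0);
        a.combine(b); // fine: b is a StatisticAccumulator<Min>

        Max m = new Max();
        m.accept(7.0);
        // a.combine(m); // rejected by the compiler: Max is a
        //               // StatisticAccumulator<Max>, not a StatisticAccumulator<Min>

        System.out.println(a.getAsDouble()); // prints "1.0"
    }
}
```

Because the bound on U refers to the accumulator's own T, combining a Min with a Max never reaches runtime; the mismatch surfaces as a type-inference error at the call site.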
> Implementation of Univariate Statistics
> ---------------------------------------
>
> Key: STATISTICS-71
> URL: https://issues.apache.org/jira/browse/STATISTICS-71
> Project: Commons Statistics
> Issue Type: Task
> Components: descriptive
> Reporter: Anirudh Joshi
> Priority: Minor
> Labels: gsoc, gsoc2023
>
> Jira ticket to track the implementation of the Univariate statistics required
> for the updated SummaryStatistics API.
> The implementation would be "storeless". It should be used for calculating
> statistics that can be computed in one pass through the data without storing
> the sample values.
> Currently I have the definition of API as (this might evolve as I continue
> working)
> {code:java}
> public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
>     DoubleStorelessUnivariateStatistic add(double v);
>     long getCount();
>     void combine(DoubleStorelessUnivariateStatistic other);
> } {code}
>