[ https://issues.apache.org/jira/browse/MATH-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651755#action_12651755 ]
Phil Steitz commented on MATH-224:
----------------------------------
After looking carefully at the patch and the higher moments problem, I am now
leaning toward WONTFIX for this. The problem is that the "storeless"
statistics are only required to store enough data to support updates via single
value increments. Adding the requirement to support aggregation in the sense
defined here places an unacceptable limitation on the implementing classes.
The second moment and variance, for example, only work now because the default
implementations carry along nested first moments. This setup is a little
awkward and might be changed; but then it would not be obvious how to support
aggregation. It is not obvious to me how to support this for fourth moments at
all. In any case, I think it is too restrictive a requirement to place on
implementations, so if we do support this, it should be via a subclass of
SummaryStatistics.
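
To illustrate why merging second moments depends on the first moments, here is
a minimal sketch using the standard pairwise update for the sum of squared
deviations. MomentSummary and its members are hypothetical names chosen for
this example; this is not the Commons Math implementation and not the attached
patch. The point is only that merge() needs each part's mean, which a statistic
exposing just getResult() and getN() would not provide.

```java
/** Hypothetical holder for the state needed to merge two variance computations. */
final class MomentSummary {
    final long n;       // number of values seen
    final double mean;  // first moment
    final double m2;    // sum of squared deviations from the mean

    MomentSummary(long n, double mean, double m2) {
        this.n = n;
        this.mean = mean;
        this.m2 = m2;
    }

    /** Builds a summary with single-value increments, as a storeless statistic would. */
    static MomentSummary of(double... values) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0;
        for (double x : values) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return new MomentSummary(n, mean, m2);
    }

    /** Combines two summaries; note that both means are required. */
    MomentSummary merge(MomentSummary other) {
        if (n == 0) {
            return other;
        }
        if (other.n == 0) {
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        double newMean = mean + delta * other.n / total;
        double newM2 = m2 + other.m2 + delta * delta * ((double) n * other.n / total);
        return new MomentSummary(total, newMean, newM2);
    }

    /** Bias-corrected sample variance; NaN when fewer than two values were seen. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }
}
```

Merging the summaries of two subsamples this way should agree, up to rounding,
with a single summary built over all the values. Third and fourth moments need
longer merge formulas that involve the lower moments as well, which this sketch
does not attempt.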
> Utility method to aggregate Statistics
> --------------------------------------
>
> Key: MATH-224
> URL: https://issues.apache.org/jira/browse/MATH-224
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Andre Panisson
> Assignee: Phil Steitz
> Priority: Minor
> Fix For: 2.0
>
> Attachments: commons_math.patch
>
>
> Below is the conversation related to this topic that was posted to the
> Commons Users group.
> -------------------------------------------------
> Hi,
> >
> > I'm writing a complex validation algorithm that performs K-fold
> > cross-validation on a data set. The data set is partitioned into K
> > subsamples, and of the K subsamples, a single subsample is retained
> > as the validation data for testing, and the remaining K − 1
> > subsamples are used as training data. The process is then repeated K
> > times, and at the end the K results are aggregated into a single
> > result. The problem is that all K runs return Statistics objects
> > (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I
> > need to aggregate all K objects into a single Statistics object.
> > I think this is a common problem in statistics. Has anyone
> > already implemented a utility method to do it?
> There is no such feature currently in commons-math. The
> SummaryStatistics class wraps a bunch of specialized statistics classes
> (Sum, Mean, Max, SumOfSquares ...) which can be overridden by
> user-provided StorelessUnivariateStatistic implementations.
> So this feature should be added to the StorelessUnivariateStatistic
> interface and all its implementations, with a signature like this:
> public void aggregate(StorelessUnivariateStatistic otherStatistic);
> The implementation of this method should only use the
> StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
> seems feasible for the statistics used by SummaryStatistics, but has not
> been done yet.
> One should be aware that SummaryStatistics does not enforce strong
> typing, so one could call aggregate on a Sum instance and pass it a
> Min instance, which would of course produce meaningless results.
> > Or maybe it would be interesting to request it as an Improvement to
> > the Commons Math developers, adding an "aggregator" to all Statistics
> > implementations?
> If you want to request this improvement, please open a ticket for it
> using our JIRA tracking system:
> http://issues.apache.org/jira/browse/MATH. You'll have to register to be
> able to add your feature request. You can also provide a patch if you
> want to contribute it yourself.
> Luc
> >
> > Thanks in advance,
> >
> > Andre Panisson
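
For reference, the aggregate() signature suggested in the thread above can be
sketched for the two simplest cases, a sum and a mean, using only getResult()
and getN() on the other statistic. SimpleStatistic, SimpleSum and SimpleMean
are hypothetical stand-ins written for this example; they are not the Commons
Math classes, and the real StorelessUnivariateStatistic interface has more
methods than shown here.

```java
/** Hypothetical stand-in exposing only the accessors named in the thread. */
interface SimpleStatistic {
    void increment(double value);
    double getResult();
    long getN();
    void aggregate(SimpleStatistic other);
}

final class SimpleSum implements SimpleStatistic {
    private double sum;
    private long n;

    public void increment(double value) {
        sum += value;
        n++;
    }

    public double getResult() {
        return sum;
    }

    public long getN() {
        return n;
    }

    public void aggregate(SimpleStatistic other) {
        // Only meaningful if 'other' is also a sum; the type cannot enforce that.
        sum += other.getResult();
        n += other.getN();
    }
}

final class SimpleMean implements SimpleStatistic {
    private double mean;
    private long n;

    public void increment(double value) {
        n++;
        mean += (value - mean) / n;
    }

    public double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    public long getN() {
        return n;
    }

    public void aggregate(SimpleStatistic other) {
        long otherN = other.getN();
        if (otherN == 0) {
            return; // nothing to merge, and other.getResult() would be NaN
        }
        long total = n + otherN;
        // Weighted combination of the two means, using only getResult() and getN().
        mean += (other.getResult() - mean) * otherN / total;
        n = total;
    }
}
```

As the thread warns, nothing stops a caller from passing a SimpleSum to
SimpleMean.aggregate(); the code would run and return a meaningless number.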