On 9/29/07, Phil Steitz <[EMAIL PROTECTED]> wrote:
>
> On 9/22/07, Bradford Cross <[EMAIL PROTECTED]>
> >
> > the Smart updates are a key feature for event stream processing / time
> > series simulation. The only piece that is missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and perform rolling calculations.
>
> That functionality actually already exists in the
> DescriptiveStatistics class. You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl. See
> http://commons.apache.org/math/userguide/stat.html

Cool - I had not seen this yet.

> > If the community is OK with this initial spike, then we can start
> > submitting patches. :-)
>
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though. First, it uses generics and the
> concurrent package, which requires JDK 1.5, and our current minimum JDK
> level is 1.3. That could probably be eliminated fairly easily,
> though. The second is really whether or not the queue implementation
> is going to improve performance over the ResizableDoubleArray store
> that DescriptiveStatisticsImpl uses now. If you think so and can
> demonstrate with benchmarks, we can talk about swapping out that
> implementation. Otherwise, it's probably better to use
> ResizableDoubleArray.

Yes, this is just a "spike" - a proof of concept. :-)

Today I set up a benchmark test and swapped in lots of different
collections. The fastest Java queue I found is the ArrayDeque from
Java 6. Interestingly, the calculations are about twice as fast using
this queue compared with some of the other queue implementations in the
Java collections for a run of about 10K calls to calculate().
Nevertheless, the ResizableDoubleArray seems to be a bit faster. I will
formalize my benchmarking with a bit more rigor and publish the results
on this thread.

> I am +1 on adding a RollingStatistic abstract base class (would prefer
> that name to "Statistic" since it is specialized) like you have
> defined and rolling versions of the individual statistics. This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than to use DescriptiveStatisticsImpl as a
> container. Currently this is the only way to do it.
> So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply - the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats. If I understand
> correctly the idea of what you mean by Sum and Mean (using
> constructor arguments to determine whether or not the statistic is
> rolling), I would prefer to leave the existing statistics in
> commons-math as is and introduce Rolling versions as separate classes.

Sounds good - I will start working on the RollingStatistics. As for the
convenience pattern I used in Mean/Sum (using constructor arguments to
determine whether or not the statistic is rolling) - it is easy to do
that refactoring later, after the rolling statistics are added. We can
just leave the current statistics as is and wait to see if we find a
good reason to change them.

> One more thing. It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement. Have a look here:
> http://www.apache.org/licenses/#clas
> and make sure you can agree to those terms.

Yep, no problems. :-)

> Then you can start
> submitting patches with attachments to Jira tickets.

Sounds good.
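
For reference, here is a minimal sketch of the kind of rolling statistic
discussed above: an abstract base class with a fixed window and a rolling
mean as a separate class. It is only an illustration, not the submitted
spike and not the commons-math API. The names RollingStatistic,
RollingMean, addValue, getN, getResult and evaluate are hypothetical, and
a plain double[] circular buffer is assumed as the backing store (instead
of ResizableDoubleArray or a queue) so that nothing depends on generics or
java.util.concurrent.

```java
// Hypothetical sketch: not the commons-math API and not the submitted patch.
// Backing store is a plain double[] used as a circular buffer (an assumption).

abstract class RollingStatistic {
    private final double[] window; // holds the most recent values
    private int next = 0;          // index where the next value is written
    private int count = 0;         // number of values currently held

    protected RollingStatistic(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        window = new double[windowSize];
    }

    /** Adds a value, overwriting the oldest one once the window is full. */
    public void addValue(double v) {
        window[next] = v;
        next = (next + 1) % window.length;
        if (count < window.length) {
            count++;
        }
    }

    /** Number of values currently in the window. */
    public int getN() {
        return count;
    }

    /** Evaluates the statistic over the current window, oldest value first. */
    public double getResult() {
        double[] values = new double[count];
        // Once the buffer has wrapped, the oldest value sits at 'next'.
        int start = (count == window.length) ? next : 0;
        for (int i = 0; i < count; i++) {
            values[i] = window[(start + i) % window.length];
        }
        return evaluate(values);
    }

    /** Computes the concrete statistic for the given window contents. */
    protected abstract double evaluate(double[] values);
}

class RollingMean extends RollingStatistic {
    RollingMean(int windowSize) {
        super(windowSize);
    }

    protected double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN; // no data yet
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }
}

class RollingDemo {
    public static void main(String[] args) {
        RollingMean mean = new RollingMean(3);
        double[] data = {1.0, 2.0, 3.0, 4.0, 5.0};
        for (int i = 0; i < data.length; i++) {
            mean.addValue(data[i]);
            System.out.println(mean.getResult());
        }
    }
}
```

Copying the window on every getResult() call keeps the sketch simple; a
real implementation would more likely hand the stored values to an
existing statistic, along the lines of the DescriptiveStatisticsImpl.apply
pattern mentioned above.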