--- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote:
>
>
> > This adds
> > significant overhead and I do not see the value in it. The cost of the
> > additional stack operations/object creations is significant. I ran tests
> > comparing the previous version that does direct computations using the
> > double[] arrays to the modified version and found an average of more than
> > 6x slowdown using the new implementation. I did not profile memory
> > utilization, but that is also a concern. Repeated tests computing the mean
> > of 1000 doubles 100000 times using the old and new implementations averaged
> > 1.5 and 10.2 seconds, resp. I do not see the need for all of this additional
> > overhead.
> >
>
> If you review the code, you'll find there is no added "object creation";
> the static Variable objects calculate on double[] just as the
> Univariates did. I would have to see more substantial analysis to
> believe your claim. All that's going on here is that the static StatUtils
> methods are delegating to individual static instances of
> UnivariateStatistics. These are instantiated on JVM startup like all
> static objects; calling a method in such an object should not require
> any more overhead than having the method coded directly into the static
> method.

Here is what I added to one of the methods in StatUtilsTest, after copying the
old version and renaming it OStatUtils:

    // Fragment: startTick, res and x (the double[] test data) are assumed
    // to be declared earlier in the test method.
    for (int j = 0; j < 10; j++) {
        // Time 100000 calls to the old, direct implementation.
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = OStatUtils.mean(x);
        }
        System.out.println("old: " + (System.currentTimeMillis() - startTick));
        //oldStats.addValue(System.currentTimeMillis() - startTick);

        // Time 100000 calls to the new, delegating implementation.
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = StatUtils.mean(x);
        }
        //newStats.addValue(System.currentTimeMillis() - startTick);
        System.out.println("new: " + (System.currentTimeMillis() - startTick));
    }

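
For anyone who wants to reproduce the comparison without the test class, here is a self-contained sketch of the same kind of timing loop. It is only an approximation of the setup above: DirectMean, MeanStatistic, DelegatingMean and MeanTiming are hypothetical stand-ins, not the actual OStatUtils/StatUtils code, and the untimed warm-up passes and the use of System.nanoTime() are my additions.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for the old version: the loop lives in the static method.
final class DirectMean {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Hypothetical statistic object that the new version would delegate to.
final class MeanStatistic {
    double evaluate(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Hypothetical stand-in for the new version: a static method forwarding to a shared instance.
final class DelegatingMean {
    private static final MeanStatistic MEAN = new MeanStatistic();

    static double mean(double[] values) {
        return MEAN.evaluate(values);
    }
}

final class MeanTiming {
    // Returns elapsed nanoseconds for reps calls. The results are summed and
    // checked so the JIT is less likely to discard the calls as unused.
    static long timeNanos(ToDoubleFunction<double[]> f, double[] x, int reps) {
        double sink = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sink += f.applyAsDouble(x);
        }
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        double[] x = new double[1000];
        for (int i = 0; i < x.length; i++) {
            x[i] = i;
        }
        // Untimed warm-up passes, so both paths have a chance to be compiled first.
        timeNanos(DirectMean::mean, x, 100000);
        timeNanos(DelegatingMean::mean, x, 100000);
        for (int j = 0; j < 10; j++) {
            long oldNs = timeNanos(DirectMean::mean, x, 100000);
            long newNs = timeNanos(DelegatingMean::mean, x, 100000);
            System.out.println("old: " + oldNs / 1000000 + " ms, new: " + newNs / 1000000 + " ms");
        }
    }
}
```

Hand-rolled loops like this are sensitive to JIT warm-up and to the order in which the two versions run, so the numbers should be read as rough indications only. A dedicated harness such as JMH would give more trustworthy figures.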
>
> If there are performance considerations, let's discuss these.
>
> I doubt (as the numerous discussions over the past week have pointed
> out) that what we really want to have in StatUtils is one monolithic
> Static class with all the implemented methods present in it. If I have
> misinterpreted this opinion in the group, then I'm sure there will be
> responses to this.
>
> > I suggest that we postpone introduction of a statistical computation
> > framework until after the initial release, if needed. In any case, I would
> > like to keep StatUtils and the core UnivariateImpl small, fast and
> > lightweight, so I would like to request that the changes to these classes
> > be rolled back.
> >
> I would really like to see an architecture that's more than just one flat
> static class with a bunch of double[] methods in it. This is not very
> useful to me.
>
> > If others feel that this additional infrastructure is essential, then I
> > just need to be educated. It is quite possible that I am thinking too
> > narrowly in terms of current scope and I may be missing some looming
> > structural problems. If this is the case, I am open to being educated. I
> > just need to see a) exactly why we need to add more complexity at this time
> > and b) why breaking univariate statistics into four packages and 17 classes
> > when all we are computing is basic statistics is necessary.
> >
>
> The packages are categorical, the classes are implementations of each
> statistic. The framework provides an intuitive and organized means for
> others to easily implement and add statistics to the packages without
> being restricted to a fascist and monolithic Univariate interface or
> static StatUtils interface.
>
> If anything, the continued conflict between our two schools of thought
> shows the necessity of such an approach. Your school of thought can
> retain the monolithic interfaces for "Univariate" and "StatUtils", while
> the framework can provide others with the ability to extend and expand
> the library without such "heavy handed" restrictions that cripple the
> extensibility of the project.
>
> There was a great deal of discussion about the benefit of not having the
> methods implemented directly in static StatUtils because they could not
> be "overridden" or worked with in an Instantiable form. This approach
> frees the implementations up to be overridden and frees up room for
> alternate implementations.
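
A minimal sketch of the overriding argument, using hypothetical class names (ArithmeticMean, NaNSkippingMean and Report are illustrations, not classes from the library). A call to a static method is bound at compile time, while an instance method can be replaced by a subclass without touching the caller:

```java
// Hypothetical base statistic: plain arithmetic mean.
class ArithmeticMean {
    double evaluate(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Hypothetical alternate implementation: ignores NaN entries.
class NaNSkippingMean extends ArithmeticMean {
    @Override
    double evaluate(double[] values) {
        double sum = 0.0;
        int n = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }
}

final class Report {
    // The caller depends only on the base type; which evaluate() runs
    // is decided at run time by the object passed in.
    static double summarize(ArithmeticMean stat, double[] data) {
        return stat.evaluate(data);
    }
}
```

With data {1.0, NaN, 3.0}, Report.summarize returns NaN for an ArithmeticMean and 2.0 for a NaNSkippingMean, with no change to Report itself.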
>
> You may have your opinions of how you would like to see the packages
> organized and implemented. Others in the group do have alternate
> opinions to yours. I for one see a strong value in individually
> implemented Statistics. I also have a strong vision that the framework I
> have been working on provides substantial benefits.
>
> (1a.) It allows both the storageless and storage-based implementations
> to function behind the same interface. No matter if you're calling
>
> increment(double d)
>
> or
>
> evaluate(double[]...)
>
> you're working with the same algorithm.
>
> (1b.) If you wish to have alternate implementations for evaluate and
> increment, it is easy to overload these methods in future
> versions of the implementations.
>
> (2.) With individual Implementations, alternate approaches can be coded
> and included for the benefit of those who have an interest in such
> implementations. Thus there could be multiple versions of Variance,
> based on the strategy of interest and the numerical accuracy required.
>
> (3.) Having the same implementations of statistics usable across all
> Univariate implementations assures a standard behavior and the same
> expected results no matter if you're using incremental or evaluation-based
> approaches.
>
> (4.) The framework provides a formal structure for the future growth of
> the library. Knowing what a UnivariateStatistic is, and seeing the
> various implementations, it's obvious which route one will take to
> implement future statistics of interest.
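
To make points (1a) and (3) concrete, here is a rough sketch of one statistic exposing both entry points. The names IncrementalStatistic and RunningMean and the method getResult() are assumptions made for illustration; the real UnivariateStatistic signatures may differ.

```java
// Hypothetical interface; the actual framework interface may look different.
interface IncrementalStatistic {
    void increment(double d);          // storageless: feed one value at a time
    double getResult();                // current value of the statistic
    double evaluate(double[] values);  // storage-based: compute over an array
}

final class RunningMean implements IncrementalStatistic {
    private long n = 0;
    private double mean = 0.0;

    @Override
    public void increment(double d) {
        n++;
        mean += (d - mean) / n;  // running-mean update
    }

    @Override
    public double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    @Override
    public double evaluate(double[] values) {
        // Same update rule as increment(), applied to local variables so the
        // incremental state of this instance is left untouched.
        long count = 0;
        double m = 0.0;
        for (double v : values) {
            count++;
            m += (v - m) / count;
        }
        return count == 0 ? Double.NaN : m;
    }
}
```

Because both paths apply the same update rule in the same order, feeding values through increment() and passing the same array to evaluate() should give identical results.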
>
>
> Phil, it's clear we have very different "schools of thought" on the
> subject of how the library should be designed. As a developer on the
> project I have a right to promote my design model and interests. The
> architecture is something I have a strong interest in working with.
>
> Apache projects are "group" projects. If a project such as [math] cannot
> find community and room for multiple directions of development, if it
> cannot make room for alternate ideas and visions, if both revolutionary
> and evolutionary processes cannot coexist, I doubt the project will have
> much of a future at all.
>
>
> -Mark
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>