Sorry, my last reply got sent before I was done with it. Please disregard it
and read this one instead.
> > This adds significant overhead and I do not see the value in it. The cost
> > of the additional stack operations/object creations is significant. I ran
> > tests comparing the previous version that does direct computations using
> > the double[] arrays to the modified version and found an average of more
> > than 6x slowdown using the new implementation. I did not profile memory
> > utilization, but that is also a concern. Repeated tests computing the mean
> > of 1000 doubles 100000 times using the old and new implementations
> > averaged 1.5 and 10.2 seconds, resp. I do not see the need for all of this
> > additional overhead.
> >
>
> If you review the code, you'll find there is no added "object creation";
> the static Variable objects calculate on double[] just as the
> Univariates did. I would have to see more substantial analysis to
> believe your claim. All that's going on here is that the static StatUtils
> methods are delegating to individual static instances of
> UnivariateStatistics. These are instantiated on JVM startup like all
> static objects; calling a method in such an object should not require
> any more overhead than having the method coded directly into the static
> method.
>
> If there are performance considerations, let's discuss these.
Here is what I added to StatUtils.test:

    double[] x = new double[1000];
    for (int i = 0; i < 1000; i++) {
        x[i] = (5 - i) * (i - 200);
    }
    long startTick = 0;
    double res = 0;
    for (int j = 0; j < 10; j++) {          // 10 timing runs
        // "old": direct computation on the double[] array
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = OStatUtils.mean(x);
        }
        System.out.println("old: " + (System.currentTimeMillis() - startTick));
        // "new": the modified, delegating implementation
        startTick = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            res = StatUtils.mean(x);
        }
        System.out.println("new: " + (System.currentTimeMillis() - startTick));
    }

The result was a mean of 10203 ms for the "new" and 1531.1 ms for the "old",
with standard deviations of 81.1 and 13.4 resp. The overhead is the stack
operations and temp object creations.
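
A rough sketch of how the comparison could be rerun in isolation, with a
warm-up pass before timing and the results accumulated so the loops are not
optimized away. All names here (MeanBenchSketch, Mean, directMean,
delegatedMean) are hypothetical stand-ins, not the actual StatUtils or
UnivariateStatistics code, so whatever it prints says nothing about the real
implementations:

```java
// Hypothetical names throughout; this is not the [math] code under discussion.
final class MeanBenchSketch {

    /** Stand-in for a per-statistic object that a static facade delegates to. */
    static class Mean {
        double evaluate(double[] values) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                sum += values[i];
            }
            return sum / values.length;
        }
    }

    private static final Mean MEAN = new Mean();

    /** Mean computed directly inside the static method. */
    static double directMean(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }

    /** Same computation, routed through a shared static instance. */
    static double delegatedMean(double[] values) {
        return MEAN.evaluate(values);
    }

    /** Same input data as the test quoted in this message. */
    static double[] sampleData() {
        double[] x = new double[1000];
        for (int i = 0; i < x.length; i++) {
            x[i] = (5 - i) * (i - 200);
        }
        return x;
    }

    public static void main(String[] args) {
        double[] x = sampleData();
        int reps = 100000;
        double sink = 0.0; // accumulate results so the JIT cannot drop the loops

        // Warm-up pass so both paths have a chance to be compiled before timing.
        for (int i = 0; i < reps; i++) {
            sink += directMean(x) + delegatedMean(x);
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sink += directMean(x);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sink += delegatedMean(x);
        }
        long t2 = System.nanoTime();

        System.out.println("direct:    " + (t1 - t0) / 1000000 + " ms");
        System.out.println("delegated: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("(ignore) " + sink);
    }
}
```

Timings from a loop like this vary with the JVM and its settings, so several
runs are needed before drawing any conclusion.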
>
> I doubt (as the numerous discussions over the past week have pointed
> out) that what we really want to have in StatUtils is one monolithic
> Static class with all the implemented methods present in it. If I have
> misinterpreted this opinion in the group, then I'm sure there will be
> responses to this.
Well, I for one would prefer to have the simple computational methods in one
place. I would support making the class require instantiation, however, i.e.
making the methods non-static.
> There was a great deal of discussion about the benefit of not having the
> methods implemented directly in static StatUtils because they could not
> be "overridden" or worked with in an Instantiable form. This approach
> frees the implementations up to be overridden and frees up room for
> alternate implementations.
As I said above, the simplest way to deal with this is to make the methods
non-static.
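
A minimal sketch of what the non-static approach could look like. The class
names (NonStaticDemo, SimpleStats, CompensatedStats) are made up for
illustration and are not from [math]; the compensated summation is just one
example of an alternate implementation someone might want to plug in:

```java
// Hypothetical classes for illustration only; not part of [math].
final class NonStaticDemo {

    /** Simple computational methods in one place, as instance methods. */
    static class SimpleStats {
        public double mean(double[] values) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                sum += values[i];
            }
            return sum / values.length;
        }
    }

    /** Alternate implementation: Kahan (compensated) summation. */
    static class CompensatedStats extends SimpleStats {
        @Override
        public double mean(double[] values) {
            double sum = 0.0;
            double c = 0.0; // running compensation for lost low-order bits
            for (int i = 0; i < values.length; i++) {
                double y = values[i] - c;
                double t = sum + y;
                c = (t - sum) - y;
                sum = t;
            }
            return sum / values.length;
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        SimpleStats stats = new CompensatedStats(); // caller picks the implementation
        System.out.println(stats.mean(x)); // prints 2.5
    }
}
```

Because mean is an instance method, a subclass can replace it without any
change to the calling code.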
>
> You may have your opinions of how you would like to see the packages
> organized and implemented. Others in the group do have alternate
> opinions to yours. I for one see a strong value in individually
> implemented Statistics. I also have a strong vision that the framework I
> have been working on provides substantial benefits.
>
> (1a.) It allows both the storageless and storage-based implementations
> to function behind the same interface. No matter if you're calling
>
> increment(double d)
>
> or
>
> evaluate(double[]...)
>
> you're working with the same algorithm.
That is true in the old implementation as well, with the core computational
methods in StatUtils.
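
One way to picture this point, as a sketch only: a storageless accumulator and
an array-based method can both go through the same core computation. CoreStats
and RunningMean are hypothetical names, and how the old classes were actually
wired together is not shown in this thread:

```java
// Hypothetical sketch; CoreStats and RunningMean are not the actual [math] classes.
final class SharedAlgorithmSketch {

    /** Stand-in for a single home for the core computations. */
    static final class CoreStats {
        /** Mean from a running sum and count; NaN when no values were seen. */
        static double mean(double sum, long n) {
            return n == 0 ? Double.NaN : sum / n;
        }

        /** Storage-based path: sum the array, then reuse the same formula. */
        static double mean(double[] values) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                sum += values[i];
            }
            return mean(sum, values.length);
        }
    }

    /** Storageless path: keeps only the running sum and count. */
    static final class RunningMean {
        private double sum;
        private long n;

        void increment(double d) {
            sum += d;
            n++;
        }

        double getResult() {
            return CoreStats.mean(sum, n);
        }
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, 6.0};
        RunningMean running = new RunningMean();
        for (int i = 0; i < x.length; i++) {
            running.increment(x[i]);
        }
        // Both paths add the values in the same order and divide once,
        // so they agree exactly.
        System.out.println(running.getResult() == CoreStats.mean(x)); // prints true
    }
}
```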
>
> (1b.) If you wish to have alternate implementations for evaluate and
> increment, it is easily possible to overload these methods in future
> versions of the implementations.
Just make the methods non-static and that will be possible. I am not sure,
given the relative triviality of these methods, if this is really a big deal,
however.
>
>
> Phil, it's clear we have very different "schools of thought" on the
> subject of how the library should be designed. As a developer on the
> project I have a right to promote my design model and interests. The
> architecture is something I have a strong interest in working with.
You certainly have the right to your opinions. Others also have the right to
disagree with them.
>
> Apache projects are "group" projects. If a project such as [math] cannot
> find community and room for multiple directions of development, if it
> cannot make room for alternate ideas and visions, if both revolutionary
> and evolutionary processes cannot coexist, I doubt the project will have
> much of a future at all.
I agree with this as well; but from what I have observed, open source projects
do best when they do not try to go off in divergent directions at the same
time. If we cannot agree on a consistent architecture direction, then I don't
think we will succeed. If we can and we stay focussed, then we will. As I said
above, if others agree with the approach that you want to take, then that is
the direction that the project will go. I am interested in the opinions of
Tim, Robert and the rest of the team.
Phil
>
>
> -Mark
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>