Phil Steitz wrote:
Tim O'Brien wrote:
What about this possibility: we could easily have DoubleArray return a reference to the internalStorageArray. I know this would violate encapsulation, but if we expose the internal array along with the start and end index, then there is no need to copy the contents of the array. Instead we pass a reference to the existing array.
+1 -- it *is* after all an array and if this is not exposed, you are always going to be stuck with using ArrayCopy to get at the underlying data, which makes efficient computation using large arrays impossible. I agonized over this same decision vis a vis RealMatrixImpl, where I ended up "breaking encapsulation" (similarly to other double[][]-based implementations) and exposing a getDataRef method that returns a reference to the underlying double[][] array.
I like it too. Since I've been looking at/messing with these classes, I'd be glad to make the changes for us and add the static methods to StatUtils. One note: I think we should retain a method that copies the array as well as create one that exposes it. The copy version can give us an array that is trimmed down to the size of the actual content. Because the internal store increases incrementally in the windowless case, there can be uninitialized/unused sections at the end of the array (likewise, in the windowed case, if the array isn't filled yet, there are unused sections). Providing an interface to retrieve a "cleaned" array is a useful option if one wants to retrieve the data to manipulate it elsewhere. This would be useful in both Fixed and Exp/Cont DoubleArrays.
Yes. I would certainly not recommend dropping the existing getElements() or replacing it with reference semantics. What I did in RealMatrix was to provide both getData and getDataRef, with the latter returning a reference. I would reserve getElements() for copy semantics and call the reference version something else.
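To make the copy-versus-reference split concrete, here is a minimal sketch. The class DoubleArraySketch and the accessor names getElementsRef() and start() are hypothetical, chosen only for illustration; they are not the actual DoubleArray API, and the growth policy is an assumption.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an expandable DoubleArray; not the real class. */
class DoubleArraySketch {
    private double[] internalStorageArray = new double[16];
    private int startIndex = 0;   // index of the first live element
    private int numElements = 0;  // number of live elements

    void addElement(double value) {
        if (startIndex + numElements == internalStorageArray.length) {
            // Assumed growth policy: double the capacity when full.
            internalStorageArray =
                Arrays.copyOf(internalStorageArray, internalStorageArray.length * 2);
        }
        internalStorageArray[startIndex + numElements] = value;
        numElements++;
    }

    /** Copy semantics: a new array trimmed to the live elements only. */
    double[] getElements() {
        return Arrays.copyOfRange(internalStorageArray, startIndex, startIndex + numElements);
    }

    /** Reference semantics: the backing array itself, unused tail included. */
    double[] getElementsRef() {
        return internalStorageArray;
    }

    int start() {
        return startIndex;
    }

    int getNumElements() {
        return numElements;
    }
}
```

A caller using getElementsRef() has to pair it with start() and getNumElements() to know which slots hold data, and the reference goes stale once the backing array is reallocated.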
Now, every method in StatUtils that takes a double[] would be altered to take a (double[], int start, int length). So,
public static double sum(double[] values);
would delegate to a more "generic"
public static double sum(double[] values, int startIndex, int length);
I agree -- I think that Brent suggested this improvement already.
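A minimal sketch of that delegation, for illustration only. The class name StatUtilsSumSketch is hypothetical, and the bounds checking is an assumption rather than anything decided in this thread.

```java
/** Hypothetical sketch of the proposed delegation; not the actual StatUtils. */
class StatUtilsSumSketch {

    /** Whole-array version: delegates to the range version. */
    static double sum(double[] values) {
        return sum(values, 0, values.length);
    }

    /** Sums values[startIndex] .. values[startIndex + length - 1]. */
    static double sum(double[] values, int startIndex, int length) {
        // Assumed policy: reject ranges that fall outside the array.
        if (startIndex < 0 || length < 0 || length > values.length - startIndex) {
            throw new IllegalArgumentException("range outside array bounds");
        }
        double total = 0.0;
        for (int i = startIndex; i < startIndex + length; i++) {
            total += values[i];
        }
        return total;
    }
}
```

With this shape, a caller holding a reference to the internal array can pass the start index and element count directly, with no intermediate copy.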
On the topic of StatUtils, what are the opinions about adding the following methods from my discussion with the lang group to provide alternate primitive implementations? These would be for short, long, int, float for now.
I don't see any harm in adding these; but I would not put a high priority on implementing them and I agree with Stephen that there is no harm in lang including the min/max functions directly in lang.math as well. Some duplication across packages is OK, IMHO. Also, I would not want lang -- or any other component -- to depend on anything in math until we have successfully emerged from the sandbox with a release. What may actually make more sense is for lang.math to add the min, max stuff and us to use their implementations of these in place of our own. But, once again, these are trivial functions and I see nothing wrong with implementing them in both places. Note that in any case, we will want to implement these with array offset arguments, which lang may not be interested in.
One more note on the min-max stuff: the implementation in StatUtils calls Math.min/max each time through the comparison loop. The loop should probably be rewritten to just keep track of the min/max and do a straight compare each time through (similar to what UnivariateImpl does) to avoid the unnecessary function call within the loop.
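The straight-compare loop could look roughly like the sketch below. MinMaxSketch is a hypothetical name, returning NaN for an empty range is an assumption, and the offset arguments follow the (double[], int, int) form discussed earlier. One behavioural difference to be aware of: Math.min and Math.max return NaN if either argument is NaN, whereas a plain comparison skips NaN values that appear after the first element.

```java
/** Hypothetical sketch of min/max with a direct comparison in the loop. */
class MinMaxSketch {

    static double min(double[] values, int startIndex, int length) {
        if (length == 0) {
            return Double.NaN; // assumed convention for an empty range
        }
        double min = values[startIndex];
        for (int i = startIndex + 1; i < startIndex + length; i++) {
            if (values[i] < min) { // straight compare, no Math.min call
                min = values[i];
            }
        }
        return min;
    }

    static double max(double[] values, int startIndex, int length) {
        if (length == 0) {
            return Double.NaN; // assumed convention for an empty range
        }
        double max = values[startIndex];
        for (int i = startIndex + 1; i < startIndex + length; i++) {
            if (values[i] > max) { // straight compare, no Math.max call
                max = values[i];
            }
        }
        return max;
    }
}
```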
primitive <-- min(primitive[])
primitive <-- max(primitive[])
primitive <-- sum(primitive[])
primitive <-- sumSq(primitive[])
In terms of other stat methods, the theme would be more like:
double <-- mean(primitive[])
double <-- var(primitive[])
double <-- std(primitive[])
Possibly similar methods for the other stat methods as well. Would these all involve casting the elements to double prior to calculating?
Yes, you would have to cast before computation, which sort of blows away the value of the array-based implementation. May be better to add addValue(primitive[]) to Univariate. I have been meaning to suggest addValue(double[]) for a while now.
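As a rough illustration of the addValue(primitive[]) idea, the sketch below uses a hypothetical UnivariateSketch class that tracks only a count and a running sum. The real Univariate interface is not reproduced here, and the NaN result for an empty accumulator is an assumption.

```java
/** Hypothetical accumulator illustrating array-based addValue overloads. */
class UnivariateSketch {
    private long n = 0;
    private double sum = 0.0;

    void addValue(double value) {
        n++;
        sum += value;
    }

    /** Adds every element of a double[] in order. */
    void addValue(double[] values) {
        for (int i = 0; i < values.length; i++) {
            addValue(values[i]);
        }
    }

    /** Adds every element of an int[]; each element is widened to double. */
    void addValue(int[] values) {
        for (int i = 0; i < values.length; i++) {
            addValue((double) values[i]);
        }
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n == 0 ? Double.NaN : sum / n; // assumed: NaN when empty
    }
}
```

Each element is converted as it is consumed, so no intermediate double[] copy of the primitive array is needed.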
Phil
