Phil Steitz wrote:
The following changes have been suggested recently. Before cutting 1.0 final, we should make sure we are all OK postponing or forgoing these:

1) Eliminate the univariate/multivariate distinction in the stat package, because this seems confusing to some. Change .univariate to .descriptive and .multivariate to .regression

Univariate and Multivariate are just "classifications". There is no suggestion of changing the structure of the packages. Perhaps we can begin building a "classification outline" now so that we have a better idea what are the classes of statistics and what we want our naming scheme to be based on. In the past I've always leaned towards a classification similar to the mathworld site.


The idea of moving SimpleRegression to a package called "regression" is a means to classify "regressions" as much as to classify "multivariates" or "univariates".

o.a.c.math.stat.regression.SimpleRegression
o.a.c.math.stat.univariate.DescriptiveStatistics
o.a.c.math.stat.multivariate...

Kim made a critique about the naming. Yet package names have little to do with the performance of the library. A simple package rename for clarification prior to release is ok with me as long as it "is clarifying".

[undecided]

2) Add methods to create row or column matrices from double arrays and to extract submatrices (to the interface itself, rather than adding these to a utils class later)


Yes. Passing a reference to a row, column, or submatrix through an interface lets us perform operations on the matrix generically, independent of the primitive double[] type, which cannot be customized or extended. By passing the interface and not the array itself, we can hand around "references" to the original matrix instead of copies of it. This will be much more efficient for large matrices, and it also lets us implement the same methods on sparse matrix implementations, which may not actually be stored in a [][] structure.
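
To illustrate the point about handing around references instead of copies, here is a minimal sketch. All names (MatrixView, DenseMatrix, SubMatrixView) and the inclusive index convention are assumptions made up for this example; none of this is the current commons-math interface.

```java
// Hypothetical names throughout; this is a sketch, not the commons-math API.
interface MatrixView {
    int getRowDimension();
    int getColumnDimension();
    double getEntry(int row, int column);
}

/** Dense matrix backed by a double[][]. */
final class DenseMatrix implements MatrixView {
    private final double[][] data;

    DenseMatrix(double[][] data) {
        this.data = data; // shared with the caller, not copied
    }

    /** Builds a 1 x n row matrix from a plain array. */
    static DenseMatrix rowMatrix(double[] values) {
        return new DenseMatrix(new double[][] { values.clone() });
    }

    /** Builds an n x 1 column matrix from a plain array. */
    static DenseMatrix columnMatrix(double[] values) {
        double[][] column = new double[values.length][1];
        for (int i = 0; i < values.length; i++) {
            column[i][0] = values[i];
        }
        return new DenseMatrix(column);
    }

    @Override public int getRowDimension() { return data.length; }

    @Override public int getColumnDimension() {
        return data.length == 0 ? 0 : data[0].length;
    }

    @Override public double getEntry(int row, int column) { return data[row][column]; }

    void setEntry(int row, int column, double value) { data[row][column] = value; }

    /** Returns a view of the given rows and columns (both ranges inclusive). */
    MatrixView subMatrix(int startRow, int endRow, int startColumn, int endColumn) {
        return new SubMatrixView(this, startRow, endRow, startColumn, endColumn);
    }
}

/** A window onto a parent matrix: reads are delegated, nothing is copied. */
final class SubMatrixView implements MatrixView {
    private final MatrixView parent;
    private final int rowOffset;
    private final int columnOffset;
    private final int rows;
    private final int columns;

    SubMatrixView(MatrixView parent, int startRow, int endRow,
                  int startColumn, int endColumn) {
        if (startRow < 0 || endRow < startRow || endRow >= parent.getRowDimension()
                || startColumn < 0 || endColumn < startColumn
                || endColumn >= parent.getColumnDimension()) {
            throw new IllegalArgumentException("submatrix bounds out of range");
        }
        this.parent = parent;
        this.rowOffset = startRow;
        this.columnOffset = startColumn;
        this.rows = endRow - startRow + 1;
        this.columns = endColumn - startColumn + 1;
    }

    @Override public int getRowDimension() { return rows; }

    @Override public int getColumnDimension() { return columns; }

    @Override public double getEntry(int row, int column) {
        if (row < 0 || row >= rows || column < 0 || column >= columns) {
            throw new IndexOutOfBoundsException("entry outside the view");
        }
        return parent.getEntry(row + rowOffset, column + columnOffset);
    }
}
```

Because SubMatrixView depends only on the interface, the same view would work over a sparse implementation that never allocates a [][] at all.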


[+1]

3) Make the PRNG fully pluggable in the random package.

I think the challenge here is simply to provide an interface and a base implementation that uses the JVM PRNG. If a user wishes to override the PRNG, they just implement the interface and pass the implementation into the class that uses the PRNG. We can also provide a separate driver implementation based on RngPack and package it separately. Users who wish to change the PRNG can then pick up the RngPack distro and our driver for it.
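
Roughly what I have in mind, as a sketch only. RandomSource, JdkRandomSource, and UniformSampler are hypothetical names for this example and do not exist in the random package; the only real API used is java.util.Random.

```java
import java.util.Random;

/** Hypothetical plug-in point: anything that can produce doubles in [0, 1). */
interface RandomSource {
    double nextDouble();
}

/** Base implementation that delegates to the JVM's java.util.Random. */
final class JdkRandomSource implements RandomSource {
    private final Random random;

    JdkRandomSource() {
        this.random = new Random();
    }

    JdkRandomSource(long seed) {
        this.random = new Random(seed);
    }

    @Override public double nextDouble() {
        return random.nextDouble();
    }
}

/** Hypothetical consumer: it sees only the interface, never a concrete PRNG. */
final class UniformSampler {
    private final RandomSource source;

    UniformSampler(RandomSource source) {
        this.source = source;
    }

    /** Returns a value in [lower, upper), scaled from the source's [0, 1) output. */
    double nextUniform(double lower, double upper) {
        if (!(lower < upper)) {
            throw new IllegalArgumentException("lower must be less than upper");
        }
        return lower + source.nextDouble() * (upper - lower);
    }
}
```

An RngPack driver would then be just another RandomSource implementation, shipped separately, and passed in with something like new UniformSampler(driver).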


[+1]

4) Modify Variance and StandardDeviation to compute multiple statistics (with the variants being population, rather than sample statistics).

Yes, the choice is to decide whether these are in fact "variants" of the same statistic or in fact separate statistics. I'm not convinced either way at this point, and I can see either approach fitting the package design.
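
For reference, the two statistics differ only in the divisor (n - 1 for the sample variance, n for the population variance). A sketch of the "separate statistics" option follows; the class names and the choice to return NaN for too few values are assumptions for illustration, not existing or proposed commons-math behavior.

```java
/** Shared helper (hypothetical): sum of squared deviations from the mean. */
final class Deviations {
    private Deviations() {}

    static double sumOfSquares(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return sumSq;
    }
}

/** Sample variance: divides by n - 1. */
final class SampleVariance {
    double evaluate(double[] values) {
        if (values.length < 2) {
            return Double.NaN; // assumption: undefined for fewer than two values
        }
        return Deviations.sumOfSquares(values) / (values.length - 1);
    }
}

/** Population variance: divides by n. */
final class PopulationVariance {
    double evaluate(double[] values) {
        if (values.length < 1) {
            return Double.NaN; // assumption: undefined for an empty array
        }
        return Deviations.sumOfSquares(values) / values.length;
    }
}
```

The "variants" option would instead be a single class with a switch selecting the divisor. The arithmetic is the same either way, so this is purely a question of how the API should look.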


[Undecided]

5) Drop the interface / implementation separation throughout the package.

This sounded more like a complaint about Java itself. The logic behind this recommendation was unclear to me, and dropping the separation would destroy any extensibility of the API. Separating interface from implementation is standard in Java and necessary for any package to work properly. I suspect the argument was really about "Factories" versus using actual constructors to build the objects, which I would see as a more serious argument concerning the package's design.
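
To separate the two questions, here is a small sketch with made-up names (Statistic, ArithmeticMean, StatisticFactory), none of which are actual package classes. The interface stays in both cases; only the way the caller obtains an instance changes.

```java
/** Hypothetical interface that callers program against. */
interface Statistic {
    double evaluate(double[] values);
}

/** One concrete implementation, usable directly through its constructor. */
final class ArithmeticMean implements Statistic {
    @Override public double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN; // assumption: the mean of no values is undefined
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

/** Factory style: the caller never names the implementation class. */
final class StatisticFactory {
    private StatisticFactory() {}

    static Statistic newMean() {
        return new ArithmeticMean();
    }
}
```

With the constructor style the caller writes new ArithmeticMean(); with the factory style, StatisticFactory.newMean(). Either way the rest of the code depends only on Statistic.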


[-1]


I am personally -1 on 4) and 5); -0 on 1) and 2); and +0 on 3). I voted +1 on the release, however, which means that 3) is a wart that I am willing to live with for 1.0. It can be worked around now, and fixing it correctly will require that we define a PRNG interface and introduce factories, etc.


Mark, since you voted to reopen API discussion, can you weigh in on these issues and add any others that you see as show-stoppers?


I felt I could live with these issues unresolved for release 1.0 as well. Yet it sounded like others did not find that satisfactory. I'm willing to work on those I voted [+1] on (Matrix Methods and PRNG Pluggability) to get the packages into a more satisfactory state. I think we should just implement the variants of Variance and StandardDeviation as separate classes and continue any argument about the appropriate strategy for them in the future. I would be interested in assisting with this as well.


-Mark

--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
