On 23/10/2019 00:13, Gilles Sadowski wrote:
Hello.

On Tue, 22 Oct 2019 at 21:50, Eric Barnhill <ericbarnh...@gmail.com> wrote:
I propose the following class structure for commons-statistics-regression.
Which?
[The attachment was probably stripped; such files should go in a JIRA report.]

Quick first thoughts on the method names:

LinearRegression::RSquared

LogisticRegression::predictionProbs


Are these computing methods or property getters? I assume that all the computation is done in the methods:

Regression::fit

Regression::predict(double[])


Thus the methods in the implementation classes access additional results specific to the method. So they should be:

LinearRegression::getRSquared

LogisticRegression::getPredictionProbabilities(double[])
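
To make the naming point concrete, here is a minimal sketch of the convention, not the proposed design. It assumes that fit and predict do all the computing and that getRSquared only returns a stored value. The double[][] input, the single-observation predict signature, and the SimpleLinearRegression and RegressionNamingDemo classes are hypothetical, chosen only to keep the example short; the fit is plain one-predictor least squares.

```java
/** Hypothetical demo entry point, illustration only. */
final class RegressionNamingDemo {
    public static void main(String[] args) {
        SimpleLinearRegression regression = new SimpleLinearRegression();
        regression.fit(new double[][] {{0}, {1}, {2}}, new double[] {1, 3, 5});
        System.out.println(regression.predict(new double[] {3}));
        System.out.println(regression.getRSquared());
    }
}

/** Hypothetical interface holding the computing methods named in this thread. */
interface Regression {
    /** Estimates the model; x holds one row per observation. */
    void fit(double[][] x, double[] y);

    /** Predicts the response for a single observation (assumed signature). */
    double predict(double[] features);
}

/** Illustration only: least squares with a single predictor (column 0 of x). */
final class SimpleLinearRegression implements Regression {
    private double intercept = Double.NaN;
    private double slope = Double.NaN;
    private double rSquared = Double.NaN;

    @Override
    public void fit(double[][] x, double[] y) {
        final int n = y.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i][0];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0;
        double sxy = 0;
        double syy = 0;
        for (int i = 0; i < n; i++) {
            final double dx = x[i][0] - meanX;
            final double dy = y[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
        // With one predictor and an intercept, R^2 is the squared correlation.
        rSquared = (sxy * sxy) / (sxx * syy);
    }

    @Override
    public double predict(double[] features) {
        return intercept + slope * features[0];
    }

    /** Property getter: returns the value stored by fit, computes nothing. */
    public double getRSquared() {
        return rSquared;
    }
}
```

The "get" prefix signals to the caller that the call is cheap and has no side effects, while fit and predict are where the work happens.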


The interface carried over from commons-math is more of an academic approach to thinking 
about regression. For rebooting the library (and I hinted at this when I wrote the 
tickets for summer of code) I was hoping to emulate widespread tools like R and 
scikit-learn, and consider that "machine learning" is an increasingly popular 
use of regression. This proposed structure creates an interface that is not the same as, 
but will be very friendly to, anyone coming from R or scikit-learn, or similar tools in 
JavaScript.

There are of course many ways I can see to elaborate this scheme, say using 
RegressionResult objects and so forth. But Matrices paired with a double[], 
returning a double[] of coefficients or predictions, are likely to be the most 
common use cases and should be plenty to get started.
Commenting perhaps too early (not seeing the proposed design), but we broadly
discussed that the linear algebra API is not easy to get right, and once we "get
started", the trend is to be stuck with it for ages (related issues are among the
oldest unresolved ones in CM).

Under the hood I would use the available implementations in commons-math to get 
up and running, and worry about improving them later.
Do you mean port from, or depend on, CM?

I assume that the Matrix object in the API is a new interface for commons-statistics, thus allowing the underlying implementation to be pluggable. The initial version could include a shaded library to use whatever is appropriate.
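
As a rough sketch of what "pluggable" could mean here, assuming the API owns a small Matrix interface: the regression code would call only that interface, and a backend (an array wrapper as below, or an adapter over a shaded library) would implement it. All names in this example (Matrix, ArrayMatrix, MatrixOps, MatrixBackendDemo, and their methods) are hypothetical and do not come from the proposal.

```java
import java.util.Arrays;

/** Hypothetical demo entry point, illustration only. */
final class MatrixBackendDemo {
    public static void main(String[] args) {
        Matrix x = new ArrayMatrix(new double[][] {{1, 2}, {3, 4}});
        double[] product = MatrixOps.multiply(x, new double[] {1, 1});
        System.out.println(Arrays.toString(product));
    }
}

/** Hypothetical minimal matrix abstraction owned by the statistics API. */
interface Matrix {
    int getRowCount();

    int getColumnCount();

    double get(int row, int column);
}

/** One possible backend: a defensive copy of a rectangular double[][]. */
final class ArrayMatrix implements Matrix {
    private final double[][] data;

    ArrayMatrix(double[][] data) {
        this.data = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            this.data[i] = data[i].clone();
        }
    }

    @Override
    public int getRowCount() {
        return data.length;
    }

    @Override
    public int getColumnCount() {
        return data.length == 0 ? 0 : data[0].length;
    }

    @Override
    public double get(int row, int column) {
        return data[row][column];
    }
}

/** Code written against the interface only, so the backend can be swapped. */
final class MatrixOps {
    private MatrixOps() {
    }

    /** Returns the matrix-vector product m * v. */
    static double[] multiply(Matrix m, double[] v) {
        if (v.length != m.getColumnCount()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        final double[] result = new double[m.getRowCount()];
        for (int i = 0; i < result.length; i++) {
            double sum = 0;
            for (int j = 0; j < v.length; j++) {
                sum += m.get(i, j) * v[j];
            }
            result[i] = sum;
        }
        return result;
    }
}
```

Keeping the interface this small is a design choice for the sketch: the fewer methods the API exposes, the less there is to be stuck with later.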

Alex



Regards,
Gilles

Feedback appreciated,
Eric
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

