[ 
https://issues.apache.org/jira/browse/STATISTICS-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805157#comment-16805157
 ] 

Eric Barnhill commented on STATISTICS-7:
----------------------------------------

[~BenN] [~Salman]

Interest has flowed organically in the direction of regression, and that's 
great. [~erans] is right that this unavoidably brings up the linear library, 
what to do with it, and how. We are a do-ocracy here at commons-numbers, and the 
proposed solution does not need to be ideal; the priority is to get the 
component up and running.

I have only one suggestion for what *not* to do, and that is to repeat what was 
done before: implementing basic linear operations under many layers of 
object-oriented abstraction with the goal of assembling some sort of omnibus OO 
math library. The focus in commons is lightweight reusable components that are 
widely used and easy to use in real-life Java programming. The user should 
*not* have to digest a large mathematically focused API to get Pearson's r from 
two vectors or solve Ax=b. The production developers in my shop should feel as 
comfortable grabbing commons-statistics-regression for a task as they do 
commons-csv.

If you want to just accept array or List input and adapt the current linear 
functionality to process those inputs, I think that is a good solution. If you 
would rather create some sort of re-usable Matrix component and stick that into 
commons stats, to make your code more readable, that's fine too. We could also 
start up a small commons-numbers-linear project if someone was excited to do 
that, but that is definitely not necessary.
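
As a rough illustration of the first option (plain array or List input), here 
is a minimal sketch of Pearson's r. The class and method names (PearsonSketch, 
correlation) are hypothetical and are not part of any existing commons API; the 
point is only that the caller passes two vectors and gets a number back.

```java
import java.util.List;

/** Hypothetical sketch only; not an existing commons-statistics class. */
public final class PearsonSketch {

    private PearsonSketch() {
        // Static utility; no instances.
    }

    /**
     * Pearson's r for two equal-length arrays.
     * Returns NaN if either input has zero variance.
     */
    public static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException(
                "Inputs must have the same length, at least 2");
        }

        // First pass: means of both inputs.
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        // Second pass: sums of squared deviations and cross products.
        double sxx = 0;
        double syy = 0;
        double sxy = 0;
        for (int i = 0; i < x.length; i++) {
            final double dx = x[i] - meanX;
            final double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Same computation for List input; adapts to the array version. */
    public static double correlation(List<? extends Number> x,
                                     List<? extends Number> y) {
        return correlation(toArray(x), toArray(y));
    }

    private static double[] toArray(List<? extends Number> values) {
        return values.stream().mapToDouble(Number::doubleValue).toArray();
    }
}
```

A caller would then write something like 
PearsonSketch.correlation(heights, weights) without touching any matrix or 
vector abstraction. Whether the internals delegate to the current linear code 
or to a small Matrix component is invisible at this level.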

Hopefully we can find a way for all interested mentees to take roles that 
complement each other in ways that interest them. Once we have a sense for 
that, and the proposals are approved, I will write up and assign the necessary 
tickets. And of course we hope that after the summer you will continue working 
with us.

> Stream-based Java statistical processing
> ----------------------------------------
>
>                 Key: STATISTICS-7
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-7
>             Project: Apache Commons Statistics
>          Issue Type: New Feature
>            Reporter: Eric Barnhill
>            Priority: Major
>              Labels: GSoC2019, gsoc2019, statistics, streams
>
> The new component aims to be a library of commons statistics functions 
> synchronized with the latest developments in the Java language, in particular 
> Java's functional programming syntax.
> The library will make commonly used statistical functions available to an end 
> user through a simple grammar comparable to commons-math-statistics or 
> scikit-learn, while under the hood it will use Java's mapping, streaming, 
> and other producer and consumer functions so that the statistical methods 
> run efficiently on new Java implementations.
> Developers working on the project will have the opportunity to demonstrate 
> Java programming, functional programming, algorithm design, and data science 
> skills and receive authorship on a commons project that is likely to be 
> widely used.
> The ideal contributor will also be able to help with important architectural 
> decision making. The old source of these libraries, commons-math, grew too 
> large, hierarchically complex and interdependent for the commons mission. The 
> developers on this project need to make architectural choices that will 
> enable the statistical code to be lightweight and reusable, with a minimum of 
> outside dependencies while avoiding redundancy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
