[ https://issues.apache.org/jira/browse/STATISTICS-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816436#comment-16816436 ]
Ben Nguyen commented on STATISTICS-8:
-------------------------------------
Hello,
My exams will be done by April 27, and I will get started right away on learning
and planning the final (build-ready) design of the regression library. I am hoping
that my GSoC proposal is accepted so I won't have to get a full-time job for
the summer to pay for school next year. :P
> Implementation of regression libraries within common-statistics framework
> -------------------------------------------------------------------------
>
> Key: STATISTICS-8
> URL: https://issues.apache.org/jira/browse/STATISTICS-8
> Project: Apache Commons Statistics
> Issue Type: Task
> Reporter: Eric Barnhill
> Priority: Major
>
> Apache Commons is one of the most widely used resources among Java programmers
> around the world. Data-related applications are soaring, and Java is one of
> the most commonly used languages for data engineering. Consequently, the
> commons-statistics library, currently under development, is likely to find a
> wide audience.
> For this project we aim to implement regression methods, arguably the most
> widely used techniques in statistics and machine learning, within the Apache
> Commons framework, in particular within the new commons-statistics library.
> The assignee will:
> * Use core functionality from the regression sub-libraries of the deprecated
> commons-math 4 framework as a starting point
> * Create a new, standalone commons component for regression statistics,
> focusing first on linear and logistic regression
> * Make architectural and design decisions in keeping with the Commons
> philosophy, that is, lightweight standalone components that are easy to
> understand and use by a wide range of Java developers (i.e. not a large,
> omnibus mathematical library with many degrees of abstraction)
> * Draw inspiration from widely used libraries such as scikit-learn and those
> in R to design an up-to-date statistics package
> * Design unit testing and documentation for these libraries
> Particularly challenging design decisions include how to incorporate core
> matrix libraries with a minimum of dependencies and redundancies.
> We see this project as potentially having a large impact on big data
> applications. Java and the JVM are fundamental to popular data engineering
> tools like Hadoop and Spark. Regression analyses, however, are often handled
> downstream, on the other side of the "data fence", by tools like Python and
> R. A robust and scalable pure-Java regression library, easily visible and
> accessible through Apache Commons, can better integrate both sides of this
> data divide by enabling many machine learning steps to be programmed at
> scale on the Java side.