[ 
https://issues.apache.org/jira/browse/SPARK-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629182#comment-14629182
 ] 

Xiangrui Meng commented on SPARK-6442:
--------------------------------------

If a linear algebra library comparable to numpy/scipy in Python existed in 
Java, there would be absolutely no need to create a new one. There are a couple 
of factors we care about:

1. license
2. sparse support
3. performance
4. Java compatibility

We couldn't find one that meets all 4 requirements. For commons-math, I think 
the problems are 2 (they are deprecating the sparse library) and 3. For breeze, 
the problems are 4 and, to some extent, 3. For MTJ, the problem is 1. For 
JBLAS/netlib-java, the problems are 2 and some concerns about 1. Those were 
considered in the PR that introduced sparse support a year ago. Unfortunately, 
Apache deleted the incubator-spark repo. But you can find the discussion here: 
http://apache-spark-developers-list.1001551.n3.nabble.com/GitHub-incubator-spark-pull-request-Proposal-Adding-sparse-data-suppor-tc954.html#none

Initially, we only wanted to make a thin wrapper over breeze, but we decided 
not to expose breeze types in the public APIs, which is a general guideline 
across Spark components. Because of this, we received many complaints from 
users about the lack of linear algebra support. The `toBreeze` and `fromBreeze` 
conversions also make the implementation messy. Initially we used only a 
limited set of breeze operations, whose performance we compared 
(github.com/mengxr/linalg-test). Later on, we started using more breeze 
operations and hit performance issues. So we implemented some BLAS routines for 
dense and sparse data, plus some operators that we need, to get good 
performance without worrying about Scala magic.
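
As a rough illustration of the kind of routine meant here, the sketch below 
shows BLAS-style level-1 operations (dot and axpy) between a sparse vector and 
a dense one, written over plain arrays so that it is callable from Java. The 
class name `SparseBlasSketch` and the index/value array layout are assumptions 
made for this example, not the actual MLlib code.

```java
/** Hypothetical helper (not part of MLlib): level-1 routines on plain arrays. */
final class SparseBlasSketch {
    private SparseBlasSketch() {}

    /**
     * Dot product of a sparse vector x and a dense vector y.
     * x is given by parallel arrays: indices[k] is the position of values[k].
     */
    static double dot(int[] indices, double[] values, double[] y) {
        double sum = 0.0;
        // Only the stored (non-zero) entries of x contribute to the sum.
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * y[indices[k]];
        }
        return sum;
    }

    /** In-place update y += a * x, where x is sparse and y is dense. */
    static void axpy(double a, int[] indices, double[] values, double[] y) {
        // Touch only the positions of y where x has a stored entry.
        for (int k = 0; k < indices.length; k++) {
            y[indices[k]] += a * values[k];
        }
    }
}
```

Both loops run over the stored entries only, so the cost is proportional to 
the number of non-zeros rather than to the vector length.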

To sum up, the demand for a linear algebra library comes from both external 
users and internal developers. The goal of this JIRA is an implementation that 
meets all 4 requirements. The work hasn't really started since I'm not very 
confident that we can meet all 4 requirements easily.



> MLlib Local Linear Algebra Package
> ----------------------------------
>
>                 Key: SPARK-6442
>                 URL: https://issues.apache.org/jira/browse/SPARK-6442
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Burak Yavuz
>            Priority: Critical
>
> MLlib's local linear algebra package doesn't have any support for any type of 
> matrix operations. With 1.5, we wish to add a complete package of optimized 
> linear algebra operations for Scala/Java users.
> The main goal is to support lazy operations so that element-wise operations 
> can be implemented in a single for-loop, and complex operations can be 
> interfaced through BLAS. 
> The design doc: http://goo.gl/sf5LCE
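
The lazy element-wise evaluation mentioned in the issue description could look 
roughly like the sketch below: each operation only builds a description of how 
to compute element i, and a single for-loop in `eval` materializes the result 
without intermediate arrays. The class `LazyVec` and its method names are 
hypothetical and are not taken from the design doc.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical lazy vector expression; names are illustrative only. */
final class LazyVec {
    final int size;
    final IntToDoubleFunction at; // computes element i on demand

    LazyVec(int size, IntToDoubleFunction at) {
        this.size = size;
        this.at = at;
    }

    /** Wraps an existing array without copying it. */
    static LazyVec of(double[] data) {
        return new LazyVec(data.length, i -> data[i]);
    }

    /** Element-wise sum; nothing is computed until eval() is called. */
    LazyVec plus(LazyVec other) {
        if (other.size != size) {
            throw new IllegalArgumentException("size mismatch");
        }
        return new LazyVec(size, i -> at.applyAsDouble(i) + other.at.applyAsDouble(i));
    }

    /** Multiplication by a scalar, also deferred. */
    LazyVec scale(double a) {
        return new LazyVec(size, i -> a * at.applyAsDouble(i));
    }

    /** Materializes the whole expression in one pass over the indices. */
    double[] eval() {
        double[] out = new double[size];
        for (int i = 0; i < size; i++) {
            out[i] = at.applyAsDouble(i);
        }
        return out;
    }
}
```

For example, `LazyVec.of(a).plus(LazyVec.of(b)).scale(2.0).eval()` allocates 
one output array and visits each index once.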



