[ https://issues.apache.org/jira/browse/SPARK-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629182#comment-14629182 ]
Xiangrui Meng commented on SPARK-6442:
--------------------------------------
If a linear algebra library like Python's numpy/scipy existed for Java, there
would be absolutely no need to create a new one. There are a couple of factors
we care about:
1. license
2. sparse support
3. performance
4. Java compatibility
We couldn't find one that meets all 4 requirements. For commons-math, I think
the problems are 2 (they are deprecating the sparse library) and 3. For breeze,
the problems are 4 and, to some extent, 3. For MTJ, the problem is 1. For
JBLAS/netlib-java, the problems are 2 and some concerns about 1. Those were
considered in the PR that introduced sparse support a year ago. Unfortunately,
Apache deleted the incubator-spark repo, but you can find the discussion here:
http://apache-spark-developers-list.1001551.n3.nabble.com/GitHub-incubator-spark-pull-request-Proposal-Adding-sparse-data-suppor-tc954.html#none
Initially, we only wanted to make a thin wrapper over breeze, but we decided
not to expose breeze types in the public APIs, which is a general guideline
across Spark components. Because of this, we received many complaints from
users about the lack of linear algebra support. The `toBreeze` and `fromBreeze`
conversions also make the implementation messy. At first we used only a limited
set of breeze operations, whose performance we had compared
(github.com/mengxr/linalg-test). Later on, we started using more breeze
operations and hit performance issues. So we implemented some BLAS routines for
dense and sparse data, plus the operators we need, to get good performance
without worrying about some Scala magic.
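
To give a rough sense of the kind of hand-written routine meant here, the
following is a minimal Java sketch of a dot product between a sparse vector
(stored as parallel index/value arrays) and a dense array. The class name
`SparseDotSketch` and the method `dot` are hypothetical and chosen only for
illustration; this is not MLlib's actual implementation.

```java
// Hypothetical sketch; names are illustrative and not part of MLlib's API.
public class SparseDotSketch {

    /**
     * Dot product of a sparse vector with a dense array.
     * The sparse vector is given as parallel arrays: indices[k] is the
     * position of the k-th stored entry and values[k] is its value.
     */
    public static double dot(int[] indices, double[] values, double[] dense) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        double sum = 0.0;
        // Only the stored (non-zero) entries are visited.
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * dense[indices[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] idx = {0, 3};
        double[] vals = {2.0, 4.0};
        double[] dense = {1.0, 1.0, 1.0, 0.5};
        System.out.println(dot(idx, vals, dense)); // 2*1 + 4*0.5 = 4.0
    }
}
```

The loop touches only the stored entries, so the cost scales with the number of
non-zeros rather than with the full vector length.
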
To sum up, the demand for a linear algebra library comes from both external
users and internal developers. The goal of this JIRA is an implementation that
meets all 4 requirements. The work hasn't really started since I'm not very
confident that we can meet all 4 requirements easily.
> MLlib Local Linear Algebra Package
> ----------------------------------
>
> Key: SPARK-6442
> URL: https://issues.apache.org/jira/browse/SPARK-6442
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Burak Yavuz
> Priority: Critical
>
> MLlib's local linear algebra package doesn't have any support for any type of
> matrix operations. With 1.5, we wish to add a complete package of optimized
> linear algebra operations for Scala/Java users.
> The main goal is to support lazy operations so that element-wise operations
> can be implemented in a single for-loop, and complex operations can be
> interfaced through BLAS.
> The design doc: http://goo.gl/sf5LCE
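
As a rough illustration of the lazy-evaluation goal in the description above,
the following Java sketch builds an element-wise expression without allocating
intermediate arrays and then evaluates it in a single for-loop. All names here
(`LazyVectorSketch`, `Expr`, `of`, `zip`, `plus`, `times`, `eval`) are
hypothetical and are not taken from MLlib or from the design doc.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch; none of these names come from MLlib or the design doc.
public class LazyVectorSketch {

    /** A lazily evaluated vector expression: elements are computed on demand. */
    public interface Expr {
        int size();
        double get(int i);

        default Expr plus(Expr other) { return zip(this, other, (a, b) -> a + b); }
        default Expr times(Expr other) { return zip(this, other, (a, b) -> a * b); }

        /** Materializes the whole expression in one pass over the indices. */
        default double[] eval() {
            double[] out = new double[size()];
            for (int i = 0; i < out.length; i++) {
                out[i] = get(i);
            }
            return out;
        }
    }

    /** Wraps a concrete array as a leaf expression. */
    public static Expr of(double... values) {
        return new Expr() {
            public int size() { return values.length; }
            public double get(int i) { return values[i]; }
        };
    }

    /** Combines two expressions element-wise without computing anything yet. */
    public static Expr zip(Expr a, Expr b, DoubleBinaryOperator op) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        return new Expr() {
            public int size() { return a.size(); }
            public double get(int i) { return op.applyAsDouble(a.get(i), b.get(i)); }
        };
    }

    public static void main(String[] args) {
        Expr x = of(1.0, 2.0, 3.0);
        Expr y = of(4.0, 5.0, 6.0);
        // (x + y) * x is only computed when eval() runs its single loop.
        double[] result = x.plus(y).times(x).eval();
        System.out.println(Arrays.toString(result)); // [5.0, 14.0, 27.0]
    }
}
```

Nothing is computed when `plus` and `times` are called; the only loop is the
one in `eval`, which is the property the description asks for.
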
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)