Github user shivaram commented on the pull request:
https://github.com/apache/incubator-spark/pull/575#issuecomment-34948280
Sorry I missed this thread, but I'd like to understand a bit more about the
scope of what we require in terms of library support before making a decision.
1. I guess everybody agrees that we want our external interfaces to be
simple / general and not impose any dependency on mahout-math / JBlas /
Commons Math etc. So the first question, I guess, is what external
representation we should use for sparse data (matrix & vector). Is the
proposal that we use something like
[CSR](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_.28CSR_or_CRS.29)
where we compress one row at a time?
2. Dense linear algebra: For dense matrices, the reason we chose JBlas was
that it provided a low-overhead interface for calling into BLAS-2/BLAS-3
functions through JNI. This was in line with the kind of operations we were
using for things like ALS, regression, etc. I'd suggest we stick to this, as
JBlas has a reasonable API and few external dependencies. Are there any
features JBlas is missing that we want to use?
3. Sparse linear algebra: I'm still not sure what kind of operations we
want for sparse data. The most basic things I can think of are operations like
dot products, indexing, and traversal, which shouldn't require calling into a
native library. If the set of operations is small, I'd prefer an in-house
implementation rather than depending on either a slow (Breeze) or
soon-to-be-deprecated (Commons Math) library. Again, I think this depends on
the features we need, so it'll be good to sketch out one or two algorithms for
sparse data and pull in heavier libraries if we need them.
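
To make points 1 and 3 concrete, here is a minimal sketch of what a
dependency-free representation could look like. It is only an illustration
under stated assumptions: the names `SparseVec` and `CsrMatrix` are
hypothetical and do not exist in Spark or MLlib, indices within a row are
assumed to be sorted and unique, and nothing here has been benchmarked.

```scala
// Hypothetical types for illustration only; not part of Spark or MLlib.

/** Sparse vector as parallel arrays. Assumes `indices` is strictly increasing. */
final class SparseVec(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have equal length")

  /** Indexing: binary search over the sorted indices; absent entries read as 0.0. */
  def apply(i: Int): Double = {
    val pos = java.util.Arrays.binarySearch(indices, i)
    if (pos >= 0) values(pos) else 0.0
  }

  /** Traversal: visit only the stored (index, value) pairs. */
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) {
      f(indices(k), values(k))
      k += 1
    }
  }

  /** Dot product: merge the two sorted index arrays, multiplying on matches. */
  def dot(that: SparseVec): Double = {
    require(size == that.size, "dimension mismatch")
    var a = 0
    var b = 0
    var sum = 0.0
    while (a < indices.length && b < that.indices.length) {
      val ia = indices(a)
      val ib = that.indices(b)
      if (ia == ib) {
        sum += values(a) * that.values(b)
        a += 1
        b += 1
      } else if (ia < ib) {
        a += 1
      } else {
        b += 1
      }
    }
    sum
  }
}

/** CSR matrix: row r owns positions rowPtr(r) up to (not including) rowPtr(r + 1). */
final class CsrMatrix(
    val numRows: Int,
    val numCols: Int,
    val rowPtr: Array[Int],
    val colIdx: Array[Int],
    val values: Array[Double]) {
  require(rowPtr.length == numRows + 1, "rowPtr needs numRows + 1 entries")
  require(colIdx.length == values.length, "colIdx and values must have equal length")

  /** Copy one compressed row out as a SparseVec. */
  def row(r: Int): SparseVec = {
    val lo = rowPtr(r)
    val hi = rowPtr(r + 1)
    new SparseVec(numCols, colIdx.slice(lo, hi), values.slice(lo, hi))
  }
}
```

For example, `new CsrMatrix(2, 4, Array(0, 2, 3), Array(0, 3, 3), Array(1.0, 2.0, 5.0))`
describes a 2x4 matrix whose first row has entries at columns 0 and 3 and
whose second row has one entry at column 3. Everything above runs on plain
JVM arrays with no JNI, which is the kind of small operation set that would
favor an in-house implementation.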