Github user shivaram commented on the pull request:

    https://github.com/apache/incubator-spark/pull/575#issuecomment-34948280
  
    Sorry I missed this thread, but I'd like to understand a bit more about the 
scope of what we require in terms of library support before making a decision.  
    
    1. I guess everybody agrees that we want our external interfaces to be 
simple / general and not to require mahout-math / JBlas / Commons Math etc. 
So the first question, I guess, is what external representation we should use 
for sparse data (matrix & vector). Is the proposal that we use something like 
[CSR](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_.28CSR_or_CRS.29)
 where we compress one row at a time? 
     
    2. Dense linear algebra: For dense matrices, the reason we chose JBlas was 
that it provides a low-overhead interface for calling into BLAS-2/BLAS-3 
functions through JNI. This is in line with the kind of operations we use 
for things like ALS and regression. I'd suggest we stick with it, as 
JBlas has a reasonable API and few external dependencies. Are there any 
features missing from JBlas that we want to use?
    
    3. Sparse linear algebra: I'm still not sure what kind of operations we 
want for sparse data. The most basic things I can think of are operations like 
dot products, indexing, and traversal, which shouldn't require calling into a 
native library. If the set of operations is small, I'd prefer an in-house 
implementation rather than depending on a library that is either slow (Breeze) 
or about to be deprecated (Commons Math). Again, I think this depends on the 
features we need, so it would be good to sketch out one or two algorithms for 
sparse data and pull in heavier libraries only if we need them.
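
    To make points 1 and 3 a bit more concrete, here is a rough sketch of what 
a row-compressed (CSR) representation and a library-free sparse dot product 
might look like. It is written in Python only for brevity; the names 
`CSRMatrix`, `csr_from_dense` and `sparse_dot` are hypothetical and not part of 
any existing API, and it assumes column indices within a row are stored in 
increasing order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CSRMatrix:
    """Compressed sparse row matrix (hypothetical illustration type).

    Row i owns the slice row_ptr[i]:row_ptr[i + 1] of both
    `values` and `col_indices`.
    """
    n_rows: int
    n_cols: int
    values: List[float]      # non-zero entries, stored row by row
    col_indices: List[int]   # column of each entry in `values`
    row_ptr: List[int]       # length n_rows + 1, offsets into `values`

    def row(self, i: int) -> Tuple[List[int], List[float]]:
        """Return (column indices, values) of the non-zeros in row i."""
        start, end = self.row_ptr[i], self.row_ptr[i + 1]
        return self.col_indices[start:end], self.values[start:end]


def csr_from_dense(dense: List[List[float]]) -> CSRMatrix:
    """Compress a dense row-major matrix one row at a time."""
    n_cols = len(dense[0]) if dense else 0
    values: List[float] = []
    col_indices: List[int] = []
    row_ptr = [0]
    for dense_row in dense:
        for j, x in enumerate(dense_row):
            if x != 0:
                values.append(float(x))
                col_indices.append(j)
        # Close the current row: the next row starts at this offset.
        row_ptr.append(len(values))
    return CSRMatrix(len(dense), n_cols, values, col_indices, row_ptr)


def sparse_dot(idx_a: List[int], val_a: List[float],
               idx_b: List[int], val_b: List[float]) -> float:
    """Dot product of two sparse vectors given as sorted (index, value) lists.

    Walks both index lists in step, so the cost is linear in the
    number of non-zeros and no native library is involved.
    """
    i = j = 0
    total = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            total += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return total
```

    Indexing and traversal both reduce to slicing by `row_ptr`, which is part 
of why a small in-house implementation seems plausible if the operation set 
stays this small.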
