Github user srowen commented on the pull request:

    https://github.com/apache/incubator-spark/pull/575#issuecomment-34954740
  
    My $0.02 to the discussion:
    
    1. Within whatever operations mllib provides, serialization can be 
considered an implementation detail. But external serialization will come up, 
and I favor supporting something terribly simple. Text-based "row,col,val" 
format strikes me as most standard (which is not quite CSR but almost) since 
this can be parsed by, say, R or Octave.
    
    2. Agree. Its primary purpose is a hook into BLAS from Java, but its API is 
"good enough" for purposes here, I think, in that it supports all the primitive 
ops one would want, plus the more complex standard ones like solving a 
system.
    
    3. I think one should assume sparse is incompatible with native code, yes. 
I think the set of operations that are needed is pretty straightforward and 
provided by anything one picks off the shelf. 
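
    To make point 1 concrete, here is a minimal sketch of reading the 
"row,col,val" text format and turning it into CSR arrays. The triplet layout 
is what is usually called coordinate (COO) format. Everything in the block is 
an assumption for illustration: the function names are hypothetical, indices 
are taken to be zero-based, no header line is expected, and duplicate entries 
are not merged. It uses only the Python standard library rather than any of 
the matrix libraries discussed here.

```python
import csv
import io


def read_triplets(text):
    """Parse 'row,col,val' lines into a list of (int, int, float) tuples."""
    entries = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        row, col, val = record
        entries.append((int(row), int(col), float(val)))
    return entries


def triplets_to_csr(entries, n_rows):
    """Convert triplets to CSR arrays: (values, col_idx, row_ptr)."""
    ordered = sorted(entries)  # row-major order, then by column
    values = [v for _, _, v in ordered]
    col_idx = [c for _, c, _ in ordered]
    row_ptr = [0] * (n_rows + 1)
    for r, _, _ in ordered:
        row_ptr[r + 1] += 1           # count the entries in each row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives row start offsets
    return values, col_idx, row_ptr


# Hypothetical 3x3 matrix with three non-zero entries.
sample = "0,0,1.5\n0,2,2.0\n2,1,-3.0\n"
entries = read_triplets(sample)
values, col_idx, row_ptr = triplets_to_csr(entries, n_rows=3)
```

    The only difference from the triplet form is that CSR replaces the 
per-entry row index with one offset per row, which is why the text format is 
"almost" CSR and the conversion is a sort plus a prefix sum.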
    
    On the one hand, it seems crazy to write yet another in-house 
implementation. But I think it's a viable and rational alternative. The 
argument for it is that the set of operations is quite simple, and really it 
would be nice to have an API exactly in line with JBLAS as much as possible.
    
    A quick way to achieve this is to repurpose the Commons Math class and chop 
it up. At least there is no need to write from scratch and reintroduce bugs.
    
    There's an idea in this thread to make a façade to insulate everything 
from this choice. This also amounts to writing half of a matrix library, since 
you will end up with a lot of engineering to maintain abstractions and 
performance.
    
    Here are my personal current top favorite ideas:
    
    1. Use Commons Math everywhere and slip in JBLAS where needed. Consistent 
API, no rewriting, and still get the speed where needed.
    2. Repurpose Commons Math sparse implementation to create a new sparse 
counterpart to JBLAS API. Consistent API, a bit of rewriting needed.
    3. The façade idea, implemented on top of Commons Math sparse and JBLAS 
for now.
    
    ... and then long-term I would love to see this question solved really 
well by the likes of Breeze or something, and then have this project use that.
