Goal really is to get feedback from this group on how well that attempt
is working.
Is there a better API?  What is it?
What can be improved?
How clumsy is the current marriage of H2OMatrix vs Matrix?
What's the mental cost of H2O's "tall skinny data" vs Mahout's
All-The-World's-A-(squarish)-Matrix model?

I think Mahout's Matrix was never intended to be an abstraction for running distributed computations. I don't understand why we would want an H2O-backed matrix that offers methods users should not call because they break the underlying partitioning scheme. One example is assignRow(), which carries the comment "// Calling this likely indicates a huge performance bug." To me this indicates that the underlying design is broken.

This is exactly the reason why there is a separation between in-core matrices and distributed matrices in the DSL.
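To make the point concrete, here is a minimal sketch of that separation. The class names InCoreMatrix and DistributedRowMatrix and all of their methods are hypothetical and written only for this mail; they are not the Mahout DSL types or anything from H2O. The idea is simply that the distributed type never exposes assignRow() at all, so the "do not call this" comment is not needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not Mahout or H2O classes.

/** In-core matrix: random-access mutation such as assignRow() is cheap and safe. */
final class InCoreMatrix {
    private final double[][] data;
    private final int cols;

    InCoreMatrix(int rows, int cols) {
        this.data = new double[rows][cols];
        this.cols = cols;
    }

    int numRows() { return data.length; }
    int numCols() { return cols; }

    /** Returns a copy of row r. */
    double[] row(int r) { return data[r].clone(); }

    /** Overwrites row r in place; fine here because all data is local. */
    InCoreMatrix assignRow(int r, double[] values) {
        if (values.length != cols) {
            throw new IllegalArgumentException("expected " + cols + " values");
        }
        data[r] = values.clone();
        return this;
    }
}

/** Distributed matrix: row-partitioned and immutable; only bulk operations. */
final class DistributedRowMatrix {
    // Each partition stands in for a block of rows held by one worker.
    private final List<InCoreMatrix> partitions;
    private final int cols;

    private DistributedRowMatrix(List<InCoreMatrix> partitions, int cols) {
        this.partitions = partitions;
        this.cols = cols;
    }

    /** Splits an in-core matrix into blocks of at most rowsPerPartition rows. */
    static DistributedRowMatrix fromInCore(InCoreMatrix m, int rowsPerPartition) {
        if (rowsPerPartition <= 0) {
            throw new IllegalArgumentException("rowsPerPartition must be positive");
        }
        List<InCoreMatrix> parts = new ArrayList<>();
        for (int start = 0; start < m.numRows(); start += rowsPerPartition) {
            int n = Math.min(rowsPerPartition, m.numRows() - start);
            InCoreMatrix block = new InCoreMatrix(n, m.numCols());
            for (int i = 0; i < n; i++) {
                block.assignRow(i, m.row(start + i));
            }
            parts.add(block);
        }
        return new DistributedRowMatrix(parts, m.numCols());
    }

    int numPartitions() { return partitions.size(); }

    /** Column sums: one partial result per partition, then combined. */
    double[] colSums() {
        double[] total = new double[cols];
        for (InCoreMatrix block : partitions) {
            for (int r = 0; r < block.numRows(); r++) {
                double[] row = block.row(r);
                for (int c = 0; c < cols; c++) {
                    total[c] += row[c];
                }
            }
        }
        return total;
    }

    // Deliberately no assignRow(): single-row writes would cut across the
    // partitioning, so the distributed type does not offer them.
}
```

With types split like this, code that needs row-level mutation has to work on an InCoreMatrix, and anything that touches a DistributedRowMatrix is forced through operations that respect the partitioning.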

Right now we're working on cleaning up the H2O internal DSL to make it
better support either Spark/Scala and/or Dmitriy's DSL - plus also our
commitment to running R.  I'm hoping Mahout volunteers will peek at it

I'd be happy to see a concept of how to bring the operations of the DSL onto H2O, as well as a detailed description of H2O's programming and execution model.

--sebastian
