Re: Straw poll re: H2O ?

Sebastian Schelter Thu, 01 May 2014 10:09:46 -0700

Goal really is to get feedback from this group on how well that attempt
is working.
Is there a better API?  What is it?
What can be improved?
How clumsy is the current marriage of H2OMatrix vs Matrix?
What's the mental cost of H2O's "tall skinny data" vs Mahout's
All-The-Worlds-A-(squarish)-Matrix model?

I think Mahout's matrix was never intended to be an abstraction for running distributed computations. I don't understand why we would want to have an H2O-backed matrix which offers methods, such as assignRow(), that users should not call because they break the underlying partitioning scheme. assignRow() has the comment "// Calling this likely indicates a huge performance bug." To me this indicates that the underlying design is broken.

This is exactly the reason why there is a separation between in-core matrices and distributed matrices in the DSL.
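A minimal Java sketch of that kind of separation follows. The names `InCoreMatrix` and `DistributedMatrix` are hypothetical and chosen for illustration only; they are not the actual Mahout DSL or H2O classes, and the list of row blocks merely stands in for partitions that would live on different nodes. The point is only that the distributed type exposes bulk operations and an explicit `collect()`, and has no `assignRow()` at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-core matrix: all data is local, so row-level mutation is cheap. */
final class InCoreMatrix {
    private final double[][] data;

    InCoreMatrix(int rows, int cols) {
        this.data = new double[rows][cols];
    }

    int numRows() { return data.length; }
    int numCols() { return data.length == 0 ? 0 : data[0].length; }
    double get(int row, int col) { return data[row][col]; }

    /** Returns a copy of one row. */
    double[] row(int row) { return data[row].clone(); }

    /** Safe here: there is no partitioning scheme that could be broken. */
    void assignRow(int row, double[] values) {
        System.arraycopy(values, 0, data[row], 0, numCols());
    }
}

/** Hypothetical distributed matrix: each block stands in for one partition. */
final class DistributedMatrix {
    private final List<InCoreMatrix> blocks;

    DistributedMatrix(List<InCoreMatrix> blocks) {
        this.blocks = new ArrayList<>(blocks);
    }

    /** Bulk operation that every partition can apply independently. */
    DistributedMatrix times(double factor) {
        List<InCoreMatrix> scaled = new ArrayList<>();
        for (InCoreMatrix block : blocks) {
            InCoreMatrix out = new InCoreMatrix(block.numRows(), block.numCols());
            for (int r = 0; r < block.numRows(); r++) {
                double[] row = block.row(r);
                for (int c = 0; c < row.length; c++) {
                    row[c] *= factor;
                }
                out.assignRow(r, row);
            }
            scaled.add(out);
        }
        return new DistributedMatrix(scaled);
    }

    /** Explicitly gathers all blocks into one in-core matrix.
     *  Assumes every block has the same number of columns. */
    InCoreMatrix collect() {
        int rows = 0;
        for (InCoreMatrix block : blocks) {
            rows += block.numRows();
        }
        int cols = blocks.isEmpty() ? 0 : blocks.get(0).numCols();
        InCoreMatrix result = new InCoreMatrix(rows, cols);
        int offset = 0;
        for (InCoreMatrix block : blocks) {
            for (int r = 0; r < block.numRows(); r++) {
                result.assignRow(offset + r, block.row(r));
            }
            offset += block.numRows();
        }
        return result;
    }

    // Intentionally no assignRow(): a per-row write would cut across the partitioning.
}
```

With two types, a call such as `distributed.assignRow(...)` simply does not compile, so the "do not call this" warning comment becomes unnecessary.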

Right now we're working on cleaning up the H2O internal DSL to make it
better support either Spark/Scala and/or Dmitriy's DSL - plus also our
commitment to running R.  I'm hoping Mahout volunteers will peek at it

I'd be happy to see a concept of how to bring the operations of the DSL onto H2O, as well as a detailed description of H2O's programming and execution model.


--sebastian
