On Thu, May 1, 2014 at 10:08 AM, Sebastian Schelter <[email protected]> wrote:
>> Goal really is to get feedback from this group on how well that attempt
>> is working.
>> Is there a better API? What is it?
>> What can be improved?
>> How clumsy is the current marriage of H2OMatrix vs Matrix?
>> What's the mental cost of H2O's "tall skinny data" vs Mahout's
>> All-The-Worlds-A-(squarish)-Matrix model?
>
> I think Mahout's matrix was never intended to be an abstraction for
> running distributed computations. I don't understand why we would want to
> have an H2O-backed matrix which offers methods that users should not call
> because they break the underlying partitioning scheme, such as assignRow(),
> which has the comment "// Calling this likely indicates a huge performance
> bug." To me this indicates that the underlying design is broken.
>
> This is exactly the reason why there is a separation between in-core
> matrices and distributed matrices in the DSL.

+1. This captures the very essence of my sole objection in M-1500 exactly.

>> Right now we're working on cleaning up the H2O internal DSL to make it
>> better support either Spark/Scala and/or Dmitriy's DSL - plus also our
>> commitment to running R. I'm hoping Mahout volunteers will peek at it
>
> I'd be happy to see a concept of how to bring the operations of the DSL
> onto H2O, as well as a detailed description of H2O's programming and
> execution model.

+1.

> --sebastian
