On Thu, May 1, 2014 at 10:08 AM, Sebastian Schelter <[email protected]> wrote:

>> Goal really is to get feedback from this group on how well that attempt
>> is working.
>> Is there a better API?  What is it?
>> What can be improved?
>> How clumsy is the current marriage of H2OMatrix vs Matrix?
>> What's the mental cost of H2O's "tall skinny data" vs Mahout's
>> All-The-Worlds-A-(squarish)-Matrix model?
>>
>
> I think Mahout's matrix was never intended to be an abstraction for
> running distributed computations. I don't understand why we would want to
> have an H2O-backed matrix which offers methods that users should not call
> because they break the underlying partitioning scheme, such as assignRow(),
> which has the comment "// Calling this likely indicates a huge performance
> bug." To me this indicates that the underlying design is broken.
>
> This is exactly the reason why there is a separation between in-core
> matrices and distributed matrices in the DSL.

+1. This captures the very essence of my sole objection in M-1500 exactly.
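
To make the separation concrete, here is a minimal Java sketch. All names in it
(InCoreMatrix, DistributedMatrix, mapBlocks, collect) are hypothetical and are
not the Mahout DSL or the H2O API. The only point is that the distributed type
does not expose assignRow() at all, so a call that would break the partitioning
cannot be written against it; row-level mutation is available only after the
data has been collected into an in-core matrix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch only: hypothetical types, not Mahout or H2O classes.
public class MatrixSeparationSketch {

    /** In-core matrix: random access is cheap, so assignRow() is fair to offer. */
    static final class InCoreMatrix {
        final double[][] rows;

        InCoreMatrix(double[][] rows) { this.rows = rows; }

        void assignRow(int i, double[] values) { rows[i] = values.clone(); }

        int numRows() { return rows.length; }
    }

    /** Distributed matrix: only bulk, partition-preserving operations. */
    static final class DistributedMatrix {
        private final List<InCoreMatrix> blocks; // one block per partition

        DistributedMatrix(List<InCoreMatrix> blocks) { this.blocks = blocks; }

        /** Applies f to every block independently; partitioning is unchanged. */
        DistributedMatrix mapBlocks(UnaryOperator<InCoreMatrix> f) {
            List<InCoreMatrix> out = new ArrayList<>();
            for (InCoreMatrix b : blocks) out.add(f.apply(b));
            return new DistributedMatrix(out);
        }

        /** Explicitly pulls all rows into one in-core matrix. */
        InCoreMatrix collect() {
            List<double[]> all = new ArrayList<>();
            for (InCoreMatrix b : blocks) for (double[] r : b.rows) all.add(r);
            return new InCoreMatrix(all.toArray(new double[0][]));
        }
    }

    /** Doubles every entry block by block, then mutates a row in-core. */
    static InCoreMatrix demo() {
        DistributedMatrix d = new DistributedMatrix(List.of(
            new InCoreMatrix(new double[][] {{1, 2}, {3, 4}}),
            new InCoreMatrix(new double[][] {{5, 6}})));

        DistributedMatrix doubled = d.mapBlocks(b -> {
            double[][] out = new double[b.numRows()][];
            for (int i = 0; i < out.length; i++) {
                out[i] = b.rows[i].clone();
                for (int j = 0; j < out[i].length; j++) out[i][j] *= 2;
            }
            return new InCoreMatrix(out);
        });

        // doubled.assignRow(...) would not compile: the method does not exist there.
        InCoreMatrix local = doubled.collect();
        local.assignRow(0, new double[] {0, 0});
        return local;
    }
}
```

With this shape the "huge performance bug" comment becomes unnecessary, because
the type system rather than a comment keeps users away from the call.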

>
>
>> Right now we're working on cleaning up the H2O internal DSL to make it
>> better support either Spark/Scala and/or Dmitriy's DSL - plus also our
>> commitment to running R.  I'm hoping Mahout volunteers will peek at it
>>
>
> I'd be happy to see a concept of how to bring the operations of the DSL
> onto H2O, as well as a detailed description of H2O's programming and
> execution model.

+1.

>
>
> --sebastian
>
