Inline
On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov <[email protected]> wrote:

> On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning <[email protected]>
> wrote:
>
> > My motivation to accept comes from the fact that they have machine
> > learning codes that are as fast as what google has internally. They
> > completely crush all of the spark efforts on speed.
>
> correct me if i am wrong. h20 performance strengths come from speed of
> in-core computations and efficient compression (that's what i heard at
> least).

Those two factors are key. In addition, the ability to dispatch parallel computations with microsecond latencies is also important, as is the ability to transparently communicate at high speeds between processes both local and remote.

> in DSL effort these are managed by Mahout-math (in-core vector and matrix
> implementations and speed of their serialization, respectively), regardless
> of the distributed model.
>
> I am not aware of any benchmark done of say in-core sparse matrix by
> in-core sparse matrix multiplication between Mahout-math and h2o (probably
> because h2o doesn't have in-core matrices as it stands), but assuming there
> were, the above statement should be corrected in a sense that h2o beats the
> dust out of Mahout-math in speed of computation and serialization.

h2o does have parallel in-core matrix operations. They are really fast. That isn't the same as having a real benchmark against Mahout ops.

> That statement, as it stands today, is found to be easily agreeable with
> by me.
>
> However, the point is to provide proper decoupling of programming model,
> distributed block management, and in-core computation/serialization
> concerns. If I have a proper abstraction and decoupling for in-core
> operations and serialization, I am free to plug in any in-core math and
> serialization of thereof, including h2o. Therefore, this becomes secondary
> issue as opposed to general architecture.

This is the goal.
Reality is bound to be different. But that is what attempting to plug in different modules is all about.
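
To make the decoupling idea concrete, here is a rough sketch of what "plug in any in-core math and serialization" could look like. Everything in it is hypothetical: `InCoreBackend`, `NaiveBackend`, and `BlockEngine` are names made up for illustration and are not Mahout-math or h2o APIs. The dense `double[][]` representation and the wire format are also just assumptions to keep the example small.

```java
import java.nio.ByteBuffer;

// Hypothetical abstraction over in-core math and serialization.
// Not an actual Mahout-math or h2o interface.
interface InCoreBackend {
    double[][] multiply(double[][] a, double[][] b);
    byte[] serialize(double[][] m);
    double[][] deserialize(byte[] bytes);
}

// One pluggable implementation: naive dense math, fixed-width encoding.
// A faster backend would implement the same interface.
final class NaiveBackend implements InCoreBackend {
    @Override
    public double[][] multiply(double[][] a, double[][] b) {
        int rows = a.length, inner = b.length, cols = b[0].length;
        double[][] c = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int p = 0; p < inner; p++)
                for (int j = 0; j < cols; j++)
                    c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    @Override
    public byte[] serialize(double[][] m) {
        int rows = m.length, cols = m[0].length;
        // Two ints for the shape, then row-major doubles.
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 * rows * cols);
        buf.putInt(rows).putInt(cols);
        for (double[] row : m)
            for (double v : row)
                buf.putDouble(v);
        return buf.array();
    }

    @Override
    public double[][] deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int rows = buf.getInt(), cols = buf.getInt();
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                m[i][j] = buf.getDouble();
        return m;
    }
}

// Stand-in for the distributed layer: it sees only the interface,
// so the in-core engine can be swapped without touching this class.
final class BlockEngine {
    private final InCoreBackend backend;

    BlockEngine(InCoreBackend backend) {
        this.backend = backend;
    }

    // Multiplies each row block of A by an in-core B. Each block makes a
    // serialize/deserialize round trip to imitate shipping it to a worker.
    double[][][] multiplyBlocks(double[][][] aBlocks, double[][] b) {
        double[][][] out = new double[aBlocks.length][][];
        for (int i = 0; i < aBlocks.length; i++) {
            byte[] wire = backend.serialize(aBlocks[i]);
            out[i] = backend.multiply(backend.deserialize(wire), b);
        }
        return out;
    }
}
```

With this shape, comparing engines would come down to passing a different `InCoreBackend` to `new BlockEngine(...)`. Whether a real engine fits behind an interface this narrow is exactly the open question.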
