On Wed, Apr 30, 2014 at 3:24 PM, Ted Dunning <[email protected]> wrote:
> Inline
>
> On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning <[email protected]> wrote:
>>
>>> My motivation to accept comes from the fact that they have machine
>>> learning codes that are as fast as what Google has internally. They
>>> completely crush all of the Spark efforts on speed.
>>
>> Correct me if I am wrong: H2O's performance strengths come from the
>> speed of in-core computations and efficient compression (that's what I
>> heard, at least).
>
> Those two factors are key. In addition, the ability to dispatch parallel
> computations with microsecond latencies is also important, as is the
> ability to communicate transparently at high speed between processes,
> both local and remote.

This is kind of old news; they all do this, and have for years. I've been
building a system that does real-time distributed pipelines (~30 ms to start
all steps in a pipeline, plus in-core complexity) for years. Note that a
node-to-node hop in the cloud usually averages about 10 ms, so microsecond
latencies are out of the question for network-bound work in real life,
except on private racks. The only thing that doesn't do this is the MR
variety of Hadoop.
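To make the latency argument concrete, here is a back-of-envelope sketch
(the specific numbers are illustrative assumptions, not measurements from
any particular system): even with microsecond-scale task dispatch, a
pipeline whose steps span nodes is dominated by the network hops.

```python
# Hypothetical numbers for illustration only (assumptions, not measurements):
dispatch_us = 50        # assumed microsecond-scale per-step dispatch overhead
network_hop_ms = 10.0   # typical cloud node-to-node hop latency, per the text
steps = 3               # assumed number of steps in the distributed pipeline

# Startup cost for the whole pipeline: each step pays one network hop
# plus its dispatch overhead (converted from microseconds to milliseconds).
total_ms = steps * (network_hop_ms + dispatch_us / 1000.0)

print(total_ms)  # ~30 ms: the hops dwarf the microsecond dispatch cost
```

Under these assumed figures the dispatch overhead contributes well under one
percent of the total, which is the point: on shared cloud networks, shaving
dispatch from microseconds to nanoseconds buys essentially nothing.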
