This is interesting. And this happens in one node. Can it be decoupled from parallelization concerns and re-used? (proposal D)
On Wed, Apr 30, 2014 at 4:09 PM, Ted Dunning <[email protected]> wrote:
> I should add that the way the compression is done is pretty cool for
> speed. The basic idea is that byte code engineering is used to directly
> inject the decompression and compression code into the user code. This
> allows format conditionals to be hoisted outside the parallel loop
> entirely, which drops decompression overhead to just a few cycles. This
> is necessary because the point is to allow the inner loop to proceed at
> L1 speeds instead of L3 speeds (really L3 / compression ratio).
>
>
> On Thu, May 1, 2014 at 12:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > On Wed, Apr 30, 2014 at 3:24 PM, Ted Dunning <[email protected]> wrote:
> > > Inline
> > >
> > > On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > > On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning <[email protected]> wrote:
> > > > >
> > > > > My motivation to accept comes from the fact that they have
> > > > > machine learning codes that are as fast as what Google has
> > > > > internally. They completely crush all of the Spark efforts on
> > > > > speed.
> > > >
> > > > Correct me if I am wrong: H2O's performance strengths come from
> > > > the speed of in-core computations and efficient compression
> > > > (that's what I heard, at least).
> > >
> > > Those two factors are key. In addition, the ability to dispatch
> > > parallel computations with microsecond latencies is also important,
> > > as is the ability to transparently communicate at high speed
> > > between processes, both local and remote.
> >
> > This is kind of old news. They all do, and have for years. I've been
> > building a system that does real-time distributed pipelines (~30 ms
> > to start all steps in the pipeline + in-core complexity) for years.
> > Note that node-to-node hops in clouds usually average about 10 ms, so
> > microseconds are out of the question for network performance reasons
> > in real life, except on private racks.
> >
> > The only thing that doesn't do this is the MR variety of Hadoop.
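For anyone following along: the "hoisting format conditionals outside the parallel loop" idea Ted describes can be sketched roughly as below. This is a hypothetical hand-written illustration, not H2O's actual code or API (H2O gets this effect automatically via byte code injection rather than manual branching); all names here are made up.

```java
// Illustrative sketch: moving a per-element format check out of the hot loop.
// H2O injects specialized decompression code via byte code engineering to get
// the same effect; this hand-rolled version just shows why it matters.
public class HoistingSketch {

    enum Encoding { RAW, DELTA_COMPRESSED }

    // Naive version: the format conditional sits inside the inner loop,
    // costing a branch on every element.
    static long sumNaive(Encoding enc, int[] data, int base) {
        long sum = 0;
        for (int v : data) {
            if (enc == Encoding.RAW) {
                sum += v;
            } else {
                sum += base + v;  // inline delta decoding
            }
        }
        return sum;
    }

    // Hoisted version: one check outside the loop selects a specialized
    // loop body, so each inner iteration is branch-free with respect to
    // the storage format and can run at full (L1-bound) speed.
    static long sumHoisted(Encoding enc, int[] data, int base) {
        long sum = 0;
        if (enc == Encoding.RAW) {
            for (int v : data) sum += v;
        } else {
            for (int v : data) sum += base + v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] deltas = {1, 2, 3, 4};
        // Both versions decode the same values; only the per-element
        // branching differs.
        System.out.println(sumNaive(Encoding.DELTA_COMPRESSED, deltas, 100));
        System.out.println(sumHoisted(Encoding.DELTA_COMPRESSED, deltas, 100));
    }
}
```

The JIT can sometimes hoist such branches on its own, but doing it at the byte code level, as Ted describes, makes the specialization deterministic rather than dependent on the optimizer.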
