This is interesting. And this happens in one node. Can it be decoupled from
parallelization concerns and re-used? (proposal D)


On Wed, Apr 30, 2014 at 4:09 PM, Ted Dunning <[email protected]> wrote:

> I should add that the way that the compression is done is pretty cool for
> speed.  The basic idea is that byte code engineering is used to directly
> inject the decompression and compression code into the user code.  This
> allows format conditionals to be hoisted outside the parallel loop
> entirely.  This drops decompression overhead to just a few cycles.  This is
> necessary because the point is to allow the inner loop to proceed at L1
> speeds instead of L3 speeds (really L3 / compression ratio).
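
For illustration only, a minimal Java sketch of the effect of hoisting the
format conditional out of the hot loop; the Chunk / DeltaByteChunk names below
are hypothetical, not H2O's actual classes, and H2O reportedly gets the same
effect by injecting the decompression code with bytecode engineering rather
than by hand-written branches:

// Hypothetical sketch: hoisting the compression-format conditional out of
// the inner loop.  Chunk / DeltaByteChunk are made-up names, not H2O's API.
interface Chunk {
  int len();
  double at(int i);              // generic accessor: pays a dispatch per element
}

final class DeltaByteChunk implements Chunk {
  private final byte[] deltas;   // compressed column data
  private final double base, scale;
  DeltaByteChunk(byte[] deltas, double base, double scale) {
    this.deltas = deltas; this.base = base; this.scale = scale;
  }
  public int len()        { return deltas.length; }
  public double at(int i) { return base + scale * deltas[i]; }
}

final class Sums {
  // Naive: the format test (virtual dispatch) sits inside the hot loop,
  // so every element pays for it.
  static double sumGeneric(Chunk c) {
    double s = 0;
    for (int i = 0; i < c.len(); i++) s += c.at(i);
    return s;
  }

  // Hoisted: decide the format once, then run a tight, monomorphic loop that
  // decompresses inline, so the per-element cost is a few cycles.
  static double sumHoisted(Chunk c) {
    if (c instanceof DeltaByteChunk) {
      DeltaByteChunk d = (DeltaByteChunk) c;
      double s = 0;
      for (int i = 0; i < d.len(); i++) s += d.at(i);  // inlines to base + scale * deltas[i]
      return s;
    }
    return sumGeneric(c);        // fallback for other encodings
  }
}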
>
>
>
> On Thu, May 1, 2014 at 12:35 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > On Wed, Apr 30, 2014 at 3:24 PM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > Inline
> > >
> > >
> > > On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning <[email protected]>
> > > > wrote:
> > > >
> > > > >
> > > > > My motivation to accept comes from the fact that they have machine
> > > > > learning codes that are as fast as what Google has internally.  They
> > > > > completely crush all of the Spark efforts on speed.
> > > > >
> > > >
> > > > Correct me if I am wrong: H2O's performance strengths come from the
> > > > speed of in-core computations and efficient compression (that's what I
> > > > heard, at least).
> > > >
> > >
> > > Those two factors are key.  In addition, the ability to dispatch
> > > parallel computations with microsecond latencies is also important, as
> > > is the ability to transparently communicate at high speed between
> > > processes, both local and remote.
> > >
> >
> > This is kind of old news.  They all do, and have for years now.  I've
> > been building a system that does real-time distributed pipelines (~30 ms
> > to start all steps in the pipeline + in-core complexity) for years.  Note
> > that a node-to-node hop in clouds usually averages about 10 ms, so
> > microseconds are kind of out of the question for network-performance
> > reasons in real life, except on private racks.
> >
> > The only thing that doesn't do this is the MR variety of Hadoop.
> >
>
