I couldn't say.

Let's invite the 0xdata folks to show us what can happen.
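For context on the bytecode trick Ted describes below: the win comes from
branching on a chunk's compression format once, outside the hot loop, instead
of once per element. A minimal sketch in plain Java (names like Format and
sumHoisted are hypothetical, and H2O weaves the specialized loop into user
code via bytecode engineering rather than writing it by hand like this):

  public final class HoistDemo {

    // Hypothetical compressed-chunk formats: raw bytes, or bytes scaled
    // back up to doubles on read.
    enum Format { PLAIN, SCALED }

    // Naive version: the format conditional is evaluated on every
    // iteration of the inner loop.
    static double sumNaive(Format fmt, byte[] chunk, double scale) {
      double sum = 0;
      for (byte b : chunk) {
        sum += (fmt == Format.SCALED) ? b * scale : b;  // branch in hot loop
      }
      return sum;
    }

    // Hoisted version: one branch outside, then tight format-specialized
    // loops whose inline "decompression" costs only a few cycles.
    static double sumHoisted(Format fmt, byte[] chunk, double scale) {
      double sum = 0;
      if (fmt == Format.SCALED) {
        for (byte b : chunk) sum += b * scale;
      } else {
        for (byte b : chunk) sum += b;
      }
      return sum;
    }

    public static void main(String[] args) {
      byte[] chunk = {1, 2, 3, 4};
      System.out.println(sumNaive(Format.SCALED, chunk, 0.5));   // 5.0
      System.out.println(sumHoisted(Format.SCALED, chunk, 0.5)); // 5.0
    }
  }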

On Thu, May 1, 2014 at 1:39 AM, Dmitriy Lyubimov <[email protected]> wrote:

> This is interesting. And this happens within one node. Can it be decoupled
> from parallelization concerns and reused? (proposal D)
>
>
> On Wed, Apr 30, 2014 at 4:09 PM, Ted Dunning <[email protected]>
> wrote:
>
> > I should add that the way that the compression is done is pretty cool
> > for speed.  The basic idea is that byte code engineering is used to
> > directly inject the decompression and compression code into the user
> > code.  This allows format conditionals to be hoisted outside the
> > parallel loop entirely.  This drops decompression overhead to just a
> > few cycles.  This is necessary because the point is to allow the inner
> > loop to proceed at L1 speeds instead of L3 speeds (really L3 /
> > compression ratio).
> >
> >
> >
> > On Thu, May 1, 2014 at 12:35 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > On Wed, Apr 30, 2014 at 3:24 PM, Ted Dunning <[email protected]>
> > > wrote:
> > >
> > > > Inline
> > > >
> > > >
> > > > On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov <[email protected]>
> > > > wrote:
> > > >
> > > > > On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning <[email protected]>
> > > > > wrote:
> > > > >
> > > > > >
> > > > > > My motivation to accept comes from the fact that they have
> > > > > > machine learning codes that are as fast as what Google has
> > > > > > internally.  They completely crush all of the Spark efforts
> > > > > > on speed.
> > > > > >
> > > > >
> > > > > Correct me if I am wrong: H2O's performance strengths come from
> > > > > the speed of in-core computations and efficient compression
> > > > > (that's what I heard, at least).
> > > > >
> > > >
> > > > Those two factors are key.  In addition, the ability to dispatch
> > > > parallel computations with microsecond latencies is important, as
> > > > is the ability to transparently communicate at high speed between
> > > > processes, both local and remote.
> > > >
> > >
> > > This is kind of old news.  They all do, for years now.  I've been
> > > building a system that does real-time distributed pipelines (~30 ms
> > > to start all steps in the pipeline + in-core complexity) for years.
> > > Note that node-to-node hops in clouds usually average about 10 ms,
> > > so microseconds are kind of out of the question for network
> > > performance in real life, except on private racks.
> > >
> > > The only thing that doesn't do this is the MR variety of Hadoop.
> > >
> >
>
