Unsafe et al

Gian Merlino Thu, 06 Feb 2020 15:21:33 -0800

> Another concern I have is that this current proposal appears quite
cumbersome.  Creating a Memory Segment is easy enough, but to read and
write into that segment appears to require declarations of every different
object type and their offsets using VarHandles.


The way I interpreted it is that if you just wanted to deal with primitive
types, you could get VarHandles for int, long, etc and use those with plain
offsets. So you wouldn't _need_ to declare everything in a super fancy way.
(But the MemoryLayout stuff is trying to make that possible, if it's what
you want.)

There's some example code at
http://cr.openjdk.java.net/~mcimadamore/8235837_javadoc/jdk/incubator/foreign/package-summary.html
.

It looks like it'd be pretty pleasant to use for the simple off-heap data
structures that Druid's query engines need, like hash tables, heaps,
arrays, etc.

But it will be a while before we can depend on it, so I think we need a
shorter term solution too.

On Thu, Feb 6, 2020 at 3:10 PM leerho <lee...@gmail.com> wrote:

> I have been studying JEP370 and indeed it looks promising.  However, it is
> only just completed JEP stage and has not been reviewed by the JCP for
> possible creation of a JSR.   Even assuming it flows smoothly through the
> JCP and JSR it is at least 2-3 years away if not longer.
>
> Another concern I have is that this current proposal appears quite
> cumbersome.  Creating a Memory Segment is easy enough, but to read and
> write into that segment appears to require declarations of every different
> object type and their offsets using VarHandles.   Suppose you need to read
> and write a C-like "struct" consisting of tightly packed ints, longs,
> bytes, shorts, doubles, etc.  Just the declarations of that structure would
> take pages of boiler-plate code.  Now, suppose you have a dynamic struct
> that changes over time and evolves (This is exactly the situation we have
> with sketches)!  Yikes! It is possible that the proposed "MemoryLayout" API
> may address this, but the current JEP is rather vague about how flexible it
> can be.
>
> I think that there is also a real risk that even if we wait for JEP370 to
> become reality, it still may not provide the flexibility that we need.  We
> just don't know yet what form it will actually take.
>
> If JEP370 does actually appear, and is workable, we will be strongly
> motivated to replace our current use of Unsafe inside Memory with the newer
> API, and all of that could be behind the current Memory API.
>
>
>
>
> On 2020/02/06 20:33:18, Gian Merlino <g...@apache.org> wrote:
> > By the way, I just did a quick test of using Memory instead of ByteBuffer
> > in VectorAggregator with groupBy (hash grouping, 8-byte key, two
> > aggregators) and measured a 14% speedup. I think the speedup would be
> > higher with more aggregators or with array-based rather than hash-based
> > grouping (since in those cases, relatively more time is spent doing
> > aggregations).
> >
> > On Thu, Feb 6, 2020 at 11:32 AM Gian Merlino <g...@apache.org> wrote:
> >
> > > We could make an interface that is a subset of Memory and use that.
> But I
> > > think once the new JEP 370 stuff becomes mainstream it would be best
> to use
> > > it directly, since it offers a richer API and will probably become
> > > considered idiomatic Java over time. So we may still want to drop the
> > > abstraction later, once we drop support for Java pre-14.
> > >
> > > Separately, I think if we do build an abstraction layer here, we need
> to
> > > make sure the performance overhead is zero — it's important that the
> jvm be
> > > able to inline the underlying calls.
> > >
> > > > @Gian Merlino I think i am not 100% sure about the scope you are
> talking
> > > about, thought you are talking about
> > > https://github.com/apache/druid/issues/3892 and doing the work from
> > > bottom up (storage to processing ...)
> > >
> > > For now I'm only proposing using a non-BB API (whether it's Memory or
> our
> > > own abstraction) for the VectorAggregator interface and the query
> engines
> > > that use it. I don't see a strong motivation to tackle the storage
> layer at
> > > this time. The reason is that when I profile Druid queries I usually
> see
> > > the lions share of time being spend in query engines and aggregators,
> so I
> > > think that's what we need to tackle first from a performance
> perspective.
> > >
> > > Btw, a side note, this talk is great:
> > > https://www.youtube.com/watch?v=iwSCtxMbBLI
> > >
> > > On Wed, Feb 5, 2020 at 6:44 PM Slim Bouguerra <
> slim.bougue...@gmail.com>
> > > wrote:
> > >
> > >> @Jad Sounds good approach was thinking the same as well how can we
> expose
> > >> those using higher level API.
> > >> @Gian Merlino <g...@apache.org> I think i am not 100% sure about the
> > >> scope you are talking about, thought you are talking about
> > >> https://github.com/apache/druid/issues/3892 and doing the work from
> > >> bottom up (storage to processing ...)
> > >> Can you please highlight the scope and would love to help if you think
> > >> adding such abstraction will add an extra work overhead that you do
> want to
> > >> avoid.
> > >>
> > >>
> > >> On Wed, Feb 5, 2020 at 5:52 PM Jad Naous <jad.na...@imply.io> wrote:
> > >>
> > >>> But then you could never get rid of it... Ideally abstraction layers
> > >>> should
> > >>> be constrained to the actual uses within the system using them
> instead of
> > >>> making something that may be too general and would be hard to
> replace.
> > >>>
> > >>> On Wed, Feb 5, 2020, 4:03 PM Gian Merlino <g...@apache.org> wrote:
> > >>>
> > >>> > I wonder if Memory itself could be that layer?
> > >>> >
> > >>> > On Wed, Feb 5, 2020 at 10:03 AM Jad Naous <jad.na...@imply.io>
> wrote:
> > >>> >
> > >>> > > We could build an abstraction layer on top of the memory
> interface
> > >>> > provided
> > >>> > > by DataSketches. When the JDK gets the new stuff, we can just
> change
> > >>> the
> > >>> > > implementation of the abstraction.
> > >>> > >
> > >>> > > On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino <g...@apache.org>
> wrote:
> > >>> > >
> > >>> > > > The thing that worries me about JEP 370 is that if historical
> Java
> > >>> user
> > >>> > > > migration patterns hold up, we will need to support Java 11
> for a
> > >>> while
> > >>> > > > (probably another 2–3 years), and we would therefore need to
> wait
> > >>> that
> > >>> > > long
> > >>> > > > to use JEP 370. It seems like a long time and until then we
> would
> > >>> be
> > >>> > > stuck
> > >>> > > > with a pretty inferior API.
> > >>> > > >
> > >>> > > > I also would prefer not having to rewrite code a bunch of
> times,
> > >>> but
> > >>> > > that's
> > >>> > > > why I suggested starting by using Memory for the
> VectorAggregator
> > >>> > > interface
> > >>> > > > and stuff that interacts with it. There isn't that much code
> there
> > >>> yet
> > >>> > > > (only a few aggregators implement VectorAggregator). So we will
> > >>> need to
> > >>> > > > write most of it for the first time, and since it is fresh
> code, I
> > >>> > think
> > >>> > > > it'd be nice to use the best API currently available in Java 8
> /
> > >>> 11.
> > >>> > From
> > >>> > > > what I see that is Memory.
> > >>> > > >
> > >>> > > > On Wed, Feb 5, 2020 at 9:21 AM Slim Bouguerra <
> bs...@apache.org>
> > >>> > wrote:
> > >>> > > >
> > >>> > > > > Hi Gian,
> > >>> > > > > Thanks for bringing this up.
> > >>> > > > > IMO for the long run and looking at how much code will have
> to
> > >>> > change,
> > >>> > > it
> > >>> > > > > makes more sense to rely on JDK based API JEP 370 and have
> this
> > >>> work
> > >>> > > done
> > >>> > > > > ONCE as oppose to multiple iteration. FYI i do not think it
> is
> > >>> far
> > >>> > > away,
> > >>> > > > > seems like there is a good momentum around it.
> > >>> > > > > This does not exclude or means we should not use Memory API
> for
> > >>> other
> > >>> > > > stuff
> > >>> > > > > like sketches et al, in fact i think even for project like
> > >>> Sketches
> > >>> > it
> > >>> > > > > makes more sense to move to newer API offered by the JDK
> rather
> > >>> that
> > >>> > do
> > >>> > > > it
> > >>> > > > > your self.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Tue, Feb 4, 2020 at 10:12 PM Gian Merlino <
> g...@apache.org>
> > >>> > wrote:
> > >>> > > > >
> > >>> > > > > > Hey Druids,
> > >>> > > > > >
> > >>> > > > > > There has generally been a lot of talk about moving away
> from
> > >>> > > > ByteBuffer
> > >>> > > > > > and towards the DataSketches Memory package (
> > >>> > > > > >
> https://datasketches.apache.org/docs/Memory/MemoryPackage.html)
> > >>> or
> > >>> > > > even
> > >>> > > > > > using Unsafe directly. Much of that discussion happened on
> > >>> > > > > > https://github.com/apache/druid/issues/3892.
> > >>> > > > > >
> > >>> > > > > > Recently a patch was merged that added datasketches-memory
> as a
> > >>> > > > > dependency
> > >>> > > > > > of druid-processing:
> https://github.com/apache/druid/pull/9308
> > >>> .
> > >>> > The
> > >>> > > > > reason
> > >>> > > > > > was partially due to better performance and partially due
> to
> > >>> nicer
> > >>> > > API
> > >>> > > > > > (both reasons mentioned in #3892 as well).
> > >>> > > > > >
> > >>> > > > > > JEP 370 is a potential long term solution but it seems a
> while
> > >>> away
> > >>> > > > from
> > >>> > > > > > being ready: https://openjdk.java.net/jeps/370
> > >>> > > > > >
> > >>> > > > > > I wanted to bring the larger discussion back up and see
> what
> > >>> people
> > >>> > > > think
> > >>> > > > > > is a good path forward.
> > >>> > > > > >
> > >>> > > > > > My suggestion is that we migrate the VectorAggregator
> > >>> interface to
> > >>> > > use
> > >>> > > > > > Memory, but keep BufferAggregator the way it is. That way,
> as
> > >>> we
> > >>> > > build
> > >>> > > > > out
> > >>> > > > > > support for vectorization (right now, only
> timeseries/groupby
> > >>> > support
> > >>> > > > it,
> > >>> > > > > > and only a few aggregators, but we should be building this
> out)
> > >>> > we'll
> > >>> > > > be
> > >>> > > > > > doing it with a nicer and potentially faster API. But we
> won't
> > >>> need
> > >>> > > to
> > >>> > > > go
> > >>> > > > > > back and redo a bunch of old code, since we'll keep the
> > >>> > > non-vectorized
> > >>> > > > > code
> > >>> > > > > > paths the way they are. (And hopefully, one day, delete
> them
> > >>> all
> > >>> > > > > outright.)
> > >>> > > > > >
> > >>> > > > > > Gian
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >> --
> > >>
> > >> B-Slim
> > >>
> _______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______
> > >>
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> For additional commands, e-mail: dev-h...@druid.apache.org
>
>

Re: ByteBuffer / Memory / Unsafe et al

Reply via email to