@Jad Sounds good approach was thinking the same as well how can we expose
those using higher level API.
@Gian Merlino <g...@apache.org> I think i am not 100% sure about the scope
you are talking about, thought you are talking about
https://github.com/apache/druid/issues/3892 and doing the work from bottom
up (storage to processing ...)
Can you please highlight the scope and would love to help if you think
adding such abstraction will add an extra work overhead that you do want to
avoid.


On Wed, Feb 5, 2020 at 5:52 PM Jad Naous <jad.na...@imply.io> wrote:

> But then you could never get rid of it... Ideally abstraction layers should
> be constrained to the actual uses within the system using them instead of
> making something that may be too general and would be hard to replace.
>
> On Wed, Feb 5, 2020, 4:03 PM Gian Merlino <g...@apache.org> wrote:
>
> > I wonder if Memory itself could be that layer?
> >
> > On Wed, Feb 5, 2020 at 10:03 AM Jad Naous <jad.na...@imply.io> wrote:
> >
> > > We could build an abstraction layer on top of the memory interface
> > provided
> > > by DataSketches. When the JDK gets the new stuff, we can just change
> the
> > > implementation of the abstraction.
> > >
> > > On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino <g...@apache.org> wrote:
> > >
> > > > The thing that worries me about JEP 370 is that if historical Java
> user
> > > > migration patterns hold up, we will need to support Java 11 for a
> while
> > > > (probably another 2–3 years), and we would therefore need to wait
> that
> > > long
> > > > to use JEP 370. It seems like a long time and until then we would be
> > > stuck
> > > > with a pretty inferior API.
> > > >
> > > > I also would prefer not having to rewrite code a bunch of times, but
> > > that's
> > > > why I suggested starting by using Memory for the VectorAggregator
> > > interface
> > > > and stuff that interacts with it. There isn't that much code there
> yet
> > > > (only a few aggregators implement VectorAggregator). So we will need
> to
> > > > write most of it for the first time, and since it is fresh code, I
> > think
> > > > it'd be nice to use the best API currently available in Java 8 / 11.
> > From
> > > > what I see that is Memory.
> > > >
> > > > On Wed, Feb 5, 2020 at 9:21 AM Slim Bouguerra <bs...@apache.org>
> > wrote:
> > > >
> > > > > Hi Gian,
> > > > > Thanks for bringing this up.
> > > > > IMO for the long run and looking at how much code will have to
> > change,
> > > it
> > > > > makes more sense to rely on JDK based API JEP 370 and have this
> work
> > > done
> > > > > ONCE as oppose to multiple iteration. FYI i do not think it is far
> > > away,
> > > > > seems like there is a good momentum around it.
> > > > > This does not exclude or means we should not use Memory API for
> other
> > > > stuff
> > > > > like sketches et al, in fact i think even for project like Sketches
> > it
> > > > > makes more sense to move to newer API offered by the JDK rather
> that
> > do
> > > > it
> > > > > your self.
> > > > >
> > > > >
> > > > > On Tue, Feb 4, 2020 at 10:12 PM Gian Merlino <g...@apache.org>
> > wrote:
> > > > >
> > > > > > Hey Druids,
> > > > > >
> > > > > > There has generally been a lot of talk about moving away from
> > > > ByteBuffer
> > > > > > and towards the DataSketches Memory package (
> > > > > > https://datasketches.apache.org/docs/Memory/MemoryPackage.html)
> or
> > > > even
> > > > > > using Unsafe directly. Much of that discussion happened on
> > > > > > https://github.com/apache/druid/issues/3892.
> > > > > >
> > > > > > Recently a patch was merged that added datasketches-memory as a
> > > > > dependency
> > > > > > of druid-processing: https://github.com/apache/druid/pull/9308.
> > The
> > > > > reason
> > > > > > was partially due to better performance and partially due to
> nicer
> > > API
> > > > > > (both reasons mentioned in #3892 as well).
> > > > > >
> > > > > > JEP 370 is a potential long term solution but it seems a while
> away
> > > > from
> > > > > > being ready: https://openjdk.java.net/jeps/370
> > > > > >
> > > > > > I wanted to bring the larger discussion back up and see what
> people
> > > > think
> > > > > > is a good path forward.
> > > > > >
> > > > > > My suggestion is that we migrate the VectorAggregator interface
> to
> > > use
> > > > > > Memory, but keep BufferAggregator the way it is. That way, as we
> > > build
> > > > > out
> > > > > > support for vectorization (right now, only timeseries/groupby
> > support
> > > > it,
> > > > > > and only a few aggregators, but we should be building this out)
> > we'll
> > > > be
> > > > > > doing it with a nicer and potentially faster API. But we won't
> need
> > > to
> > > > go
> > > > > > back and redo a bunch of old code, since we'll keep the
> > > non-vectorized
> > > > > code
> > > > > > paths the way they are. (And hopefully, one day, delete them all
> > > > > outright.)
> > > > > >
> > > > > > Gian
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______

Reply via email to