Re: [Discuss] Integrate Arrow gandiva into Drill

Karthikeyan Manivannan Fri, 05 Apr 2019 13:24:32 -0700

Hi Weijie,

You are right. Before DRILL-6340, the purpose of the hasRemainder() logic
was not clear. projector.projectRecords() always took incomingRowCount as
its argument and returned the same value on non-exceptional paths, so I
think the whole hasRemainder() logic was dead code at that time. I did not
investigate it further because I knew that under DRILL-6340 that code would
definitely be necessary.
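A minimal sketch of the remainder idea discussed in this thread, assuming a project-like operator that may emit at most maxOutputRecordCount rows per output batch (the limit that memoryManager.getOutputRowCount() supplies under DRILL-6340). The class and method names below are hypothetical and are not Drill's actual ProjectRecordBatch or Projector API. When the limit is at least the incoming row count, a single output batch is produced and hasRemainder() is false right away, which is why the branch looked like dead code before DRILL-6340.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of remainder handling in a project-like operator.
// Not Drill's real ProjectRecordBatch/Projector API; all names are illustrative.
final class ProjectRemainderSketch {
  // Stand-in for the per-batch row limit (memoryManager.getOutputRowCount()).
  private final int maxOutputRecordCount;
  // Number of rows in the current incoming batch.
  private int incomingRecordCount;
  // Index of the first incoming row that has not been projected yet.
  private int nextRow;

  ProjectRemainderSketch(int maxOutputRecordCount) {
    if (maxOutputRecordCount <= 0) {
      throw new IllegalArgumentException("maxOutputRecordCount must be positive");
    }
    this.maxOutputRecordCount = maxOutputRecordCount;
  }

  // Starts processing a new incoming batch.
  void setIncoming(int recordCount) {
    this.incomingRecordCount = recordCount;
    this.nextRow = 0;
  }

  // True while some incoming rows still have to go into a later output batch.
  boolean hasRemainder() {
    return nextRow < incomingRecordCount;
  }

  // Projects up to maxOutputRecordCount rows and returns the output batch size.
  int nextOutputBatch() {
    int toProject = Math.min(maxOutputRecordCount, incomingRecordCount - nextRow);
    int projected = projectRecords(nextRow, toProject);
    nextRow += projected;
    return projected;
  }

  // Placeholder for expression evaluation starting at startIndex:
  // assumes every requested row is projected successfully.
  private int projectRecords(int startIndex, int recordCount) {
    return recordCount;
  }

  // Splits one incoming batch into the sequence of output batch sizes.
  static List<Integer> outputBatchSizes(int incomingRecordCount, int maxOutputRecordCount) {
    ProjectRemainderSketch op = new ProjectRemainderSketch(maxOutputRecordCount);
    op.setIncoming(incomingRecordCount);
    List<Integer> sizes = new ArrayList<>();
    while (op.hasRemainder()) {
      sizes.add(op.nextOutputBatch());
    }
    return sizes;
  }
}
```

For example, outputBatchSizes(10, 4) yields [4, 4, 2], while outputBatchSizes(10, 10) yields [10], the pre-DRILL-6340 situation in which the remainder path never runs.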


Karthik


On Fri, Apr 5, 2019 at 9:27 AM Sorabh Hamirwasia <[email protected]>
wrote:

> Hi Weijie,
> I think the only case in which that line will be executed is when a UDF,
> such as a flatten-like operation, produces multiple rows for each input
> row. Although Flatten is currently a separate operator in Drill, I think
> that code is there to handle such cases.
>
> Thanks,
> Sorabh
>
> On Fri, Apr 5, 2019 at 6:08 AM weijie tong <[email protected]>
> wrote:
>
> > The comparison code first appeared in DRILL-620:
> >
> > https://github.com/apache/drill/commit/a2355d42dbff51b858fc28540915cf793f1c0fac#diff-e87beb3f2aa0fbc06b07b1d55c3d3536
> >
> > Before DRILL-6340, judging from the ProjectorTemplate's projectRecords
> > method and the actual values passed to it, I think line 234 of
> > ProjectRecordBatch would never be executed. Only with DRILL-6340, where
> > we control the output batch memory size, did that part of the code
> > finally come into use.
> >
> > If I am wrong, please let me know.
> >
> > On Fri, Apr 5, 2019 at 12:15 AM weijie tong <[email protected]>
> > wrote:
> >
> > > Thanks for the reply. But it seems the code was there even before
> > > DRILL-6340.
> > >
> > > On Thu, Apr 4, 2019 at 10:45 PM Vova Vysotskyi <[email protected]>
> > > wrote:
> > >
> > >> Hi Weijie,
> > >>
> > >> It is possible if maxOuputRecordCount (obtained from
> > >> memoryManager.getOutputRowCount()) is less than incomingRecordCount.
> > >> For more details, please see DRILL-6340
> > >> <https://issues.apache.org/jira/browse/DRILL-6340> and the design
> > >> document
> > >> <https://docs.google.com/document/d/1h0WsQsen6xqqAyyYSrtiAniQpVZGmQNQqC1I2DJaxAA/edit?usp=sharing>
> > >> attached to that Jira.
> > >>
> > >> Kind regards,
> > >> Volodymyr Vysotskyi
> > >>
> > >>
> > >> On Thu, Apr 4, 2019 at 5:17 PM weijie tong <[email protected]>
> > >> wrote:
> > >>
> > >> > I have a question about the ProjectRecordBatch implementation and
> > >> > hope someone can explain it. At line 234 of ProjectRecordBatch, in
> > >> > what case is the projector's output row count less than the input
> > >> > row count?
> > >> >
> > >> > On Thu, Apr 4, 2019 at 5:11 PM weijie tong <[email protected]>
> > >> > wrote:
> > >> >
> > >> > > Hi Igor,
> > >> > > That's a good idea! It could resolve that issue, so the basic
> > >> > > question is solved. To use the official Arrow, there are still two
> > >> > > issues that need to be contributed to Arrow, which I will do:
> > >> > > 1. Statically link the gcc libraries into the JNI dynamic library.
> > >> > >    Without this, the platform must have the right gcc version
> > >> > >    installed.
> > >> > > 2. Add a convertToNull function to Gandiva.
> > >> > >    This would allow project expressions that use convertToNull to
> > >> > >    be executed by Gandiva.
> > >> > >
> > >> > > Of course, even without these two issues solved, I could still
> > >> > > provide an integration implementation.
> > >> > >
> > >> > > BTW, once the integration is done, how do we supply the Gandiva JNI
> > >> > > library? Leave it to users to build it, or supply distributions for
> > >> > > different platforms?
> > >> > >
> > >> > >
> > >> > > On Thu, Apr 4, 2019 at 3:53 PM Igor Guzenko <[email protected]>
> > >> > > wrote:
> > >> > >
> > >> > >> Hello Weijie,
> > >> > >>
> > >> > >> Did you try creating, in Drill, a package with the same name as the
> > >> > >> one in Arrow, and adding a wrapper class there around the target
> > >> > >> class to expose the desired package-private methods publicly?
> > >> > >>
> > >> > >> Thanks, Igor
> > >> > >>
> > >> > >> On Thu, Apr 4, 2019 at 9:51 AM weijie tong <[email protected]>
> > >> > >> wrote:
> > >> > >> >
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > Gandiva is a subproject of Arrow. By using LLVM code generation
> > >> > >> > and SIMD techniques, Arrow Gandiva can achieve better query
> > >> > >> > performance. Arrow and Drill have similar columnar memory
> > >> > >> > formats; the main difference now is the null representation.
> > >> > >> > Arrow has also made significant changes to the ValueVector.
> > >> > >> > Adopting Arrow to replace Drill's VV has been discussed before,
> > >> > >> > and that would be a big job. But leveraging Gandiva by working at
> > >> > >> > the physical memory address level would be relatively little
> > >> > >> > work.
> > >> > >> >
> > >> > >> > I have now done the integration work on our own branch by making
> > >> > >> > some changes to Arrow, and filed DRILL-7087 and ARROW-4819. The
> > >> > >> > main change in ARROW-4819 is to make some package-level methods
> > >> > >> > public. But the Arrow community does not seem to plan to accept
> > >> > >> > this change; their advice is to maintain our own Arrow branch.
> > >> > >> >
> > >> > >> > So what do you think?
> > >> > >> >
> > >> > >> > 1. Maintain our own branch of Arrow.
> > >> > >> > 2. Wait for the Arrow integration to be complete.
> > >> > >> > Or some other ideas?
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> >
>
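Weijie's first message names the null representation as the main remaining difference between the Arrow and Drill memory formats. As a minimal sketch, assuming Drill's nullable vectors keep one byte per value in their "bits" vector (non-zero meaning the value is set) and Arrow uses a validity bitmap with one bit per value, least-significant bit first, the conversion looks like this. NullRepresentationSketch and toValidityBitmap are hypothetical names, and Arrow's buffer padding and alignment requirements are ignored.

```java
// Hypothetical sketch: byte-per-value null flags -> Arrow-style validity bitmap.
// Assumes input[i] != 0 means "value i is set (non-null)"; padding/alignment ignored.
final class NullRepresentationSketch {
  static byte[] toValidityBitmap(byte[] isSetBytes) {
    int valueCount = isSetBytes.length;
    // One bit per value, rounded up to whole bytes.
    byte[] bitmap = new byte[(valueCount + 7) / 8];
    for (int i = 0; i < valueCount; i++) {
      if (isSetBytes[i] != 0) {
        // Value i lives in byte i / 8 at bit position i % 8 (LSB first).
        bitmap[i >> 3] |= (byte) (1 << (i & 7));
      }
    }
    return bitmap;
  }
}
```

For example, the flags {1, 0, 1, 1, 0, 0, 0, 0, 1} become the two bytes 0b00001101 and 0b00000001.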
