Re: [Discuss] Integrate Arrow gandiva into Drill

Vova Vysotskyi Thu, 04 Apr 2019 07:45:08 -0700

Hi Weijie,

This can happen when maxOuputRecordCount (received from
memoryManager.getOutputRowCount()) is less than incomingRecordCount.
For more details, please see DRILL-6340
<https://issues.apache.org/jira/browse/DRILL-6340> and the design document
<https://docs.google.com/document/d/1h0WsQsen6xqqAyyYSrtiAniQpVZGmQNQqC1I2DJaxAA/edit?usp=sharing>
attached to that Jira.
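
As a rough illustration of that condition, here is a minimal sketch. It is
not Drill's actual code: the class and method names are hypothetical, and it
assumes the rows that do not fit into one output batch are emitted in later
batches. It only shows how an output batch ends up smaller than the incoming
batch when the memory manager's limit is lower than the incoming record count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not taken from ProjectRecordBatch.
class ProjectSplitSketch {

    // Returns the size of each output batch when an incoming batch of
    // incomingRecordCount rows is projected under a per-batch limit of
    // maxOutputRecordCount rows (assumed to come from the memory manager).
    static List<Integer> outputBatchSizes(int incomingRecordCount, int maxOutputRecordCount) {
        if (maxOutputRecordCount <= 0) {
            throw new IllegalArgumentException("maxOutputRecordCount must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int remaining = incomingRecordCount;
        while (remaining > 0) {
            // Each pass projects at most maxOutputRecordCount rows.
            int projected = Math.min(remaining, maxOutputRecordCount);
            sizes.add(projected);
            remaining -= projected;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 10 incoming rows with a limit of 4 rows per output batch.
        System.out.println(outputBatchSizes(10, 4)); // prints [4, 4, 2]
    }
}
```

When the limit is at least as large as the incoming count, the sketch yields a
single output batch of the same size as the input.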


Kind regards,
Volodymyr Vysotskyi


On Thu, Apr 4, 2019 at 5:17 PM weijie tong <[email protected]> wrote:

> I have a question about the ProjectRecordBatch implementation. I hope
> someone can explain it. At line 234 of ProjectRecordBatch, in what case is
> the projector's output row count less than the input row count?
>
> On Thu, Apr 4, 2019 at 5:11 PM weijie tong <[email protected]>
> wrote:
>
> > Hi Igor,
> > That's a good idea! It could resolve that issue. The basic question is
> > solved. To use the official Arrow, there are still two changes that need
> > to be contributed to Arrow, which I will do:
> > 1. Statically link the gcc lib into the JNI dynamic lib.
> >    Without this, the platform must have the right gcc version installed.
> > 2. Add a convertToNull function to Gandiva.
> >    This would allow project expressions that use convertToNull to be
> >    executed by Gandiva.
> >
> > Of course, even without these two issues solved, I could still provide an
> > integration implementation.
> >
> > BTW, once the integration is done, how do we supply the Gandiva JNI lib?
> > Do we leave it to users to build it, or do we supply distributions for
> > different platforms?
> >
> >
> > On Thu, Apr 4, 2019 at 3:53 PM Igor Guzenko <[email protected]>
> > wrote:
> >
> >> Hello Weijie,
> >>
> >> Did you try creating the same package as in Arrow, but inside Drill, and
> >> using a wrapper class around the target class to expose the desired
> >> package-access methods?
> >>
> >> Thanks, Igor
> >>
> >> On Thu, Apr 4, 2019 at 9:51 AM weijie tong <[email protected]>
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Gandiva is a subproject of Arrow. Using LLVM codegen and SIMD, Arrow
> >> > Gandiva can achieve better query performance. Arrow and Drill have
> >> > similar columnar memory formats. The main difference now is the null
> >> > representation. Arrow has also made significant changes to the
> >> > ValueVector. Adopting Arrow to replace Drill's VV has been discussed
> >> > before; that would be a big job. But leveraging Gandiva by working at
> >> > the physical memory address level would take relatively little work.
> >> >
> >> > I have now done the integration work on our own branch by making some
> >> > changes to the Arrow branch, and I have filed DRILL-7087 and
> >> > ARROW-4819. The main change in ARROW-4819 is to make some package-level
> >> > methods public. But the Arrow community does not seem to plan to accept
> >> > this change. Their advice is to maintain an Arrow branch.
> >> >
> >> > So what do you think?
> >> >
> >> > 1. Maintain our own branch of Arrow.
> >> > 2. Wait for the full Arrow integration.
> >> > Or are there other ideas?
> >>
> >
>
