Hi Weijie,

You are right. Before DRILL-6340 the purpose of the hasRemainder() logic was not clear. projector.projectRecords() always took incomingRowCount as its argument and returned the same value on non-exceptional paths. So I think the whole hasRemainder() logic was dead code at that point. I did not investigate it further because I knew that under DRILL-6340 that code would definitely be necessary.

Karthik

On Fri, Apr 5, 2019 at 9:27 AM Sorabh Hamirwasia <[email protected]> wrote:

> Hi Weijie,
> I think the only case in which that line will be executed is if there is
> any UDF like a flatten operation that produces multiple rows for each
> input row. Even though Flatten is currently a separate operator in Drill,
> I think that code is there to handle such cases.
>
> Thanks,
> Sorabh
>
> On Fri, Apr 5, 2019 at 6:08 AM weijie tong <[email protected]> wrote:
>
> > The first appearance of the comparison code is at DRILL-620:
> > https://github.com/apache/drill/commit/a2355d42dbff51b858fc28540915cf793f1c0fac#diff-e87beb3f2aa0fbc06b07b1d55c3d3536
> > . Before DRILL-6340, according to the ProjectorTemplate's projectRecords
> > method and its actual input parameter values, I think line 234 of
> > ProjectRecordBatch would never be executed. Only with DRILL-6340, where
> > we control the output batch memory size, did that part of the code
> > finally come into use.
> >
> > If I am wrong, please let me know.
> >
> > On Fri, Apr 5, 2019 at 12:15 AM weijie tong <[email protected]> wrote:
> >
> > > Thanks for the reply, but it seems the code has been there even before
> > > DRILL-6340.
> > >
> > > On Thu, Apr 4, 2019 at 10:45 PM Vova Vysotskyi <[email protected]> wrote:
> > >
> > >> Hi Weijie,
> > >>
> > >> It is possible if maxOuputRecordCount (received from
> > >> memoryManager.getOutputRowCount()) is less than incomingRecordCount.
> > >> For more details please see DRILL-6340
> > >> <https://issues.apache.org/jira/browse/DRILL-6340> and the design document
> > >> <https://docs.google.com/document/d/1h0WsQsen6xqqAyyYSrtiAniQpVZGmQNQqC1I2DJaxAA/edit?usp=sharing>
> > >> attached to this Jira.
> > >>
> > >> Kind regards,
> > >> Volodymyr Vysotskyi
> > >>
> > >> On Thu, Apr 4, 2019 at 5:17 PM weijie tong <[email protected]> wrote:
> > >>
> > >> > I have a doubt about the ProjectRecordBatch implementation. I hope
> > >> > someone can explain it. At line 234 of ProjectRecordBatch, in what
> > >> > case is the projector's output row count less than the input row
> > >> > count?
> > >> >
> > >> > On Thu, Apr 4, 2019 at 5:11 PM weijie tong <[email protected]> wrote:
> > >> >
> > >> > > Hi Igor,
> > >> > > That's a good idea! It could resolve that issue. The basic question
> > >> > > is solved. To use the official Arrow, there are still two issues
> > >> > > that need to be contributed to Arrow, which I will do:
> > >> > > 1. Statically link the gcc lib into the JNI dynamic lib.
> > >> > >    Without this, the platform must have the right gcc version
> > >> > >    installed.
> > >> > > 2. Add a convertToNull function to Gandiva.
> > >> > >    This would let project expressions that use convertToNull be
> > >> > >    executed by Gandiva.
> > >> > >
> > >> > > Of course, even without these two issues solved, I could still
> > >> > > provide an integration implementation.
> > >> > >
> > >> > > BTW, once the integration is done, how do we supply the Gandiva JNI
> > >> > > lib? Leave it to the user to build it, or supply distributions for
> > >> > > different platforms?
> > >> > >
> > >> > > On Thu, Apr 4, 2019 at 3:53 PM Igor Guzenko <[email protected]> wrote:
> > >> > >
> > >> > >> Hello Weijie,
> > >> > >>
> > >> > >> Did you try to create the same package as in Arrow, but in Drill,
> > >> > >> and use a wrapper class around the target to publish the desired
> > >> > >> methods with package access?
> > >> > >>
> > >> > >> Thanks, Igor
> > >> > >>
> > >> > >> On Thu, Apr 4, 2019 at 9:51 AM weijie tong <[email protected]> wrote:
> > >> > >> >
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > Gandiva is a sub-project of Arrow. Using LLVM codegen and SIMD,
> > >> > >> > Arrow Gandiva can achieve better query performance. Arrow and
> > >> > >> > Drill have similar columnar memory formats. The main difference
> > >> > >> > now is the null representation. Arrow has also made great changes
> > >> > >> > to the ValueVector. Adopting Arrow to replace Drill's VV has been
> > >> > >> > discussed before. That would be a big job. But leveraging Gandiva
> > >> > >> > by working at the physical memory address level would take
> > >> > >> > relatively little work.
> > >> > >> >
> > >> > >> > I have now done the integration work on our own branch by making
> > >> > >> > some changes to the Arrow branch, and filed DRILL-7087 and
> > >> > >> > ARROW-4819. The main change in ARROW-4819 is to make some
> > >> > >> > package-level methods public. But the Arrow community does not
> > >> > >> > seem to plan to accept this change. Their advice is to maintain
> > >> > >> > an Arrow branch.
> > >> > >> >
> > >> > >> > So what do you think?
> > >> > >> >
> > >> > >> > 1. Maintain our own branch of Arrow.
> > >> > >> > 2. Wait for the complete Arrow integration.
> > >> > >> > Or some other ideas?
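
For readers following the hasRemainder() part of this thread, here is a minimal sketch of the behaviour being discussed. It is not Drill's actual ProjectRecordBatch code: the class RemainderSketch, the method outputBatchSizes, and the simplified projectRecords signature are all hypothetical stand-ins. It only illustrates the point made above: when the output row limit is at least the incoming row count, one call consumes the whole batch and the remainder path never runs; when the limit is smaller (as with the memory-based limit from DRILL-6340), the remainder path is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Drill's actual ProjectRecordBatch. */
final class RemainderSketch {

    /**
     * Simplified stand-in for projector.projectRecords(): projects at most
     * maxOutputRows of the rows still available and returns how many it did.
     */
    static int projectRecords(int rowsAvailable, int maxOutputRows) {
        return Math.min(rowsAvailable, maxOutputRows);
    }

    /** Returns the size of each output batch produced for one incoming batch. */
    static List<Integer> outputBatchSizes(int incomingRowCount, int maxOutputRows) {
        if (maxOutputRows <= 0) {
            throw new IllegalArgumentException("maxOutputRows must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int remainderIndex = 0;                       // first row not yet projected
        boolean hasRemainder = incomingRowCount > 0;  // rows still waiting to be projected
        while (hasRemainder) {
            int remaining = incomingRowCount - remainderIndex;
            int projected = projectRecords(remaining, maxOutputRows);
            sizes.add(projected);
            remainderIndex += projected;
            // Only true when the projector returned fewer rows than were available.
            hasRemainder = remainderIndex < incomingRowCount;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Limit equals the incoming count: a single pass, no remainder.
        System.out.println(outputBatchSizes(10, 10)); // prints [10]
        // Limit below the incoming count: the remainder path is exercised.
        System.out.println(outputBatchSizes(10, 4));  // prints [4, 4, 2]
    }
}
```

In the first call the loop body runs once, which corresponds to the pre-DRILL-6340 situation described at the top of the thread; the second call shows why the remainder bookkeeping matters once the output row count is capped.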
