jinchengchenghh opened a new pull request, #11090:
URL: https://github.com/apache/incubator-gluten/pull/11090

   In the post rule we can only see the stage-level plan, so we cannot decide 
whether the ColumnarExchange output should be in cudf format. For now we 
assume every ColumnarExchange is a CudfColumnarExchange. Validation will be 
supported later in injectQueryStagePrepRule, where the whole plan is visible: 
generate the transformed plan, validate it, and then discard it.
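The planned validation can be sketched as a minimal, hypothetical C++ model: build a transformed copy of the whole plan, validate it, then throw the copy away. It is not Gluten's Scala rule code. `PlanNode`, `transformCopy`, `validate` and `canUseCudfExchange` are illustrative names. The validation criterion is also an assumption made for the example: an exchange may emit cudf format only if the operator consuming it runs on the GPU.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified plan node (not Spark's SparkPlan).
struct PlanNode {
  std::string name;
  bool gpuSupported;  // assumed flag: a GPU (cudf) implementation exists
  std::vector<std::shared_ptr<PlanNode>> children;  // inputs of this node
};
using NodePtr = std::shared_ptr<PlanNode>;

// Build a transformed copy in which every ColumnarExchange becomes a
// CudfColumnarExchange; the original tree is left untouched.
NodePtr transformCopy(const NodePtr& node) {
  auto copy = std::make_shared<PlanNode>(*node);
  if (copy->name == "ColumnarExchange") {
    copy->name = "CudfColumnarExchange";
  }
  for (auto& child : copy->children) {
    child = transformCopy(child);  // replace shared child with its own copy
  }
  return copy;
}

// Assumed rule: a CudfColumnarExchange is valid only if its consumer
// (the parent node) runs on the GPU.
bool validate(const NodePtr& node, const PlanNode* parent) {
  if (node->name == "CudfColumnarExchange" &&
      (parent == nullptr || !parent->gpuSupported)) {
    return false;
  }
  for (const auto& child : node->children) {
    if (!validate(child, node.get())) return false;
  }
  return true;
}

// Trial transform over the whole plan: the transformed copy is only used
// for validation and is discarded when this function returns.
bool canUseCudfExchange(const NodePtr& wholePlan) {
  NodePtr trial = transformCopy(wholePlan);
  return validate(trial, nullptr);
}
```

Working on a copy keeps the real plan unchanged whether or not validation succeeds. This matches the "generate, validate, then discard" step described above.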
   After CudfColumnarExchange, `GpuResizeBufferColumnarBatchExec` concatenates 
the batches into a cudf::table, using the maximum integer value as the batch 
size limit, because velox-cudf does not control the batch size yet. As a 
result, the CPU shuffle reader decompresses and prepares the first batch while 
the GPU executes the CPU tasks one by one.
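The coalescing step can be sketched in simplified form. The sketch assumes a hypothetical single-column `Batch` type in place of real cudf tables. `BatchCoalescer` is an illustrative name, not the implementation of `GpuResizeBufferColumnarBatchExec`. Batches are buffered until adding the next one would exceed the row limit. With the limit set to the maximum integer, all batches of a task effectively end up in one output batch.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-column batch; real batches are cudf tables.
struct Batch {
  std::vector<int64_t> values;
  size_t numRows() const { return values.size(); }
};

// Hypothetical coalescer: buffers incoming batches and emits one
// concatenated batch once the next batch would push it over maxRows.
class BatchCoalescer {
 public:
  explicit BatchCoalescer(size_t maxRows = INT_MAX) : maxRows_(maxRows) {}

  // Returns the buffered batch if `next` does not fit; `next` then starts
  // a new buffer. A single batch larger than maxRows is not split.
  std::optional<Batch> add(Batch next) {
    std::optional<Batch> out;
    if (!buffer_.values.empty() &&
        buffer_.numRows() + next.numRows() > maxRows_) {
      out = std::move(buffer_);
      buffer_ = Batch{};
    }
    buffer_.values.insert(buffer_.values.end(), next.values.begin(),
                          next.values.end());
    return out;
  }

  // Flushes whatever is still buffered at the end of the input.
  std::optional<Batch> finish() {
    if (buffer_.values.empty()) return std::nullopt;
    Batch out = std::move(buffer_);
    buffer_ = Batch{};
    return out;
  }

 private:
  size_t maxRows_;
  Batch buffer_;
};
```

With the default limit, `add` practically never emits anything and `finish` returns one large batch. That corresponds to feeding the GPU a single cudf::table per task. The sketch ignores GPU memory limits, which a real implementation would have to respect.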
   Performance:
   After this change, stage 2 of TPC-DS Q95 at SF100 drops from 27s to 13s.
   
   In the shuffle writer, expand booleans to one byte per value and convert 
timestamps to int64_t nanoseconds to match the cudf format.
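The two layout conversions can be sketched as follows. The sketch makes assumptions about both sides:

* Velox layouts: booleans are bit-packed LSB-first in 64-bit words, and timestamps are stored as seconds plus nanoseconds.
* cudf layouts: `BOOL8` uses one byte per value, and `TIMESTAMP_NANOSECONDS` is int64 nanoseconds since the epoch.

`SecondsNanos`, `unpackBools` and `toEpochNanos` are hypothetical helpers, not functions from the Gluten shuffle writer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpack an LSB-first bit-packed boolean buffer
// (assumed Velox layout) into one byte per value (cudf BOOL8 layout).
std::vector<uint8_t> unpackBools(const std::vector<uint64_t>& bits,
                                 size_t numRows) {
  std::vector<uint8_t> bytes(numRows);
  for (size_t i = 0; i < numRows; ++i) {
    bytes[i] = static_cast<uint8_t>((bits[i / 64] >> (i % 64)) & 1);
  }
  return bytes;
}

// Hypothetical stand-in for a seconds + nanoseconds timestamp.
struct SecondsNanos {
  int64_t seconds;  // may be negative for instants before 1970
  uint64_t nanos;   // assumed to be in [0, 999'999'999]
};

// Hypothetical helper: convert to int64 nanoseconds since the epoch
// (cudf TIMESTAMP_NANOSECONDS layout). int64 nanoseconds only cover about
// 1677-09-21 to 2262-04-11; overflow outside that range is not checked.
std::vector<int64_t> toEpochNanos(const std::vector<SecondsNanos>& ts) {
  std::vector<int64_t> out;
  out.reserve(ts.size());
  for (const auto& t : ts) {
    out.push_back(t.seconds * 1'000'000'000LL + static_cast<int64_t>(t.nanos));
  }
  return out;
}
```

Doing this once in the shuffle writer means the GPU side can consume the buffers without a per-batch layout fix-up. The cost is 8x more space for booleans and the narrower nanosecond timestamp range.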
   makeCudfTable does not work well yet, so we still use the RowVector format and 
convert it to a cudf::table in the Velox operator CudfFromVelox.
   

