jinchengchenghh opened a new pull request, #11090: URL: https://github.com/apache/incubator-gluten/pull/11090
In the post rule we can only see the stage-level plan, so we cannot decide whether the ColumnarExchange output should be in cudf format. For now we assume the ColumnarExchange is always a CudfColumnarExchange. Validation will be supported later in `injectQueryStagePrepRule`, where the whole plan is visible: generate the transformed plan, validate it, then discard it.

After CudfColumnarExchange, the batches are concatenated in `GpuResizeBufferColumnarBatchExec` to output a cudf::table, with the batch size set to the maximum integer value, because velox-cudf does not control the batch size yet. This lets the CPU shuffle reader decompress and prepare the first batch while the GPU executes the tasks one by one.

Performance: with this change, the time of stage 2 in TPCDS Q95 SF100 decreases from 27s to 13s.

In the shuffle writer, bool values are split into bytes and timestamps are converted to int64_t nanoseconds to match the cudf format.

makeCudfTable does not work well yet, so we still use the RowVector format and convert to cudf::table in the Velox operator CudfFromVelox.
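The bool and timestamp conversion in the shuffle writer can be illustrated with a minimal sketch. This is not the code from this PR: `TimestampParts`, `unpackBools`, and `toNanos` are hypothetical names, and the sketch assumes the source booleans are bit-packed (least significant bit first) and the source timestamp is a (seconds, nanos) pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a (seconds, nanos) timestamp pair.
// It is an assumption for illustration, not the actual Velox type.
struct TimestampParts {
  int64_t seconds;
  uint64_t nanos;
};

// Expand a bit-packed boolean buffer (LSB first within each byte)
// into one byte per value, the layout assumed for the cudf side.
std::vector<uint8_t> unpackBools(const std::vector<uint8_t>& bits, size_t numValues) {
  std::vector<uint8_t> out(numValues);
  for (size_t i = 0; i < numValues; ++i) {
    out[i] = (bits[i / 8] >> (i % 8)) & 1u;
  }
  return out;
}

// Collapse a timestamp into a single int64_t count of nanoseconds.
// Overflow is not checked in this sketch.
int64_t toNanos(const TimestampParts& ts) {
  return ts.seconds * 1000000000LL + static_cast<int64_t>(ts.nanos);
}
```

Doing this on the write side means the reader receives fixed-width values that need no further unpacking before they are handed to the GPU.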
