jinchengchenghh commented on issue #10933:
URL: 
https://github.com/apache/incubator-gluten/issues/10933#issuecomment-3490755810

   In CPU Gluten, we use VeloxResizesBatchesExec to concatenate batches after the shuffle reader; concatenating there lets us combine output from different stream readers, versus concatenating inside the ShuffleReader.
   But in GPU Gluten, if we split into cudf::table and call concatenateTables, most of the concatenateTables work is actually CPU work, and we take the GPU lock to restrict the number of GPU threads before constructing the cudf::table, so we should do the concatenation on the CPU side.
   
   The lock exists because GPU memory is limited, so most of the time the number of GPU threads should be smaller than the number of CPU threads.
   
   For example, on the EC2 g4dn.2xlarge instance, CPU memory is 30 GB while GPU memory is 15 GB.
   
   Future work:
   - Add an operator VeloxBuffersBatchExec that saves all the buffers into a std::vector and wraps them in a new class, BufferColumnarBatch. For example, a flat vector contains {BufferPtr nulls, BufferPtr values}; now it becomes {std::vector<BufferPtr> nulls, std::vector<BufferPtr> values}.
   - The first thread uses GPU decompression while the other threads still use CPU decompression, because the other threads cannot execute on the GPU immediately; they would have to wait for GPU execution.
   - The other threads fetch BufferColumnarBatches and concatenate them into a big batch continuously until there is not enough memory, then save the batches into a std::vector<ColumnarBatch> and wait for the downstream operator to fetch them.
   - After decompression, merge the buffers. The validity buffer is special because it is bit-packed (ByteForBits) and needs a different copy path, and the offset buffer is also different because its offsets must be recomputed.
   - Take the GPU lock and construct the cudf::table.
   
   Question:
   
   Does GPU decompression on the first thread make sense? It looks complex and may not bring a benefit, because in these cases the buffer concatenation runs on a single GPU thread.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

