jinchengchenghh commented on issue #10933:
URL:
https://github.com/apache/incubator-gluten/issues/10933#issuecomment-3490755810
In CPU Gluten, we use VeloxResizesBatchesExec to concatenate batches after
the shuffle reader; this allows concatenating batches coming from different
stream readers, as opposed to concatenating inside the ShuffleReader.
But in GPU Gluten, if we split into a cudf::table and call concatenateTables,
most of the concatenateTables operation is actually CPU work, because we
acquire the GPU lock to restrict the number of GPU threads before generating
the cudf::table. So we should do the concatenation on the CPU side.
The lock exists because GPU memory is limited, so most of the time the number
of GPU threads should be smaller than the number of CPU threads.
For example, on an EC2 g4dn.2xlarge instance, CPU memory is 30G while GPU
memory is 15G.
Future work:
- Add an operator VeloxBuffersBatchExec that saves all the buffers into a
std::vector and wraps them in a new class, BufferColumnarBatch. For example,
a flat Vector currently contains {BufferPtr nulls, BufferPtr values}; now it
becomes {std::vector<BufferPtr> nulls, std::vector<BufferPtr> values}.
- The first thread uses GPU decompression while the other threads still use
CPU decompression, because the other threads cannot execute on the GPU
immediately; they would have to wait for GPU execution.
- The other threads keep fetching BufferColumnarBatch instances and
concatenating them into a big batch until there is not enough memory, then
save the batches into a std::vector<ColumnarBatch> and wait for the
downstream operator to fetch them.
- After decompression, merge the buffers. This needs special handling for the
validity buffer, which requires bit-level copies (byteForBits) rather than a
plain memcpy, and for the offset buffer, whose offsets must be recomputed
(rebased) during concatenation.
- Acquire the GPU lock and construct the cudf::table.
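The BufferColumnarBatch idea from the first step above could look roughly like the following sketch. The `BufferPtr` alias and the class layout are assumptions for illustration, not the actual Velox/Gluten types:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for Velox's reference-counted BufferPtr.
using BufferPtr = std::shared_ptr<std::vector<uint8_t>>;

// Instead of one nulls/values buffer per flat column, each column keeps the
// per-source-batch buffers in vectors, deferring the actual concatenation.
struct BufferColumn {
  std::vector<BufferPtr> nulls;   // one validity buffer per source batch
  std::vector<BufferPtr> values;  // one values buffer per source batch
};

struct BufferColumnarBatch {
  std::vector<BufferColumn> columns;

  // Append another batch's buffers without copying any bytes: only the
  // shared_ptr handles are moved into the vectors.
  void append(const BufferColumnarBatch& other) {
    if (columns.size() < other.columns.size()) {
      columns.resize(other.columns.size());
    }
    for (size_t c = 0; c < other.columns.size(); ++c) {
      auto& col = columns[c];
      const auto& src = other.columns[c];
      col.nulls.insert(col.nulls.end(), src.nulls.begin(), src.nulls.end());
      col.values.insert(col.values.end(), src.values.begin(),
                        src.values.end());
    }
  }
};
```

Because `append` only copies pointers, the expensive byte-level merge is postponed to the dedicated merge step after decompression.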
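The bit-level validity merge and offset rebase mentioned in the merge step can be sketched as follows, assuming Arrow-style LSB-ordered validity bitmaps; the function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `nBits` validity bits from `src` (starting at bit 0) onto `dst`,
// which already holds `dstBits` bits. Bit i lives at byte i / 8, bit i % 8.
// A plain memcpy only works when dstBits is a multiple of 8; otherwise each
// bit must be shifted, which is what a byteForBits-style copy handles.
void appendValidityBits(std::vector<uint8_t>& dst, size_t dstBits,
                        const std::vector<uint8_t>& src, size_t nBits) {
  dst.resize((dstBits + nBits + 7) / 8, 0);
  for (size_t i = 0; i < nBits; ++i) {
    bool bit = (src[i / 8] >> (i % 8)) & 1;
    size_t pos = dstBits + i;
    if (bit) {
      dst[pos / 8] |= uint8_t(1) << (pos % 8);
    }
  }
}

// Concatenate two offset buffers (e.g. for string columns). The second
// batch's offsets must be rebased by the last offset of the first batch,
// which is why offset buffers cannot simply be memcpy'd back to back.
std::vector<int32_t> concatOffsets(const std::vector<int32_t>& a,
                                   const std::vector<int32_t>& b) {
  std::vector<int32_t> out(a);
  int32_t base = a.empty() ? 0 : a.back();
  // Skip b[0] (always 0) and shift the remaining offsets.
  for (size_t i = 1; i < b.size(); ++i) {
    out.push_back(b[i] + base);
  }
  return out;
}
```

A production version would copy whole bytes when the destination bit offset is byte-aligned and fall back to the per-bit path only for the unaligned head and tail.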
Question:
Does the first-thread GPU decompression make sense? It looks complex and may
not bring a benefit, because in these cases the buffer concatenation happens
in a single GPU thread anyway.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]