jinchengchenghh opened a new pull request, #11758:
URL: https://github.com/apache/gluten/pull/11758

   Cache batches in a CPU-side cache and let the join threads fetch them one
by one. For now, the buffer size is controlled by
`spark.gluten.sql.columnar.backend.velox.cudf.shuffleMaxPrefetchBytes`;
later, the size may instead be derived from the remaining memory on the server.
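
   A rough illustration of the idea, not the code in this PR: the sketch below models a host-side buffer that prefetches batches until a byte budget is reached and then hands them out one at a time. The class `BoundedPrefetchBuffer`, its methods, and the 350-byte budget are hypothetical names and values chosen for the example; the real implementation lives in the Velox cuDF backend and may differ in when it stops prefetching and how it synchronizes with the GPU lock.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional


class BoundedPrefetchBuffer:
    """Hypothetical host-side cache that holds batches up to a byte budget."""

    def __init__(self, source: Iterable[bytes], max_prefetch_bytes: int) -> None:
        self._source: Iterator[bytes] = iter(source)
        self._max_bytes = max_prefetch_bytes  # stands in for shuffleMaxPrefetchBytes
        self._queue: Deque[bytes] = deque()
        self.cached_bytes = 0

    def prefetch(self) -> int:
        """Read batches while the cached size is below the budget.

        The last batch read may push the total past the budget (assumption:
        overshooting by one batch is acceptable). Returns the number of
        batches added by this call.
        """
        added = 0
        while self.cached_bytes < self._max_bytes:
            batch = next(self._source, None)
            if batch is None:  # source exhausted
                break
            self._queue.append(batch)
            self.cached_bytes += len(batch)
            added += 1
        return added

    def fetch_one(self) -> Optional[bytes]:
        """Return the next batch, preferring cached ones; None when done."""
        if self._queue:
            batch = self._queue.popleft()
            self.cached_bytes -= len(batch)
            return batch
        return next(self._source, None)


# Ten fake 100-byte batches and a 350-byte budget.
batches = [b"x" * 100 for _ in range(10)]
buf = BoundedPrefetchBuffer(batches, max_prefetch_bytes=350)
prefetched = buf.prefetch()
print(f"Prefetched {prefetched} batches ({buf.cached_bytes} bytes)")

# A consumer (the join thread in the PR's terms) drains batches one by one.
consumed = 0
while buf.fetch_one() is not None:
    consumed += 1
```

   In this toy version the consumer falls back to reading directly from the source once the cache is empty, so all ten batches are eventually delivered regardless of the budget.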
   
   Test:
   Tested locally on SF100, with the config adjusted to enable batch caching:
   ```
   --conf spark.gluten.sql.columnar.backend.velox.cudf.batchSize=10000 \
   --conf spark.gluten.sql.columnar.backend.velox.cudf.shuffleMaxPrefetchBytes=1024MB
   ```
   The log prints `Prefetched 171 batches (24057900 bytes) before blocking on GPU lock`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

