jinchengchenghh opened a new issue, #11524: URL: https://github.com/apache/incubator-gluten/issues/11524
### Backend

VL (Velox)

### Bug description

We want to schedule IO-bound tasks, such as stages that contain a table scan, on CPU nodes, and computation-intensive tasks on GPU nodes. Spark already supports stage-level resource scheduling through resource profiles, as described in https://spark.apache.org/docs/latest/configuration.html#custom-resource-scheduling-and-configuration-overview. In Gluten, the offheap/onheap memory allocation is already adjusted through ResourceProfile.

The following script sets up the GPU host environment. It has already been executed on the IBM internal AMI Linux image, so if you use the IBM pipeline `pipeline-create-dev-vm` and select a GPU node such as g4dn.xlarge, the environment is ready and you do not need to run the script.
https://raw.githubusercontent.com/jinchengchenghh/gluten/cudf_script/dev/start_cudf_amazon.sh

Note: the environment has since been upgraded to CUDA 13.1 because of a cuDF build issue, but the script still installs CUDA 12.8, so it is outdated.

These documents describe how to set up YARN on a GPU node:
https://docs.nvidia.com/spark-rapids/user-guide/23.10/getting-started/yarn-gpu.html
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html

The GPU document describes how to build with GPU support:
https://github.com/apache/incubator-gluten/blob/main/docs/get-started/VeloxGPU.md

Following the existing offheap/onheap memory ResourceProfile allocation, we should set the profile in a similar way so that it requires 1 GPU. Spark cannot currently set the core number through a resource profile; that feature is still under development.
https://github.com/apache/incubator-gluten/pull/8209

TPC-DS q95 can be used as a test. The query runs successfully on YARN, but once the environment is set up according to https://docs.nvidia.com/spark-rapids/user-guide/23.10/getting-started/yarn-gpu.html, the query hangs. I also tried standalone mode earlier, and it hangs there as well.
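For reference, the GPU request described above can also be written with Spark's application-level custom resource settings. The following is only a minimal sketch, not the Gluten ResourceProfile code path this issue asks for. The helper names `gpu_stage_confs` and `to_submit_args` and the discovery script path are hypothetical. The three `spark.*.resource.gpu.*` keys are the ones documented in the Spark configuration page linked above.

```python
def gpu_stage_confs(gpus_per_executor=1, gpus_per_task=1,
                    discovery_script="/opt/spark/getGpusResources.sh"):
    """Return Spark conf entries that request GPUs for executors and tasks.

    The discovery script path is a placeholder (assumption); point it at
    whatever script reports GPU addresses on your nodes.
    """
    if gpus_per_executor < 1:
        raise ValueError("an executor must request at least one GPU")
    if not 0 < gpus_per_task <= gpus_per_executor:
        raise ValueError("a task cannot request more GPUs than its executor has")
    return {
        # Number of GPUs each executor asks the cluster manager for.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Number of GPUs each task needs before it can be scheduled.
        "spark.task.resource.gpu.amount": str(gpus_per_task),
        # Script the executor runs to discover its GPU addresses.
        "spark.executor.resource.gpu.discoveryScript": discovery_script,
    }


def to_submit_args(confs):
    """Flatten a conf dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(confs):
        args += ["--conf", f"{key}={confs[key]}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(gpu_stage_confs())))
```

With these settings every task in the application requires a GPU. The point of the ResourceProfile approach is to limit that requirement to the computation-intensive stages.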
### Gluten version

_No response_

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

```bash
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
