marin-ma commented on PR #11543:
URL: https://github.com/apache/incubator-gluten/pull/11543#issuecomment-3834131694

   Steps to test this feature in a local standalone Spark cluster with mocked GPU resources. No actual GPU resources are required.
   
   1. Create resource scripts for the CPU workers and the GPU worker (see the note after the scripts about making them executable)
   
   - CPU workers
   
     **cpu.conf**
     ```
     spark.worker.resource.cpu.amount   1
     spark.worker.resource.cpu.discoveryScript /path/to/cpu.sh
     ```
     **cpu.sh**
     ```
     #!/usr/bin/env bash
     
     echo {\"name\": \"cpu\", \"addresses\":[\"0\"]}
     ```
   
   - GPU workers
   
     **gpu.conf**
     ```
     spark.worker.resource.gpu.amount   1
     spark.worker.resource.gpu.discoveryScript /path/to/gpu.sh
     ```
   
     **gpu.sh**
     ```
     #!/usr/bin/env bash
     
     echo {\"name\": \"gpu\", \"addresses\":[\"1\"]}
     ```
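
   Note: Spark runs these discovery scripts as executables, so they need the execute bit set and should print the single-line JSON shown above. A quick sanity check (assuming the paths above):

   ```
   chmod +x /path/to/cpu.sh /path/to/gpu.sh
   # Each script should print one JSON line such as {"name": "cpu", "addresses": ["0"]}
   /path/to/cpu.sh
   /path/to/gpu.sh
   ```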
   
   2. Start the spark master, 2 CPU workers, and 1 GPU worker. Each CPU worker has one "cpu" resource, and the GPU worker has one "gpu" resource. A quick way to verify that the workers registered is shown after the commands.
   
   ```
   ./sbin/start-master.sh
   
   export SPARK_WORKER_DIR=/tmp/spark-worker-1
   export SPARK_PID_DIR=/tmp/spark-pid-1
   ./sbin/start-worker.sh spark://localhost:7077 \
     --webui-port 8081 --properties-file /path/to/gpu.conf
    
    
   export SPARK_WORKER_DIR=/tmp/spark-worker-2
   export SPARK_PID_DIR=/tmp/spark-pid-2
   ./sbin/start-worker.sh spark://localhost:7077 \
     --webui-port 8082 --properties-file /path/to/cpu.conf
    
   export SPARK_WORKER_DIR=/tmp/spark-worker-3
   export SPARK_PID_DIR=/tmp/spark-pid-3
   ./sbin/start-worker.sh spark://localhost:7077 \
     --webui-port 8083 --properties-file /path/to/cpu.conf
   ```
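
   To confirm that all three workers registered with their resources before submitting an application, check the master web UI (default port 8080); its JSON view can also be queried from the shell, for example:

   ```
   # Assumes the master web UI is on the default port 8080
   curl -s http://localhost:8080/json/ | python3 -m json.tool | less
   ```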
   
   3. Run the spark application with the configurations below added. This will start 1 executor on each worker node. By default, execution is scheduled onto the CPU worker nodes, and the selected GPU stages (enabled by `spark.gluten.auto.adjustStageExecutionMode=true`, currently only the join stages) are scheduled onto the GPU worker nodes. An example invocation that triggers a join stage is shown after the configurations.
   
   If `spark.gluten.auto.adjustStageExecutionMode` is set to `false`, all stages that can use cudf will be scheduled onto GPU worker nodes.
   ```
   spark.driver.extraJavaOptions "-Dspark.testing=true -Dio.netty.tryReflectionSetAccessible=true"
   spark.dynamicAllocation.enabled true
   spark.executor.resource.cpu.amount 1
   spark.executor.resource.cpu.discoveryScript=/path/to/cpu.sh
   spark.gluten.sql.columnar.backend.velox.cudf.enableValidation=false
   spark.gluten.sql.columnar.cudf=true
   spark.gluten.auto.adjustStageExecutionMode=true
   ```
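
   For example, assuming the Gluten jars and the usual Gluten/Velox settings (`spark.plugins`, memory configs, etc.) are already in place and the configurations above are stored in a properties file (here called `app.conf`, a made-up name), a simple join is enough to produce a stage that gets scheduled onto the GPU worker:

   ```
   # Hypothetical launch; app.conf holds the configurations above plus the usual Gluten settings
   bin/spark-sql --master spark://localhost:7077 \
     --properties-file /path/to/app.conf \
     -e "SELECT count(*) FROM range(1000000) a JOIN range(1000000) b ON a.id = b.id"
   ```

   The join stage should then run on the executor launched on the GPU worker, while the remaining stages stay on the CPU workers, matching the behavior described above.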
   

