marin-ma commented on PR #11543:
URL:
https://github.com/apache/incubator-gluten/pull/11543#issuecomment-3834131694
Steps to test this feature in a local standalone Spark cluster with mocked
GPU resources. No actual GPUs are required.
1. Create resource scripts for CPU workers and GPU workers
- CPU workers
**cpu.conf**
```
spark.worker.resource.cpu.amount 1
spark.worker.resource.cpu.discoveryScript /path/to/cpu.sh
```
**cpu.sh**
```
#!/usr/bin/env bash
echo '{"name": "cpu", "addresses": ["0"]}'
```
- GPU workers
**gpu.conf**
```
spark.worker.resource.gpu.amount 1
spark.worker.resource.gpu.discoveryScript /path/to/gpu.sh
```
**gpu.sh**
```
#!/usr/bin/env bash
echo '{"name": "gpu", "addresses": ["1"]}'
```
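As a quick sanity check (illustrative, not part of this PR), a discovery script must print a single JSON object in Spark's `ResourceInformation` format. The snippet below writes the CPU script to a temporary path (an assumption; substitute your own path) and validates its output:

```shell
#!/usr/bin/env bash
# Illustrative check, not part of the PR: write the CPU discovery script
# to a temp path and verify it emits valid ResourceInformation JSON.
set -euo pipefail
cat > /tmp/cpu.sh <<'EOF'
#!/usr/bin/env bash
echo '{"name": "cpu", "addresses": ["0"]}'
EOF
chmod +x /tmp/cpu.sh
# The script output must parse as JSON with the expected resource name.
/tmp/cpu.sh | python3 -c 'import json,sys; d=json.load(sys.stdin); assert d["name"] == "cpu" and d["addresses"] == ["0"]'
echo "cpu.sh output is valid"
```

The same check applies to `gpu.sh` with `"name": "gpu"` and address `"1"`.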
2. Start the Spark master, two CPU workers, and one GPU worker. Each CPU worker
has one "cpu" resource, and the GPU worker has one "gpu" resource.
```
./sbin/start-master.sh
export SPARK_WORKER_DIR=/tmp/spark-worker-1
export SPARK_PID_DIR=/tmp/spark-pid-1
./sbin/start-worker.sh spark://localhost:7077 \
--webui-port 8081 --properties-file /path/to/gpu.conf
export SPARK_WORKER_DIR=/tmp/spark-worker-2
export SPARK_PID_DIR=/tmp/spark-pid-2
./sbin/start-worker.sh spark://localhost:7077 \
--webui-port 8082 --properties-file /path/to/cpu.conf
export SPARK_WORKER_DIR=/tmp/spark-worker-3
export SPARK_PID_DIR=/tmp/spark-pid-3
./sbin/start-worker.sh spark://localhost:7077 \
--webui-port 8083 --properties-file /path/to/cpu.conf
```
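The three worker launches above only differ in their directories, web UI ports, and properties files, so they can be derived in a loop. The sketch below is a dry run (nothing is launched; paths are the placeholders from step 1) that prints the commands for inspection:

```shell
#!/usr/bin/env bash
# Dry-run sketch, illustrative only: derive the three start-worker commands
# from step 2. The first worker gets gpu.conf, the other two get cpu.conf;
# web UI ports count up from 8081.
set -euo pipefail
print_worker_cmds() {
  local confs=(gpu.conf cpu.conf cpu.conf)
  local i
  for i in 1 2 3; do
    echo "SPARK_WORKER_DIR=/tmp/spark-worker-$i SPARK_PID_DIR=/tmp/spark-pid-$i" \
         "./sbin/start-worker.sh spark://localhost:7077" \
         "--webui-port $((8080 + i)) --properties-file /path/to/${confs[i-1]}"
  done
}
print_worker_cmds
```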
3. Run a Spark application with the configurations below added. This starts one
executor on each worker node. By default, execution is scheduled onto the CPU
worker nodes, and the selected GPU stages (enabled by
`spark.gluten.auto.adjustStageExecutionMode=true`; currently only the join
stages) are scheduled onto the GPU worker node.
If `spark.gluten.auto.adjustStageExecutionMode=false` is set, all stages that
can use cuDF are scheduled onto the GPU worker node.
```
spark.driver.extraJavaOptions "-Dspark.testing=true -Dio.netty.tryReflectionSetAccessible=true"
spark.dynamicAllocation.enabled true
spark.executor.resource.cpu.amount 1
spark.executor.resource.cpu.discoveryScript /path/to/cpu.sh
spark.gluten.sql.columnar.backend.velox.cudf.enableValidation false
spark.gluten.sql.columnar.cudf true
spark.gluten.auto.adjustStageExecutionMode true
```
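One way to pass these settings is to collect them into a properties file and hand it to the launcher with `--properties-file`, rather than repeating each one as a `--conf` flag. A sketch (the file path and master URL are assumptions):

```shell
#!/usr/bin/env bash
# Sketch, illustrative only: write the step-3 settings to a properties file
# for use with --properties-file. The discovery-script path is the
# placeholder from step 1.
set -euo pipefail
cat > /tmp/gluten-cudf.conf <<'EOF'
spark.driver.extraJavaOptions "-Dspark.testing=true -Dio.netty.tryReflectionSetAccessible=true"
spark.dynamicAllocation.enabled true
spark.executor.resource.cpu.amount 1
spark.executor.resource.cpu.discoveryScript /path/to/cpu.sh
spark.gluten.sql.columnar.backend.velox.cudf.enableValidation false
spark.gluten.sql.columnar.cudf true
spark.gluten.auto.adjustStageExecutionMode true
EOF
# Then launch against the standalone master from step 2, e.g.:
#   ./bin/spark-shell --master spark://localhost:7077 --properties-file /tmp/gluten-cudf.conf
grep -c '^spark\.' /tmp/gluten-cudf.conf
```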
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]