VaibhavFRI opened a new issue, #8466:
URL: https://github.com/apache/incubator-gluten/issues/8466
### Backend
VL (Velox)
### Bug description
I am testing Gluten with the Velox backend using the TPC-H benchmark scripts
provided in the repo. A few queries (q7, q9, q10, and q12) run slower with
Gluten than with vanilla Spark.
What causes the slower performance for these queries, and how can it be
improved?
I am running the tests on an ARM-based AWS instance:
m7g.4xlarge, vCPUs = 16, memory = 64 GB
Data size: scale factor SF=100
Below are the shell scripts used to run the tests:
**For Gluten**
```
GLUTEN_JAR=/path/to/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_aarch_64-1.3.0-SNAPSHOT.jar
SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
--master spark://172.32.5.244:7077 --deploy-mode client \
--conf spark.plugins=org.apache.gluten.GlutenPlugin \
--conf spark.driver.extraClassPath=${GLUTEN_JAR} \
--conf spark.executor.extraClassPath=${GLUTEN_JAR} \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=12g \
--conf spark.gluten.sql.columnar.forceShuffledHashJoin=true \
--conf spark.driver.memory=4G \
--conf spark.executor.instances=1 \
--conf spark.executor.memory=30G \
--conf spark.executor.cores=16 \
--conf spark.executor.memoryOverhead=2g \
--conf spark.driver.maxResultSize=2g \
--conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
--conf spark.driver.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED" \
--conf spark.executor.extraJavaOptions="--illegal-access=permit -Dio.netty.tryReflectionSetAccessible=true --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED"
```
**For Vanilla Spark**
```
SPARK_HOME=/home/spark/spark-3.5.2
cat tpch_parquet.scala | ${SPARK_HOME}/bin/spark-shell \
--master spark://172.32.5.244:7077 --deploy-mode client \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=12g \
--conf spark.driver.memory=4G \
--conf spark.executor.instances=1 \
--conf spark.executor.memory=30G \
--conf spark.executor.cores=16 \
--conf spark.executor.memoryOverhead=2g \
--conf spark.driver.maxResultSize=2g
```
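
For reference, the memory settings shared by both scripts can be added up as a rough budget against the 64 GB instance. This is only a sketch: it assumes the executor footprint is approximately heap + off-heap + overhead, and that the driver runs on the same instance (the scripts use client deploy mode, but the driver host is not stated). The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Rough memory budget for the settings used in both runs.
# Values are copied from the scripts above; the summing rule is an
# assumption and ignores OS and page-cache usage.
executor_heap_gb=30   # spark.executor.memory
offheap_gb=12         # spark.memory.offHeap.size
overhead_gb=2         # spark.executor.memoryOverhead
driver_gb=4           # spark.driver.memory (assumed to be on the same host)
instance_gb=64        # m7g.4xlarge

executor_total_gb=$((executor_heap_gb + offheap_gb + overhead_gb))
remaining_gb=$((instance_gb - executor_total_gb - driver_gb))

echo "executor total: ${executor_total_gb} GB"
echo "left after executor and driver: ${remaining_gb} GB"
```

Under these assumptions, the single executor may use up to about 44 GB, of which only the 12 GB off-heap portion is the pool configured for native memory in the Gluten run.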

### Spark version
Spark-3.5.x
### Spark configurations
Same as the Gluten script under "Bug description" above.
### System information
Gluten Version: 1.3.0-SNAPSHOT
Commit: 4dfdfd77b52f2f98fa0cf32eca143b47e4bd11b5
CMake Version: 3.28.3
System: Linux-6.8.0-1021-aws
Arch: aarch64
CPU Name:
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
_No response_