shivangi24 opened a new issue, #6943: URL: https://github.com/apache/incubator-gluten/issues/6943
### Backend

VL (Velox)

### Bug description

We are currently integrating Gluten into our WatsonX.Data Spark environment. However, after enabling Gluten and running the TPC-H benchmark at the 100 GB scale factor, we are not seeing the performance improvements claimed in the Gluten repository: we measure only a 10-12% improvement, whereas roughly 2x is expected.

Details of our environment:

1. Gluten was built on CentOS 9.
2. The built jar and shared libraries are used in Docker images based on UBI 9.
3. Our Spark application runs with 2 executors, each configured with 6 cores and 24 GB of memory.
4. We are processing TPC-H data at the 100 GB scale, stored in Iceberg format.
5. We are using Java 17.

We have experimented with various configurations, but the performance gain has not exceeded 10-12% across all 22 queries. The attached graph compares runs with and without Gluten.

<img width="685" alt="image" src="https://github.com/user-attachments/assets/249fa294-3f04-45a2-a941-48b3cf07c62c">

Spark events for a single query (Q6): [f2b74f64-bdfe-42ba-a6f7-ad81028cb2d7_events.zip](https://github.com/user-attachments/files/16675593/f2b74f64-bdfe-42ba-a6f7-ad81028cb2d7_events.zip)

cc: @deepashreeraghu @majetideepak

### Spark version

Spark-3.4.x

### Spark configurations

```
### Spark configs for driver and executor
"spark.executor.cores": "6",
"spark.executor.memory": "24G",
"spark.driver.cores": "6",
"spark.driver.memory": "24G",
"spark.driver.extraClassPath": "/opt/ibm/spark/external-jars/gluten-velox-bundle-spark3.4_2.12-centos_9_x86_64-1.2.0-SNAPSHOT.jar",
"spark.executor.extraClassPath": "/opt/ibm/spark/external-jars/gluten-velox-bundle-spark3.4_2.12-centos_9_x86_64-1.2.0-SNAPSHOT.jar",
"spark.hive.metastore.uris": "thrift://<hive-metastore-URL>",
"spark.sql.defaultCatalog": "lakehouse",
"spark.sql.extensions":
"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
"spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.lakehouse.type": "hive",
"spark.sql.iceberg.vectorization.enabled": "false",
"spark.hive.metastore.client.auth.mode": "PLAIN",
"spark.hive.metastore.client.plain.username": "<metastore username>",
"spark.hive.metastore.client.plain.password": "<metastore password>",
"spark.hive.metastore.use.SSL": "true",
"spark.hive.metastore.truststore.type": "JKS",
"spark.hive.metastore.truststore.path": "file:///opt/ibm/jdk/lib/security/cacerts",
"spark.hive.metastore.truststore.password": "changeit",

### Main Gluten configs
"spark.gluten.enabled": "true",
"spark.plugins": "org.apache.gluten.GlutenPlugin",
"spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
"spark.gluten.loadLibFromJar": "true",
"spark.gluten.sql.columnar.forceShuffledHashJoin": "true",
"spark.gluten.sql.columnar.backend.lib": "velox",

### Java-related updates
"spark.driver.extraJavaOptions": "-Dio.netty.tryReflectionSetAccessible=true -XX:MaxDirectMemorySize=1G -Djdk.nio.maxCachedBufferSize=262144",
"spark.executor.extraJavaOptions": "-Dio.netty.tryReflectionSetAccessible=true -XX:MaxDirectMemorySize=1G -Djdk.nio.maxCachedBufferSize=262144",

### Fallback-related
"spark.gluten.sql.columnar.joinOptimizationLevel": "18",
"spark.gluten.sql.columnar.physicalJoinOptimizeEnable": "true",
"spark.gluten.sql.columnar.physicalJoinOptimizationLevel": "18",
"spark.gluten.sql.columnar.logicalJoinOptimizeEnable": "true",
"spark.gluten.sql.columnar.logicalJoinOptimizationLevel": "19",
"spark.gluten.sql.columnar.fallback.expressions.threshold": "2",

### Memory-related
"spark.memory.offHeap.enabled": "true",
"spark.executor.memoryOverheadFactor": "75",
"spark.memory.offHeap.size": "18g",

#### AQE-related
"spark.sql.adaptive.enabled": "true",
"spark.gluten.sql.columnar.shuffle.writeEOS": "false",
"spark.gluten.sql.columnar.backend.ch.shuffle.hash.algorithm": "sparkMurmurHash3_32",

### Shuffle and Compression-related
"spark.shuffle.compress": "true",
"spark.gluten.sql.columnar.shuffle.compressionMode": "buffer",
"spark.sql.optimizer.runtime.bloomFilter.enabled": "true",
"spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold": "1KB",
"spark.gluten.sql.columnar.force.hashagg": "false",
```

Ran with 2 executors of (6 cores * 24 GB each).

### System information

_No response_

### Relevant logs

_No response_
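One observation on the memory settings above: Spark documents `spark.executor.memoryOverheadFactor` as a *fraction* of executor memory (default 0.10), so the posted value of `"75"` may be interpreted as a 75x multiplier rather than 75%. A minimal sketch of the per-executor memory budget implied by the configs above (the numbers come from the posted settings; the overhead interpretation is an assumption worth verifying against your resource manager's actual container requests):

```python
# Per-executor memory budget implied by the posted configs.
# Assumption: spark.executor.memoryOverheadFactor is a fraction of
# spark.executor.memory (Spark's documented default is 0.10), so a
# value of "75" would mean 75x, not 75%.

GIB = 1024 ** 3

executor_memory = 24 * GIB   # spark.executor.memory = 24G (on-heap)
offheap_size = 18 * GIB      # spark.memory.offHeap.size = 18g (Velox pool)
direct_memory = 1 * GIB      # -XX:MaxDirectMemorySize=1G

overhead_as_posted = executor_memory * 75    # memoryOverheadFactor = "75"
overhead_default = executor_memory * 0.10    # Spark's documented default

print(overhead_as_posted // GIB)             # 1800 GiB -- likely unintended
print(round(overhead_default / GIB, 1))      # 2.4 GiB with the default factor
print((executor_memory + offheap_size) // GIB)  # 42 GiB on-heap + off-heap
```

If the intent was a modest overhead cushion, leaving the factor at its default (or setting `spark.executor.memoryOverhead` to an explicit size) may be closer to what was meant.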
