shivangi24 opened a new issue, #6943:
URL: https://github.com/apache/incubator-gluten/issues/6943

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   We are currently working on integrating Gluten into our WatsonX.Data Spark 
environment. However, after enabling Gluten and running the TPC-H benchmark at 
the 100G scale, we are not observing the performance improvements claimed in 
the Gluten repository: we see only a 10-12% improvement, whereas roughly a 2x 
improvement is expected.
   
   Here are the details of our environment:
   1. Gluten was built on CentOS 9.
   2. The built jar and shared libraries are being utilized on Docker images 
based on UBI-9.
   3. Our Spark application is running with 2 executors, each configured with 6 
cores and 24GB of memory.
   4. We are processing TPC-H data at the 100G scale, with the data stored in 
Iceberg format.
   5. We are using Java 17.
   
   
   We have experimented with various configurations, but the performance gain 
has not exceeded 10-12% on any of the 22 queries. Attached is a graph comparing 
the runs with and without Gluten.
   
   <img width="685" alt="image" src="https://github.com/user-attachments/assets/249fa294-3f04-45a2-a941-48b3cf07c62c">
   
   Attaching the Spark event log for a single query, Q6:
   
[f2b74f64-bdfe-42ba-a6f7-ad81028cb2d7_events.zip](https://github.com/user-attachments/files/16675593/f2b74f64-bdfe-42ba-a6f7-ad81028cb2d7_events.zip)
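For reference, the attached event log is newline-delimited JSON following Spark's JsonProtocol schema, so per-task executor time for Q6 can be summed with a short script rather than loading the Spark UI. A minimal sketch (the sample events are illustrative; in practice, pass the lines of the extracted event-log file):

```python
import json

# Spark event logs are newline-delimited JSON. "SparkListenerTaskEnd"
# events carry per-task metrics; summing "Executor Run Time" gives the
# total executor-side compute time for the run, in milliseconds.
def total_task_runtime_ms(lines):
    total = 0
    for line in lines:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerTaskEnd":
            metrics = event.get("Task Metrics") or {}
            total += metrics.get("Executor Run Time", 0)
    return total

# Illustrative events; replace with the lines of the extracted
# event-log file from the attached zip.
sample = [
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 1200}}),
    json.dumps({"Event": "SparkListenerJobEnd"}),
]
print(total_task_runtime_ms(sample))  # -> 1200
```

Comparing this total between the Gluten and vanilla runs isolates executor compute time from scheduling and I/O overhead.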
   cc: @deepashreeraghu @majetideepak
   
   ### Spark version
   
   Spark-3.4.x
   
   ### Spark configurations
   
   ```
   ### spark configs for driver and executor 
   "spark.executor.cores": "6",
   "spark.executor.memory": "24G",
   "spark.driver.cores": "6",
   "spark.driver.memory": "24G",
   "spark.driver.extraClassPath": 
"/opt/ibm/spark/external-jars/gluten-velox-bundle-spark3.4_2.12-centos_9_x86_64-1.2.0-SNAPSHOT.jar",
   "spark.executor.extraClassPath": 
"/opt/ibm/spark/external-jars/gluten-velox-bundle-spark3.4_2.12-centos_9_x86_64-1.2.0-SNAPSHOT.jar",
   "spark.hive.metastore.uris": "thrift://<hive-metastore-URL>",
   "spark.sql.defaultCatalog": "lakehouse",
   "spark.sql.extensions": 
"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
   "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
   "spark.sql.catalog.lakehouse.type": "hive",
   "spark.sql.iceberg.vectorization.enabled": "false",
   "spark.hive.metastore.client.auth.mode": "PLAIN",
   "spark.hive.metastore.client.plain.username": "<metastore username>",
   "spark.hive.metastore.client.plain.password": "<metastore password>",
   "spark.hive.metastore.use.SSL": "true",
   "spark.hive.metastore.truststore.type": "JKS",
   "spark.hive.metastore.truststore.path": 
"file:///opt/ibm/jdk/lib/security/cacerts",
   "spark.hive.metastore.truststore.password": "changeit",
   
   ### Main Gluten configs
   "spark.gluten.enabled": "true",
   "spark.plugins": "org.apache.gluten.GlutenPlugin",
   "spark.shuffle.manager": 
"org.apache.spark.shuffle.sort.ColumnarShuffleManager",
   "spark.gluten.loadLibFromJar": "true",
   "spark.gluten.sql.columnar.forceShuffledHashJoin": "true",
   "spark.gluten.sql.columnar.backend.lib": "velox",
    
   ### Java-related updates
   "spark.driver.extraJavaOptions": "-Dio.netty.tryReflectionSetAccessible=true 
-XX:MaxDirectMemorySize=1G -Djdk.nio.maxCachedBufferSize=262144",
   "spark.executor.extraJavaOptions": 
"-Dio.netty.tryReflectionSetAccessible=true -XX:MaxDirectMemorySize=1G 
-Djdk.nio.maxCachedBufferSize=262144",
    
   ### Fallback-related
   "spark.gluten.sql.columnar.joinOptimizationLevel": "18",
   "spark.gluten.sql.columnar.physicalJoinOptimizeEnable": "true",
   "spark.gluten.sql.columnar.physicalJoinOptimizationLevel": "18",
   "spark.gluten.sql.columnar.logicalJoinOptimizeEnable": "true",
   "spark.gluten.sql.columnar.logicalJoinOptimizationLevel": "19",
   "spark.gluten.sql.columnar.fallback.expressions.threshold": "2",
    
   ### Memory-related
   "spark.memory.offHeap.enabled": "true",
   "spark.executor.memoryOverheadFactor": "75",
   "spark.memory.offHeap.size": "18g",
    
   ### AQE-related
   "spark.sql.adaptive.enabled": "true",
   "spark.gluten.sql.columnar.shuffle.writeEOS": "false",
   "spark.gluten.sql.columnar.backend.ch.shuffle.hash.algorithm": 
"sparkMurmurHash3_32",
    
   ### Shuffle and Compression-related
   "spark.shuffle.compress": "true",
   "spark.gluten.sql.columnar.shuffle.compressionMode": "buffer",
   "spark.sql.optimizer.runtime.bloomFilter.enabled": "true",
   "spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold": 
"1KB",
   "spark.gluten.sql.columnar.force.hashagg": "false",
   ```
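A quick way to check how much of each plan Gluten actually offloads (heavy fallback to vanilla Spark would explain a small end-to-end speedup) is to scan the `df.explain()` output for Gluten's `*Transformer` nodes versus remaining row-based operators. A rough heuristic in plain Python (the `*Transformer` node naming is a Gluten convention and may differ by version):

```python
import re

# Gluten replaces offloaded operators with "*Transformer" nodes
# (e.g. FilterExecTransformer); remaining vanilla "*Exec" operators
# indicate fallback. Node naming is version-dependent -- treat this
# as a heuristic, not an exact accounting.
def offload_ratio(plan_text: str) -> float:
    ops = re.findall(r"\b(\w+Exec\w*|\w+Transformer)\b", plan_text)
    if not ops:
        return 0.0
    offloaded = [op for op in ops if "Transformer" in op]
    return len(offloaded) / len(ops)

# Illustrative plan fragment, not taken from the runs above.
plan = """
VeloxColumnarToRowExec
+- FilterExecTransformer
   +- FileSourceScanExecTransformer
"""
print(f"offloaded fraction: {offload_ratio(plan):.2f}")
```

A low ratio on the slower queries would point at operator or expression fallback rather than Velox itself.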
   
   Ran with 2 executors, each with 6 cores and 24G of memory.
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

