DamonZhao-sfu commented on issue #588: URL: https://github.com/apache/datafusion-comet/issues/588#issuecomment-2239746176
> @DamonZhao-sfu could you also provide the configs you used for the Spark run? I am seeing most queries running faster with Comet (but at 100GB) and would like to try and reproduce your results.

Here's my config:

```
export COMET_JAR=/localhdd/hza214/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar
export SPARK_LOCAL_DIRS=/mnt/smartssd_0n/hza214/sparktmp
INFLUXDB_ENDPOINT=`hostname`
cat tpcds_parquet.scala | /localhdd/hza214/spark-3.4/spark-3.4.2-bin-hadoop3/bin/spark-shell \
  --jars $COMET_JAR \
  --conf spark.comet.xxhash64.enabled=true \
  --conf spark.driver.extraClassPath=$COMET_JAR \
  --conf spark.executor.extraClassPath=$COMET_JAR \
  --conf spark.comet.batchSize=8192 \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.exec.all.enabled=true \
  --conf spark.comet.parquet.io.enabled=false \
  --conf spark.comet.cast.allowIncompatible=true \
  --conf spark.comet.explainFallback.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.sql.adaptive.coalescePartitions.enabled=false \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.comet.exec.shuffle.mode=native \
  --conf spark.memory.offHeap.size=50g \
  --conf spark.shuffle.file.buffer=128k \
  --conf spark.local.dir=/mnt/smartssd_0n/hza214/sparktmp \
  --executor-cores 48 \
  --driver-memory 10g \
  --executor-memory 140g
```

Others later advised me to use fewer cores per executor so that more executors can run in parallel on each node. I'm using a 4-node cluster, each node with 48 cores, 196 GB of memory, and an SSD as the local disk.

I have not tested at the 100 GB scale yet; let me try to reproduce it. Also, would you mind sharing your configs, @andygrove?
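The advice about using fewer cores per executor comes down to simple arithmetic. Here is a minimal sketch, assuming the cluster described in this comment (4 nodes, each with 48 cores and 196 GB of memory). `CORES_PER_EXECUTOR=8` and `RESERVED_GB=16` are illustrative assumptions, not tested or recommended values:

```shell
# Node figures are taken from the comment; the two values marked
# "assumption" are hypothetical and only illustrate the calculation.
NODES=4
CORES_PER_NODE=48
MEM_PER_NODE_GB=196
CORES_PER_EXECUTOR=8   # assumption: smaller than the 48 used in the config
RESERVED_GB=16         # assumption: memory left for the OS and other daemons

# How many executors fit on one node, and in the whole cluster.
EXECUTORS_PER_NODE=$(( CORES_PER_NODE / CORES_PER_EXECUTOR ))
TOTAL_EXECUTORS=$(( NODES * EXECUTORS_PER_NODE ))

# Memory budget available to each executor (integer division, in GB).
MEM_PER_EXECUTOR_GB=$(( (MEM_PER_NODE_GB - RESERVED_GB) / EXECUTORS_PER_NODE ))

echo "executors_per_node=$EXECUTORS_PER_NODE"
echo "total_executors=$TOTAL_EXECUTORS"
echo "memory_budget_per_executor_gb=$MEM_PER_EXECUTOR_GB"
```

The per-executor budget has to cover both `--executor-memory` and `spark.memory.offHeap.size`, plus any memory overhead, since off-heap memory is allocated in addition to the executor heap.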