Hi,

1) Using the latest Spark 2.0, I managed to run the first 9 queries of TPCDSQueryBenchmark, after which it fails with an OutOfMemoryError [1].

What configuration was used for running this benchmark? Can you explain the meaning of 4 shuffle partitions? Thanks!

On my local system I use:

./bin/spark-submit --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --master local[4] jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar

configured with:

    .set("spark.sql.parquet.compression.codec", "snappy")
    .set("spark.sql.shuffle.partitions", "4")
    .set("spark.driver.memory", "3g")
    .set("spark.executor.memory", "3g")
    .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)

The TPCDS scale factor is 5; the data was generated using the notes from https://github.com/databricks/spark-sql-perf.

2) Running spark-sql-perf with:

val experiment = tpcds.runExperiment(tpcds.runnable)

on the same dataset reveals some exceptions:

Running execution q9-v1.4 iteration: 1, StandardRun=true
java.lang.NullPointerException
    at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
    at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
    at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
    at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
    ...
    at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
    at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
    at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)

or

Running execution q25-v1.4 iteration: 1, StandardRun=true
java.lang.IllegalStateException: Task -1024 has already locked broadcast_755_piece0 for writing
    at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
    at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)

Best,
Ovidiu

[1]
Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
    at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
    at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
    at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
    at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
    at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)