Hi,
1) Using the latest Spark 2.0, I've managed to run the first 9 queries of
TPCDSQueryBenchmark, but then it fails with an OutOfMemoryError [1].
What configuration was used to run this benchmark? Could you also explain
what setting 4 shuffle partitions means here? Thanks!
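For context on the shuffle-partition question: spark.sql.shuffle.partitions sets how many partitions Spark SQL uses when it shuffles data for joins and aggregations (the default is 200). With a value of 4, every such shuffle produces 4 output partitions, i.e. 4 reduce-side tasks, which matches the 4 cores of local[4]. The Scala sketch below models only the routing step. It assumes keys are assigned by a non-negative modulo of their hashCode, as Spark's RDD HashPartitioner does. Spark SQL's exchange hashes the key columns with Murmur3 instead. The object and method names are made up for illustration.

```scala
// Simplified model of hash partitioning for a shuffle (hypothetical names).
// Assumption: partition = nonNegativeMod(key.hashCode, numPartitions), as in the
// RDD HashPartitioner; Spark SQL uses a Murmur3 hash, but the bucket count is
// what spark.sql.shuffle.partitions controls.
object ShufflePartitionSketch {
  // Modulo that never returns a negative index, even for negative hash codes.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Index (0 until numPartitions) of the shuffle partition a key is sent to.
  def partitionFor(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  // Group keys by target partition; each group is what one reduce task reads.
  def assign(keys: Seq[Any], numPartitions: Int): Map[Int, Seq[Any]] =
    keys.groupBy(k => partitionFor(k, numPartitions))
}
```

With numPartitions = 4, every key lands in one of partitions 0 to 3, so each join or aggregation stage runs exactly 4 tasks regardless of input size. That keeps task overhead low on a laptop, but each task has to hold a larger share of the data.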
On my local system I use:
./bin/spark-submit --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
  --master local[4] jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar
configured with:
.set("spark.sql.parquet.compression.codec", "snappy")
.set("spark.sql.shuffle.partitions", "4")
.set("spark.driver.memory", "3g")
.set("spark.executor.memory", "3g")
.set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
The TPC-DS scale factor is 5; the data was generated following the notes at
https://github.com/databricks/spark-sql-perf.
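The broadcast threshold configured above is 20 * 1024 * 1024 = 20971520 bytes, and setting spark.sql.autoBroadcastJoinThreshold to -1 disables broadcast joins. As I understand the 2.0 planner, a join side is broadcast when its estimated size is non-negative and at most this threshold. For Parquet input, that estimate is based on the on-disk (here snappy-compressed) file sizes. So the hash table built in memory, which is where the OutOfMemoryError in [1] is thrown, may be considerably larger than 20 MB. The Scala sketch below models that decision. The object name and the expansion factor are hypothetical, not measured from the TPC-DS data.

```scala
// Rough model of the broadcast-join size check (hypothetical names).
// Assumption: a side is broadcast iff 0 <= estimatedBytes <= threshold.
object BroadcastThresholdSketch {
  // Value set via spark.sql.autoBroadcastJoinThreshold above: 20 MiB in bytes.
  val thresholdBytes: Long = 20L * 1024 * 1024

  // True when a relation with this estimated size would be broadcast;
  // a threshold of -1 therefore disables broadcasting.
  def canBroadcast(estimatedBytes: Long, threshold: Long = thresholdBytes): Boolean =
    estimatedBytes >= 0 && estimatedBytes <= threshold

  // If the estimate comes from compressed files on disk, the in-memory hash
  // relation built from the same rows can be larger by some factor
  // (the factor here is a made-up parameter, not a Spark setting).
  def inMemoryBytes(onDiskBytes: Long, expansionFactor: Double): Long =
    (onDiskBytes * expansionFactor).toLong
}
```

For example, a 15 MiB compressed table passes the check, but with a hypothetical 4x expansion its hash relation would need about 60 MiB of heap on the driver, plus the memory of any other broadcast built at the same time.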
2) Running spark-sql-perf on the same dataset with
  val experiment = tpcds.runExperiment(tpcds.runnable)
throws some exceptions:
Running execution q9-v1.4 iteration: 1, StandardRun=true
java.lang.NullPointerException
  at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
  at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
  at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
  at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
  ...
  at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
  at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
  at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
or
Running execution q25-v1.4 iteration: 1, StandardRun=true
java.lang.IllegalStateException: Task -1024 has already locked broadcast_755_piece0 for writing
  at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
  at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)
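If I read the Spark 2.0 source correctly, -1024 is BlockInfo.NON_TASK_WRITER, the id BlockInfoManager reports for threads that are not running inside a task (e.g. driver-side threads). lockForWriting throws this IllegalStateException when the caller's id already holds the write lock on the block. So the error suggests that a non-task thread tried to lock broadcast_755_piece0 for removal while the lock was already held under that same id, possibly by another driver-side thread. The Scala sketch below models only that locking rule. The object, its methods and the String block ids are simplifications, not Spark's API.

```scala
import scala.collection.mutable

// Simplified model of per-block write locks (hypothetical names).
// Assumptions: re-acquiring a write lock you already hold is an error, and all
// threads outside a task share the id -1024.
object BlockLockSketch {
  // Id used for every thread that is not running inside a task.
  val NonTaskWriter: Long = -1024L

  // blockId -> id of the task currently holding the write lock
  private val writers = mutable.Map.empty[String, Long]

  // Try to take the write lock for `blockId` on behalf of `taskId`.
  // Throws if `taskId` already holds it; returns false if another id holds it
  // (the real lockForWriting would wait instead of returning).
  def lockForWriting(blockId: String, taskId: Long): Boolean = synchronized {
    writers.get(blockId) match {
      case Some(owner) if owner == taskId =>
        throw new IllegalStateException(
          s"Task $taskId has already locked $blockId for writing")
      case Some(_) => false
      case None =>
        writers(blockId) = taskId
        true
    }
  }

  // Release the write lock on `blockId`, if any.
  def unlock(blockId: String): Unit = synchronized { writers.remove(blockId) }
}
```

Because two different driver threads both present themselves as -1024, the second one's attempt looks like a re-entrant lock by the "same task" and fails with exactly the message above, instead of waiting for the first to finish.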
Best,
Ovidiu
[1]
Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError: Java heap space
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
  at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
  at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
  at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
  at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
  at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)