Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

Ovidiu-Cristian MARCU Mon, 23 May 2016 08:58:52 -0700

Hi,

1) Using the latest Spark 2.0 build, I managed to run the first 9 queries of
TPCDSQueryBenchmark, after which it fails with the OutOfMemoryError in [1].


What configuration was used to run this benchmark? Could you also explain the
meaning of the 4 shuffle partitions? Thanks!

On my local system I use:
./bin/spark-submit \
  --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
  --master local[4] \
  jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar
configured with:
      .set("spark.sql.parquet.compression.codec", "snappy")
      .set("spark.sql.shuffle.partitions", "4")
      .set("spark.driver.memory", "3g")
      .set("spark.executor.memory", "3g")
      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)

The TPC-DS scale factor is 5; the data was generated following the notes at
https://github.com/databricks/spark-sql-perf.

2) Running spark-sql-perf on the same dataset with
val experiment = tpcds.runExperiment(tpcds.runnable)
throws exceptions for some queries:

Running execution q9-v1.4 iteration: 1, StandardRun=true
java.lang.NullPointerException
        at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
        at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
        at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
        at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
        ...
        at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
        at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
        at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
        at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)

or

Running execution q25-v1.4 iteration: 1, StandardRun=true
java.lang.IllegalStateException: Task -1024 has already locked broadcast_755_piece0 for writing
        at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
        at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)

Best,
Ovidiu

[1]
Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
        at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
        at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
        at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
        at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
        at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
        at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
