Yes, git log:

commit dafcb05c2ef8e09f45edfb7eabf58116c23975a0
Author: Sameer Agarwal <sam...@databricks.com>
Date:   Sun May 22 23:32:39 2016 -0700
for #2 see my comments in https://issues.apache.org/jira/browse/SPARK-15078

> On 23 May 2016, at 18:16, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Can you tell us the commit hash using which the test was run?
>
> For #2, if you can give the full stack trace, that would be nice.
>
> Thanks
>
> On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
>
> Hi
>
> 1) Using the latest Spark 2.0 I've managed to run the first 9 TPCDSQueryBenchmark
> queries, and then it ends with an OutOfMemoryError [1].
>
> What was the configuration used for running this benchmark? Can you explain the
> meaning of 4 shuffle partitions? Thanks!
>
> On my local system I use:
>
>   ./bin/spark-submit --class
>     org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --master
>     local[4] jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar
>
> configured with:
>
>   .set("spark.sql.parquet.compression.codec", "snappy")
>   .set("spark.sql.shuffle.partitions", "4")
>   .set("spark.driver.memory", "3g")
>   .set("spark.executor.memory", "3g")
>   .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
>
> The TPCDS scale factor is 5; the data was generated following the notes at
> https://github.com/databricks/spark-sql-perf.
>
> 2) Running spark-sql-perf with
>
>   val experiment = tpcds.runExperiment(tpcds.runnable)
>
> on the same dataset reveals some exceptions:
>
> Running execution q9-v1.4 iteration: 1, StandardRun=true
> java.lang.NullPointerException
>   at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
>   at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
>   at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
>   at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   ...
>   at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
>   at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
>   at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>
> or
>
> Running execution q25-v1.4 iteration: 1, StandardRun=true
> java.lang.IllegalStateException: Task -1024 has already locked broadcast_755_piece0 for writing
>   at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
>   at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)
>
> Best,
> Ovidiu
>
> [1]
> Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
>   at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
>   at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
>   at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
>   at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
>   at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
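
One detail worth flagging in the configuration quoted above: spark.driver.memory only takes effect if it is supplied before the driver JVM starts (for example via --driver-memory 3g on spark-submit, or in spark-defaults.conf); setting it through SparkConf.set inside the application is too late, so the driver may still be running with its default heap, which could contribute to the heap-space error in [1]. A minimal Scala sketch of the same settings against the Spark 2.0 API follows; it is illustrative only, not the exact code used by TPCDSQueryBenchmark:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Illustrative config mirroring the settings quoted in the thread.
    // spark.driver.memory is deliberately NOT set here: pass it at launch
    // time instead (spark-submit --driver-memory 3g), because the driver
    // JVM is already running by the time application code executes.
    val conf = new SparkConf()
      .set("spark.sql.parquet.compression.codec", "snappy")
      .set("spark.sql.shuffle.partitions", "4")   // partitions used for shuffle output (joins/aggregations)
      .set("spark.executor.memory", "3g")
      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString) // 20 MB

    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("TPCDSQueryBenchmark")
      .config(conf)
      .getOrCreate()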
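
On [1] itself: the trace shows the failure inside BroadcastExchangeExec while building a HashedRelation, i.e. the driver ran out of heap while materializing the build side of a broadcast join. A common workaround (offered here as an assumption about this workload, not something verified against it) is to lower or disable the broadcast threshold so the planner falls back to a shuffle-based join:

    // Disable automatic broadcast joins (-1) so large build sides are not
    // materialized on the driver; a smaller positive byte threshold can be
    // used instead if broadcasting small dimension tables is still wanted.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")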