Yes, git log:

commit dafcb05c2ef8e09f45edfb7eabf58116c23975a0
Author: Sameer Agarwal <sam...@databricks.com>
Date:   Sun May 22 23:32:39 2016 -0700
for #2 see my comments in https://issues.apache.org/jira/browse/SPARK-15078

> On 23 May 2016, at 18:16, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Can you tell us the commit hash using which the test was run?
>
> For #2, if you can give the full stack trace, that would be nice.
>
> Thanks
>
> On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
>
> Hi
>
> 1) Using the latest Spark 2.0 I've managed to run the first 9 TPCDSQueryBenchmark
> queries, and then it ends with an OutOfMemoryError [1].
>
> What was the configuration used for running this benchmark? Can you explain the
> meaning of 4 shuffle partitions? Thanks!
>
> On my local system I use:
>
>   ./bin/spark-submit --class
>     org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --master
>     local[4] jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar
>
> configured with:
>
>   .set("spark.sql.parquet.compression.codec", "snappy")
>   .set("spark.sql.shuffle.partitions", "4")
>   .set("spark.driver.memory", "3g")
>   .set("spark.executor.memory", "3g")
>   .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
>
> The TPCDS scale factor is 5; the data was generated following the notes at
> https://github.com/databricks/spark-sql-perf.
>
> 2) Running spark-sql-perf with
>
>   val experiment = tpcds.runExperiment(tpcds.runnable)
>
> on the same dataset reveals some exceptions:
>
> Running execution q9-v1.4 iteration: 1, StandardRun=true
> java.lang.NullPointerException
>   at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
>   at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
>   at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
>   at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   ...
>   at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
>   at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
>   at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>
> or
>
> Running execution q25-v1.4 iteration: 1, StandardRun=true
> java.lang.IllegalStateException: Task -1024 has already locked broadcast_755_piece0 for writing
>   at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
>   at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)
>
> Best,
> Ovidiu
>
> [1]
> Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
>   at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
>   at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
>   at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
>   at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
>   at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
>   at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
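
One detail worth flagging in the configuration quoted above: spark.driver.memory only takes effect if it is supplied before the driver JVM starts (for example via --driver-memory 3g on spark-submit, or in spark-defaults.conf); setting it through SparkConf.set inside the application is too late, so the driver may still be running with its default heap, which could contribute to the heap-space error in [1]. A minimal Scala sketch of the same settings against the Spark 2.0 API follows; it is illustrative only, not the exact code used by TPCDSQueryBenchmark:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Illustrative config mirroring the settings quoted in the thread.
    // spark.driver.memory is deliberately NOT set here: pass it at launch
    // time instead (spark-submit --driver-memory 3g), because the driver
    // JVM is already running by the time application code executes.
    val conf = new SparkConf()
      .set("spark.sql.parquet.compression.codec", "snappy")
      .set("spark.sql.shuffle.partitions", "4")   // partitions used for shuffle output (joins/aggregations)
      .set("spark.executor.memory", "3g")
      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString) // 20 MB

    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("TPCDSQueryBenchmark")
      .config(conf)
      .getOrCreate()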
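
On [1] itself: the trace shows the failure inside BroadcastExchangeExec while building a HashedRelation, i.e. the driver ran out of heap while materializing the build side of a broadcast join. A common workaround (offered here as an assumption about this workload, not something verified against it) is to lower or disable the broadcast threshold so the planner falls back to a shuffle-based join:

    // Disable automatic broadcast joins (-1) so large build sides are not
    // materialized on the driver; a smaller positive byte threshold can be
    // used instead if broadcasting small dimension tables is still wanted.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")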