shashank created SPARK-36071:
--------------------------------
Summary: Spark driver requires large memory space for serialized
results even when no data is collected to the driver
Key: SPARK-36071
URL: https://issues.apache.org/jira/browse/SPARK-36071
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.3
Reporter: shashank
Executing a job with a large number of partitions causes the data transferred
to the driver to exceed spark.driver.maxResultSize, even though the job logic
collects no data to the driver. It looks like Spark is sending metadata back
to the driver, and that metadata is what pushes the total over the limit.
{code:java}
spark.driver.maxResultSize=8g{code}
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 104904 tasks (8.0 GB) is bigger than spark.driver.maxResultSize (8.0 GB)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
  at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
  ... 54 more
{code}
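
For a rough sense of scale, the figures in the error message can be turned into an average per-task result size. The following is only a back-of-the-envelope sketch, not a measurement: it assumes the "8.0 GB" in the message means 8 * 1024^3 bytes and that the total sits right at the limit. The function name average_result_size is made up for illustration.

```python
# Figures taken from the error message in this report.
NUM_TASKS = 104_904

# Assumption: "8.0 GB" is 8 * 1024**3 bytes, and the total is right at the limit.
LIMIT_BYTES = 8 * 1024**3


def average_result_size(total_bytes: int, num_tasks: int) -> float:
    """Return the mean serialized result size per task, in bytes."""
    return total_bytes / num_tasks


avg_bytes = average_result_size(LIMIT_BYTES, NUM_TASKS)
avg_kib = avg_bytes / 1024
print(f"average serialized result per task: {avg_kib:.1f} KiB")
```

Under those assumptions each task returns roughly 80 KiB on average, so no single task result is large. It is the number of tasks that brings the total up to the limit, which would be consistent with per-task metadata rather than collected rows.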