shashank created SPARK-36071:
--------------------------------

             Summary: Spark driver requires large memory space for serialized 
results even there are no data collected to the driver
                 Key: SPARK-36071
                 URL: https://issues.apache.org/jira/browse/SPARK-36071
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.3
            Reporter: shashank


Executing with large partition is causing the data transferred to driver exceed 
spark.driver.maxResultSize.

Even when no data from the logic is being collected at by the driver. Looks 
like spark is sending metadata back which is causing it to exceed.
{code:java}
spark.driver.maxResultSize=8g{code}
 
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Total size of serialized results of 104904 tasks (8.0 GB) is bigger than 
spark.driver.maxResultSize (8.0 GB)Caused by: org.apache.spark.SparkException: 
Job aborted due to stage failure: Total size of serialized results of 104904 
tasks (8.0 GB) is bigger than spark.driver.maxResultSize (8.0 GB) at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2041)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2029)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2028)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028) at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966)
 at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:966)
 at scala.Option.foreach(Option.scala:257) at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2114) at 
org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
 ... 54 more{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to