[ https://issues.apache.org/jira/browse/SPARK-31496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091189#comment-17091189 ]

Hyukjin Kwon commented on SPARK-31496:
--------------------------------------

Is this a regression? It sounds more like a question, which would be best asked on the 
mailing list. You are likely to get a better answer there.

> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
> ------------------------------------------------------------------------
>
>                 Key: SPARK-31496
>                 URL: https://issues.apache.org/jira/browse/SPARK-31496
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Windows 10 (1909)
> JDK 11.0.6
> spark-3.0.0-preview2-bin-hadoop3.2
> local[1]
>  
>  
>            Reporter: Tomas Shestakov
>            Priority: Major
>              Labels: out-of-memory
>
> Local Spark with one core (local[1]) throws an OutOfMemoryError while trying to save a 
> Dataset to a local Parquet file.
> {code:java}
> SparkSession sparkSession = SparkSession.builder()
>         .appName("Loader impl test")
>         .master("local[1]")
>         .config("spark.ui.enabled", false)
>         .config("spark.sql.datetime.java8API.enabled", true)
>         .config("spark.serializer", 
> "org.apache.spark.serializer.KryoSerializer")
>         .config("spark.kryoserializer.buffer.max", "1g")
>         .config("spark.executor.memory", "4g")
>         .config("spark.driver.memory", "8g")
>         .getOrCreate();
> {code}
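>
> The failing write itself is not shown here (it is the save at LoaderImpl.java:305 in the log below). A minimal sketch of what such a save could look like, assuming the Dataset is built from a large in-memory collection (the ParallelCollectionPartition in the stack trace points at a parallelized local collection); the schema, row count, and output path are hypothetical placeholders:
> {code:java}
> // Hypothetical reproduction sketch -- not the reporter's actual LoaderImpl code.
> // Uses org.apache.spark.sql.{Dataset, Row, RowFactory, SaveMode} and
> // org.apache.spark.sql.types.{DataTypes, StructType}.
>
> // A Dataset created from a local List becomes a ParallelCollectionRDD; with
> // local[1] the whole collection lands in one partition, and that partition's
> // data is serialized into the task binary (see the stack trace below).
> List<Row> rows = new ArrayList<>();
> for (int i = 0; i < 10_000_000; i++) {
>     rows.add(RowFactory.create(i, "payload-" + i));
> }
> StructType schema = new StructType()
>         .add("id", DataTypes.IntegerType)
>         .add("value", DataTypes.StringType);
> Dataset<Row> dataset = sparkSession.createDataFrame(rows, schema);
> dataset.write().mode(SaveMode.Overwrite).parquet("C:/tmp/out.parquet");
> {code}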
> {noformat}
> [20-Apr-2020 11:42:27.877]  INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:27.877]  INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:27.967]  INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
> [20-Apr-2020 11:42:27.969]  INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:27.970]  INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
> [20-Apr-2020 11:42:27.973]  INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:34.371]  INFO [boundedElastic-2 org.apache.spark.SparkContext:57] q: - Starting job: save at LoaderImpl.java:305
> [20-Apr-2020 11:42:34.389]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Got job 0 (save at LoaderImpl.java:305) with 1 output partitions
> [20-Apr-2020 11:42:34.390]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Final stage: ResultStage 0 (save at LoaderImpl.java:305)
> [20-Apr-2020 11:42:34.390]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Parents of final stage: List()
> [20-Apr-2020 11:42:34.392]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Missing parents: List()
> [20-Apr-2020 11:42:34.398]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305), which has no missing parents
> [20-Apr-2020 11:42:34.634]  INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0 stored as values in memory (estimated size 166.1 KiB, free 18.4 GiB)
> [20-Apr-2020 11:42:34.945]  INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0_piece0 stored as bytes in memory (estimated size 58.0 KiB, free 18.4 GiB)
> [20-Apr-2020 11:42:34.949]  INFO [dispatcher-BlockManagerMaster org.apache.spark.storage.BlockManagerInfo:57] q: - Added broadcast_0_piece0 in memory on DESKTOP-A1:58276 (size: 58.0 KiB, free: 18.4 GiB)
> [20-Apr-2020 11:42:34.953]  INFO [dag-scheduler-event-loop org.apache.spark.SparkContext:57] q: - Created broadcast 0 from broadcast at DAGScheduler.scala:1206
> [20-Apr-2020 11:42:34.980]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305) (first 15 tasks are for partitions Vector(0))
> [20-Apr-2020 11:42:34.981]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.TaskSchedulerImpl:57] q: - Adding task set 0.0 with 1 tasks
> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
>     at java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
>     at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
>     at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
>     at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
>     at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
>     at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1859)
>     at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:712)
>     at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:153)
>     at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
>     at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
>     at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:273)
>     at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:158)
>     at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$writeObject$1(ParallelCollectionRDD.scala:65)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)
>     at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1130)
>     at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497)
>     at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
>     at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
>     at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
>     at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
>     at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
>     at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
>     at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>     at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:428)
>     at scala.Option.map(Option.scala:163)
>     at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:409)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:346)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:340)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18(TaskSchedulerImpl.scala:464)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18$adapted(TaskSchedulerImpl.scala:459)
>     at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>     at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15(TaskSchedulerImpl.scala:459)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15$adapted(TaskSchedulerImpl.scala:445)
>     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:445)
>     at org.apache.spark.scheduler.local.LocalEndpoint.reviveOffers(LocalSchedulerBackend.scala:88)
>     at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:65)
>     at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>     at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>     at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
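>
> The trace bottoms out in ByteArrayOutputStream.hugeCapacity, which in JDK 11 throws a bare OutOfMemoryError once the requested capacity overflows int, i.e. once the buffer would grow past Integer.MAX_VALUE bytes. If that reading is right, the serialized task -- which carries the entire ParallelCollectionPartition -- crossed the ~2 GiB byte-array ceiling, so no spark.driver.memory or Kryo buffer setting can lift it; the source collection would have to be split across more partitions. A standalone sketch of the ceiling itself (a hypothetical demo, not Spark code; run with a large heap such as -Xmx8g so the array limit is reached before the heap runs out):
> {code:java}
> import java.io.ByteArrayOutputStream;
>
> public class BaosLimitDemo {
>     public static void main(String[] args) {
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         byte[] chunk = new byte[64 * 1024 * 1024]; // 64 MiB per write
>         long written = 0;
>         try {
>             while (true) {
>                 out.write(chunk, 0, chunk.length);
>                 written += chunk.length;
>             }
>         } catch (OutOfMemoryError e) {
>             // With enough heap, this is the same bare OutOfMemoryError as in
>             // the stack trace above: the requested capacity overflowed int
>             // inside hugeCapacity while growing the backing byte[].
>             System.out.println("OOM after " + written + " bytes");
>         }
>     }
> }
> {code}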


