[ https://issues.apache.org/jira/browse/SPARK-31496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091189#comment-17091189 ]

Hyukjin Kwon commented on SPARK-31496:
--------------------------------------

Is this a regression? This sounds more like a question that is best asked on the mailing list; you are likely to get a better answer there.

> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
> ------------------------------------------------------------------------
>
>                 Key: SPARK-31496
>                 URL: https://issues.apache.org/jira/browse/SPARK-31496
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Windows 10 (1909)
> JDK 11.0.6
> spark-3.0.0-preview2-bin-hadoop3.2
> local[1]
>            Reporter: Tomas Shestakov
>            Priority: Major
>              Labels: out-of-memory
>
> Running Spark locally with one core (local[1]) and saving a Dataset to a local Parquet file causes an OOM.
> {code:java}
> SparkSession sparkSession = SparkSession.builder()
>         .appName("Loader impl test")
>         .master("local[1]")
>         .config("spark.ui.enabled", false)
>         .config("spark.sql.datetime.java8API.enabled", true)
>         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>         .config("spark.kryoserializer.buffer.max", "1g")
>         .config("spark.executor.memory", "4g")
>         .config("spark.driver.memory", "8g")
>         .getOrCreate();
> {code}
> {noformat}
> [20-Apr-2020 11:42:27.877] INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:27.877] INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:27.967] INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
> [20-Apr-2020 11:42:27.969] INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:27.970] INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
> [20-Apr-2020 11:42:27.973] INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
> [20-Apr-2020 11:42:34.371] INFO [boundedElastic-2 org.apache.spark.SparkContext:57] q: - Starting job: save at LoaderImpl.java:305
> [20-Apr-2020 11:42:34.389] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Got job 0 (save at LoaderImpl.java:305) with 1 output partitions
> [20-Apr-2020 11:42:34.390] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Final stage: ResultStage 0 (save at LoaderImpl.java:305)
> [20-Apr-2020 11:42:34.390] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Parents of final stage: List()
> [20-Apr-2020 11:42:34.392] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Missing parents: List()
> [20-Apr-2020 11:42:34.398] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305), which has no missing parents
> [20-Apr-2020 11:42:34.634] INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0 stored as values in memory (estimated size 166.1 KiB, free 18.4 GiB)
> [20-Apr-2020 11:42:34.945] INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0_piece0 stored as bytes in memory (estimated size 58.0 KiB, free 18.4 GiB)
> [20-Apr-2020 11:42:34.949] INFO [dispatcher-BlockManagerMaster org.apache.spark.storage.BlockManagerInfo:57] q: - Added broadcast_0_piece0 in memory on DESKTOP-A1:58276 (size: 58.0 KiB, free: 18.4 GiB)
> [20-Apr-2020 11:42:34.953] INFO [dag-scheduler-event-loop org.apache.spark.SparkContext:57] q: - Created broadcast 0 from broadcast at DAGScheduler.scala:1206
> [20-Apr-2020 11:42:34.980] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305) (first 15 tasks are for partitions Vector(0))
> [20-Apr-2020 11:42:34.981] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.TaskSchedulerImpl:57] q: - Adding task set 0.0 with 1 tasks
> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
> 	at java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
> 	at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
> 	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
> 	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
> 	at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
> 	at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1859)
> 	at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:712)
> 	at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:153)
> 	at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
> 	at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
> 	at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:273)
> 	at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:158)
> 	at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$writeObject$1(ParallelCollectionRDD.scala:65)
> 	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)
> 	at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 	at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1130)
> 	at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497)
> 	at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
> 	at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
> 	at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
> 	at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
> 	at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
> 	at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
> 	at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
> 	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
> 	at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:428)
> 	at scala.Option.map(Option.scala:163)
> 	at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:409)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:346)
> 	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:340)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18(TaskSchedulerImpl.scala:464)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18$adapted(TaskSchedulerImpl.scala:459)
> 	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> 	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15(TaskSchedulerImpl.scala:459)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15$adapted(TaskSchedulerImpl.scala:445)
> 	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> 	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:445)
> 	at org.apache.spark.scheduler.local.LocalEndpoint.reviveOffers(LocalSchedulerBackend.scala:88)
> 	at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:65)
> 	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
> 	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
> 	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> 	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
> 	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
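For what it's worth, the stack trace bottoms out in `ByteArrayOutputStream.hugeCapacity`: the task being scheduled embeds the driver-side collection (`ParallelCollectionPartition.writeObject` serializes it via a nested Kryo stream into a single `byte[]`), and a Java array cannot exceed roughly `Integer.MAX_VALUE` bytes (~2 GiB). On this reading the OutOfMemoryError is the JDK's array-size cap, not heap exhaustion, so raising `spark.driver.memory` or `spark.kryoserializer.buffer.max` would not lift it. A minimal sketch mirroring JDK 11's growth check (the class and method names here are illustrative, not Spark or JDK API):

```java
// Sketch of JDK 11's ByteArrayOutputStream growth logic, showing why
// the error above fires once the serialized task crosses ~2 GiB.
public class GrowSketch {
    // Soft array-size limit, as in the JDK (some VMs reserve header words).
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Capacity the stream's byte[] would grow to for a write needing
    // minCapacity bytes. When the running byte count overflows int,
    // minCapacity goes negative and the hugeCapacity() path throws
    // OutOfMemoryError regardless of how much heap (-Xmx) is available.
    static int grownCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity << 1;          // double the buffer
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0) {      // hugeCapacity() path
            if (minCapacity < 0)                     // int overflow past 2^31
                throw new OutOfMemoryError();
            newCapacity = (minCapacity > MAX_ARRAY_SIZE)
                    ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
        }
        return newCapacity;
    }
}
```

If that is what is happening here, the workaround would be to keep large datasets out of the in-memory collection the RDD is built from (for example, load the input through a file-based source so tasks carry file splits rather than the data itself), instead of tuning memory settings.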