Tomas Shestakov created SPARK-31496:
---------------------------------------

             Summary: Exception in thread "dispatcher-event-loop-1" 
java.lang.OutOfMemoryError
                 Key: SPARK-31496
                 URL: https://issues.apache.org/jira/browse/SPARK-31496
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
         Environment: Windows 10 (1909)
JDK 11.0.6
spark-3.0.0-preview2-bin-hadoop3.2
local[1]
            Reporter: Tomas Shestakov


Running Spark locally with a single core (local[1]) and saving a Dataset to a local parquet file causes an OutOfMemoryError on the dispatcher-event-loop thread.
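
Below is a minimal sketch of the kind of job that hits this. It is hypothetical: the actual call is the save at LoaderImpl.java:305 shown in the log, and the dataset contents and output path here are placeholders. The stack trace does show that the failing partition comes from a driver-side collection (ParallelCollectionPartition), which the sketch mirrors.

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;
import java.util.List;

public class ParquetOomRepro {
    public static void main(String[] args) {
        // Single-core local session, matching the environment in this report.
        SparkSession spark = SparkSession.builder()
                .appName("parquet-oom-repro")
                .master("local[1]")
                .getOrCreate();

        // Placeholder rows; the real Dataset is built from a driver-side
        // collection, as indicated by ParallelCollectionPartition in the trace.
        List<String> values = Arrays.asList("a", "b", "c");
        Dataset<Row> ds = spark.createDataset(values, Encoders.STRING()).toDF("value");

        // The OOM is thrown while the single task for this save is being
        // serialized on the dispatcher-event-loop thread. Output path is a placeholder.
        ds.write().parquet("C:/tmp/out.parquet");

        spark.stop();
    }
}
{code}

The log and stack trace from the failing run: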
{noformat}
[20-Apr-2020 11:42:27.877]  INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.877]  INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.967]  INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
[20-Apr-2020 11:42:27.969]  INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.970]  INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
[20-Apr-2020 11:42:27.973]  INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:34.371]  INFO [boundedElastic-2 org.apache.spark.SparkContext:57] q: - Starting job: save at LoaderImpl.java:305
[20-Apr-2020 11:42:34.389]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Got job 0 (save at LoaderImpl.java:305) with 1 output partitions
[20-Apr-2020 11:42:34.390]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Final stage: ResultStage 0 (save at LoaderImpl.java:305)
[20-Apr-2020 11:42:34.390]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Parents of final stage: List()
[20-Apr-2020 11:42:34.392]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Missing parents: List()
[20-Apr-2020 11:42:34.398]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305), which has no missing parents
[20-Apr-2020 11:42:34.634]  INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0 stored as values in memory (estimated size 166.1 KiB, free 18.4 GiB)
[20-Apr-2020 11:42:34.945]  INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0_piece0 stored as bytes in memory (estimated size 58.0 KiB, free 18.4 GiB)
[20-Apr-2020 11:42:34.949]  INFO [dispatcher-BlockManagerMaster org.apache.spark.storage.BlockManagerInfo:57] q: - Added broadcast_0_piece0 in memory on DESKTOP-A1:58276 (size: 58.0 KiB, free: 18.4 GiB)
[20-Apr-2020 11:42:34.953]  INFO [dag-scheduler-event-loop org.apache.spark.SparkContext:57] q: - Created broadcast 0 from broadcast at DAGScheduler.scala:1206
[20-Apr-2020 11:42:34.980]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305) (first 15 tasks are for partitions Vector(0))
[20-Apr-2020 11:42:34.981]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.TaskSchedulerImpl:57] q: - Adding task set 0.0 with 1 tasks
Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
	at java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
	at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
	at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
	at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1859)
	at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:712)
	at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:153)
	at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
	at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
	at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:273)
	at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:158)
	at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$writeObject$1(ParallelCollectionRDD.scala:65)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)
	at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1130)
	at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497)
	at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
	at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
	at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
	at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
	at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
	at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
	at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
	at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:428)
	at scala.Option.map(Option.scala:163)
	at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:409)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:346)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:340)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18(TaskSchedulerImpl.scala:464)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18$adapted(TaskSchedulerImpl.scala:459)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15(TaskSchedulerImpl.scala:459)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15$adapted(TaskSchedulerImpl.scala:445)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:445)
	at org.apache.spark.scheduler.local.LocalEndpoint.reviveOffers(LocalSchedulerBackend.scala:88)
	at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:65)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}



