Tomas Shestakov created SPARK-31496:
---------------------------------------
Summary: Exception in thread "dispatcher-event-loop-1"
java.lang.OutOfMemoryError
Key: SPARK-31496
URL: https://issues.apache.org/jira/browse/SPARK-31496
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0
Environment: Windows 10 (1909)
JDK 11.0.6
spark-3.0.0-preview2-bin-hadoop3.2
local[1]
Reporter: Tomas Shestakov
Running local Spark with one core (local[1]) and trying to save a Dataset to a
local Parquet file causes an OOM:
{noformat}
[20-Apr-2020 11:42:27.877] INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.877] INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.967] INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
[20-Apr-2020 11:42:27.969] INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.970] INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
[20-Apr-2020 11:42:27.973] INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:34.371] INFO [boundedElastic-2 org.apache.spark.SparkContext:57] q: - Starting job: save at LoaderImpl.java:305
[20-Apr-2020 11:42:34.389] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Got job 0 (save at LoaderImpl.java:305) with 1 output partitions
[20-Apr-2020 11:42:34.390] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Final stage: ResultStage 0 (save at LoaderImpl.java:305)
[20-Apr-2020 11:42:34.390] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Parents of final stage: List()
[20-Apr-2020 11:42:34.392] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Missing parents: List()
[20-Apr-2020 11:42:34.398] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305), which has no missing parents
[20-Apr-2020 11:42:34.634] INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0 stored as values in memory (estimated size 166.1 KiB, free 18.4 GiB)
[20-Apr-2020 11:42:34.945] INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0_piece0 stored as bytes in memory (estimated size 58.0 KiB, free 18.4 GiB)
[20-Apr-2020 11:42:34.949] INFO [dispatcher-BlockManagerMaster org.apache.spark.storage.BlockManagerInfo:57] q: - Added broadcast_0_piece0 in memory on DESKTOP-A1:58276 (size: 58.0 KiB, free: 18.4 GiB)
[20-Apr-2020 11:42:34.953] INFO [dag-scheduler-event-loop org.apache.spark.SparkContext:57] q: - Created broadcast 0 from broadcast at DAGScheduler.scala:1206
[20-Apr-2020 11:42:34.980] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305) (first 15 tasks are for partitions Vector(0))
[20-Apr-2020 11:42:34.981] INFO [dag-scheduler-event-loop org.apache.spark.scheduler.TaskSchedulerImpl:57] q: - Adding task set 0.0 with 1 tasks
Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
	at java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
	at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
	at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
	at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
	at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
	at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1859)
	at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:712)
	at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:153)
	at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
	at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
	at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:273)
	at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:158)
	at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$writeObject$1(ParallelCollectionRDD.scala:65)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)
	at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1130)
	at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497)
	at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
	at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
	at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
	at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
	at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
	at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
	at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
	at org.apache.spark.scheduler.TaskSetManager.$anonfun$resourceOffer$2(TaskSetManager.scala:428)
	at scala.Option.map(Option.scala:163)
	at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:409)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOfferSingleTaskSet$1(TaskSchedulerImpl.scala:346)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOfferSingleTaskSet(TaskSchedulerImpl.scala:340)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18(TaskSchedulerImpl.scala:464)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$18$adapted(TaskSchedulerImpl.scala:459)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15(TaskSchedulerImpl.scala:459)
	at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$resourceOffers$15$adapted(TaskSchedulerImpl.scala:445)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:445)
	at org.apache.spark.scheduler.local.LocalEndpoint.reviveOffers(LocalSchedulerBackend.scala:88)
	at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyOrElse(LocalSchedulerBackend.scala:65)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]