[
https://issues.apache.org/jira/browse/SPARK-37781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466669#comment-17466669
]
Hyukjin Kwon commented on SPARK-37781:
--------------------------------------
It's just an out-of-memory error on the driver side because too many rows are being brought to the driver. Increase the spark.driver.memory configuration.
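As a rough sketch, driver memory can be raised at submit time; the 8g value, class name, and jar name below are placeholders, so pick a size that fits your workload and cluster:

```shell
# Illustrative only: 8g, com.example.YourApp, and your-app.jar are placeholders.
# spark.driver.memory must be set before the driver JVM starts, so pass it to
# spark-submit (or set it in spark-defaults.conf) rather than setting it in code.
spark-submit \
  --conf spark.driver.memory=8g \
  --class com.example.YourApp \
  your-app.jar
```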
> Java Out-Of-Memory Error when retrieving value from dataframe
> -------------------------------------------------------------
>
> Key: SPARK-37781
> URL: https://issues.apache.org/jira/browse/SPARK-37781
> Project: Spark
> Issue Type: Question
> Components: Java API, Spark Submit, SQL
> Affects Versions: 3.1.2
> Reporter: Thinh Nguyen
> Priority: Major
>
> My submitted spark application keeps running into the following error:
> {code:java}
> Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$Lambda$751/0x0000000840662040.get$Lambda(Unknown Source)
>     at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
>     at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
>     at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:2036)
>     at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$2.run(BlockManager.scala:2002)
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
>     at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at scala.collection.immutable.HashSet$HashTrieSet.updated0(HashSet.scala:551)
>     at scala.collection.immutable.HashSet.$plus(HashSet.scala:84)
>     at scala.collection.immutable.HashSet.$plus(HashSet.scala:35)
>     at scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
>     at scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
>     at scala.collection.generic.Growable.$anonfun$$plus$plus$eq$1(Growable.scala:62)
>     at scala.collection.generic.Growable$$Lambda$9/0x0000000840063840.apply(Unknown Source)
>     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
>     at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
>     at scala.collection.mutable.SetBuilder.$plus$plus$eq(SetBuilder.scala:24)
>     at scala.collection.TraversableLike.to(TraversableLike.scala:678)
>     at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
>     at scala.collection.AbstractTraversable.to(Traversable.scala:108)
>     at scala.collection.TraversableOnce.toSet(TraversableOnce.scala:309)
>     at scala.collection.TraversableOnce.toSet$(TraversableOnce.scala:309)
>     at scala.collection.AbstractTraversable.toSet(Traversable.scala:108)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild$lzycompute(TreeNode.scala:122)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild(TreeNode.scala:122)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$1(TreeNode.scala:270)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$withNewChildren$4(TreeNode.scala:283)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$2239/0x0000000840e8c040.apply(Unknown Source)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at scala.collection.TraversableLike$$Lambda$17/0x000000084012e840.apply(Unknown Source)
>     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> 12-29-2021 12:13:28 PM ERROR Utils: uncaught error in thread Spark Context Cleaner, stopping SparkContext
> java.lang.OutOfMemoryError: Java heap space
> 12-29-2021 12:13:28 PM ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "Spark Context Cleaner" java.lang.OutOfMemoryError: Java heap space{code}
>
> A dataframe is created from a JDBC query to a Postgres database:
>
> {code:java}
> var dataframeVariable = sparkSession.read
>   .format("jdbc")
>   .option("url", urlVariable)
>   .option("driver", driverVariable)
>   .option("user", usernameVariable)
>   .option("password", passwordVariable)
>   .option("query", "select max(timestamp) as timestamp from \"" + tableNameVariable + "\"")
>   .load()
> {code}
>
> The error occurs when the program tries to extract a value from the
> dataframe, which contains only a single row and column. Here are the
> methods I have used; both cause the application to hang and eventually
> fail with the OOM error.
> {code:java}
> var lastTimestamp = dataframeVariable.first().getDouble(0){code}
> {code:java}
> var timeStampVal = dataframeVariable.select(col("timestamp")).collect(){code}
>
> After some looking around, several people suggested changing the Spark
> memory-management configuration to address this issue, but I am not sure
> where to start with that. Any guidance would be helpful.
>
> *Currently using:* Spark 3.1.2, Scala 2.12, Java 11
> *Spark Cluster Spec:* 8 workers, 48 cores, 64GB Memory
> *Application Submitted Spec:* 1 worker, 4 driver and executor cores, 4GB
> driver and executor memory
--
This message was sent by Atlassian Jira
(v8.20.1#820001)