[ https://issues.apache.org/jira/browse/SPARK-36827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418842#comment-17418842 ]
Sean R. Owen commented on SPARK-36827:
--------------------------------------
Looks promising, yeah. This loop seems like it could instead iterate once,
remembering which stages it has seen, and then use that information to clean
up, rather than iterating once per stage.
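
To make the suggestion concrete, here is a rough sketch of the two shapes of
the loop. It is not the actual AppStatusListener code: TaskRecord, perStage
and singlePass are hypothetical names, a stage is reduced to a plain int id
(no attempt id), and an in-memory list stands in for the KVStore view.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SinglePassCleanup {
    /** Hypothetical stand-in for a stored task entry; not Spark's TaskDataWrapper. */
    static final class TaskRecord {
        final long taskId;
        final int stageId;

        TaskRecord(long taskId, int stageId) {
            this.taskId = taskId;
            this.stageId = stageId;
        }
    }

    /** Per-stage shape: one full walk over all tasks for every evicted stage. */
    static List<Long> perStage(List<TaskRecord> tasks, List<Integer> evictedStages) {
        List<Long> toDelete = new ArrayList<>();
        for (int stageId : evictedStages) {
            for (TaskRecord t : tasks) {
                if (t.stageId == stageId) {
                    toDelete.add(t.taskId);
                }
            }
        }
        return toDelete;
    }

    /** Single-pass shape: remember the evicted stages in a set, walk the tasks once. */
    static List<Long> singlePass(List<TaskRecord> tasks, List<Integer> evictedStages) {
        Set<Integer> evicted = new HashSet<>(evictedStages);
        List<Long> toDelete = new ArrayList<>();
        for (TaskRecord t : tasks) {
            if (evicted.contains(t.stageId)) {
                toDelete.add(t.taskId);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<TaskRecord> tasks = new ArrayList<>();
        tasks.add(new TaskRecord(1L, 10));
        tasks.add(new TaskRecord(2L, 11));
        tasks.add(new TaskRecord(3L, 10));
        tasks.add(new TaskRecord(4L, 12));

        List<Integer> evictedStages = new ArrayList<>();
        evictedStages.add(10);
        evictedStages.add(12);

        // Both shapes select the same tasks; only the number of walks differs.
        System.out.println(perStage(tasks, evictedStages));   // prints [1, 3, 4]
        System.out.println(singlePass(tasks, evictedStages)); // prints [1, 3, 4]
    }
}
```

With S evicted stages and T stored tasks, the per-stage shape does on the
order of S * T comparisons, while the single-pass shape does about T set
lookups. The gap matters more if every walk also has to build a sorted view
first, as the TimSort frames in the quoted thread dump suggest.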
> Task/Stage/Job data remain in memory leads memory leak
> ------------------------------------------------------
>
> Key: SPARK-36827
> URL: https://issues.apache.org/jira/browse/SPARK-36827
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.2
> Reporter: Kohki Nishio
> Priority: Major
> Attachments: mem1.txt, worker.txt
>
>
> We are seeing memory-leak-like behavior: the heap grows steadily after GC
> and eventually this leads to a service failure.
> The GC histogram shows a very high number of Task/Stage/Job data objects:
> {code}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    6:       7835346     2444627952  org.apache.spark.status.TaskDataWrapper
>   25:       3765152      180727296  org.apache.spark.status.StageDataWrapper
>   88:        232255        9290200  org.apache.spark.status.JobDataWrapper
> {code}
> Thread dumps clearly show that the cleanup thread is always busy in cleanupStages:
> {code}
> "element-tracking-store-worker" #355 daemon prio=5 os_prio=0 tid=0x00007f31b0014800 nid=0x409 runnable [0x00007f2f25783000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:162)
>     at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:434)
>     at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$0(InMemoryStore.java:375)
>     at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$9000/574018760.compare(Unknown Source)
>     at java.util.TimSort.gallopLeft(TimSort.java:542)
>     at java.util.TimSort.mergeLo(TimSort.java:752)
>     at java.util.TimSort.mergeAt(TimSort.java:514)
>     at java.util.TimSort.mergeCollapse(TimSort.java:439)
>     at java.util.TimSort.sort(TimSort.java:245)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.ArrayList.sort(ArrayList.java:1464)
>     at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.iterator(InMemoryStore.java:375)
>     at org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
>     at org.apache.spark.status.AppStatusListener.$anonfun$cleanupStages$2(AppStatusListener.scala:1269)
>     at org.apache.spark.status.AppStatusListener$$Lambda$9126/608388595.apply(Unknown Source)
>     at scala.collection.immutable.List.map(List.scala:297)
>     at org.apache.spark.status.AppStatusListener.cleanupStages(AppStatusListener.scala:1260)
>     at org.apache.spark.status.AppStatusListener.$anonfun$new$3(AppStatusListener.scala:98)
>     at org.apache.spark.status.AppStatusListener$$Lambda$646/596139882.apply$mcVJ$sp(Unknown Source)
>     at org.apache.spark.status.ElementTrackingStore.$anonfun$write$3(ElementTrackingStore.scala:135)
>     at org.apache.spark.status.ElementTrackingStore.$anonfun$write$3$adapted(ElementTrackingStore.scala:133)
>     at org.apache.spark.status.ElementTrackingStore$$Lambda$986/162337848.apply(Unknown Source)
>     at scala.collection.immutable.List.foreach(List.scala:431)
>     at org.apache.spark.status.ElementTrackingStore.$anonfun$write$2(ElementTrackingStore.scala:133)
>     at org.apache.spark.status.ElementTrackingStore.$anonfun$write$2$adapted(ElementTrackingStore.scala:131)
>     at org.apache.spark.status.ElementTrackingStore$$Lambda$984/600376389.apply(Unknown Source)
>     at org.apache.spark.status.ElementTrackingStore$LatchedTriggers.$anonfun$fireOnce$1(ElementTrackingStore.scala:58)
>     at org.apache.spark.status.ElementTrackingStore$LatchedTriggers$$Lambda$985/1187323214.apply$mcV$sp(Unknown Source)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.util.Utils$.tryLog(Utils.scala:2013)
>     at org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}