[ 
https://issues.apache.org/jira/browse/SPARK-36827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418842#comment-17418842
 ] 

Sean R. Owen commented on SPARK-36827:
--------------------------------------

Looks promising yeah. This loop seems like it could instead iterate once, 
remembering which stages it has seen, and then use that info to clean up, 
rather than iterating for each stage.

> Task/Stage/Job data remain in memory leads memory leak
> ------------------------------------------------------
>
>                 Key: SPARK-36827
>                 URL: https://issues.apache.org/jira/browse/SPARK-36827
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: Kohki Nishio
>            Priority: Major
>         Attachments: mem1.txt, worker.txt
>
>
> Noticing memory-leak like behavior, steady increase of heap after GC and 
> eventually it leads to a service failure. 
> The GC histogram shows very high number of Task/Data/Job data
> {code}
>  num     #instances         #bytes  class name 
> ---------------------------------------------- 
>    6:       7835346     2444627952  org.apache.spark.status.TaskDataWrapper 
>   25:       3765152      180727296  org.apache.spark.status.StageDataWrapper 
>   88:        232255        9290200  org.apache.spark.status.JobDataWrapper 
> {code}
> Thread dumps show clearly the clean up thread is always doing cleanupStages
> {code}
> "element-tracking-store-worker" #355 daemon prio=5 os_prio=0 
> tid=0x00007f31b0014800 nid=0x409 runnable [0x00007f2f25783000]
>    java.lang.Thread.State: RUNNABLE
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:162)
>       at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:434)
>       at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$0(InMemoryStore.java:375)
>       at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$9000/574018760.compare(Unknown
>  Source)
>       at java.util.TimSort.gallopLeft(TimSort.java:542)
>       at java.util.TimSort.mergeLo(TimSort.java:752)
>       at java.util.TimSort.mergeAt(TimSort.java:514)
>       at java.util.TimSort.mergeCollapse(TimSort.java:439)
>       at java.util.TimSort.sort(TimSort.java:245)
>       at java.util.Arrays.sort(Arrays.java:1512)
>       at java.util.ArrayList.sort(ArrayList.java:1464)
>       at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.iterator(InMemoryStore.java:375)
>       at 
> org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
>       at 
> org.apache.spark.status.AppStatusListener.$anonfun$cleanupStages$2(AppStatusListener.scala:1269)
>       at 
> org.apache.spark.status.AppStatusListener$$Lambda$9126/608388595.apply(Unknown
>  Source)
>       at scala.collection.immutable.List.map(List.scala:297)
>       at 
> org.apache.spark.status.AppStatusListener.cleanupStages(AppStatusListener.scala:1260)
>       at 
> org.apache.spark.status.AppStatusListener.$anonfun$new$3(AppStatusListener.scala:98)
>       at 
> org.apache.spark.status.AppStatusListener$$Lambda$646/596139882.apply$mcVJ$sp(Unknown
>  Source)
>       at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$3(ElementTrackingStore.scala:135)
>       at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$3$adapted(ElementTrackingStore.scala:133)
>       at 
> org.apache.spark.status.ElementTrackingStore$$Lambda$986/162337848.apply(Unknown
>  Source)
>       at scala.collection.immutable.List.foreach(List.scala:431)
>       at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$2(ElementTrackingStore.scala:133)
>       at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$2$adapted(ElementTrackingStore.scala:131)
>       at 
> org.apache.spark.status.ElementTrackingStore$$Lambda$984/600376389.apply(Unknown
>  Source)
>       at 
> org.apache.spark.status.ElementTrackingStore$LatchedTriggers.$anonfun$fireOnce$1(ElementTrackingStore.scala:58)
>       at 
> org.apache.spark.status.ElementTrackingStore$LatchedTriggers$$Lambda$985/1187323214.apply$mcV$sp(Unknown
>  Source)
>       at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>       at org.apache.spark.util.Utils$.tryLog(Utils.scala:2013)
>       at 
> org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to