[ 
https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138389#comment-14138389
 ] 

Saisai Shao commented on SPARK-3563:
------------------------------------

I think this depends on how the JVM's GC handles references. Objects such as 
a ShuffleDependency may live long enough to be promoted to the old generation, 
so only a full GC triggers the cleanup. That at least shows the shuffle data 
does eventually get cleaned, so there may be no bug with a lingering reference.
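
To illustrate the mechanism, here is a minimal sketch of weak-reference-driven 
cleanup. It is not Spark's ContextCleaner code: the names WeakCleanupSketch, 
Tracked, CleanupRef and awaitCleanup are hypothetical, and Tracked merely 
stands in for an object such as a ShuffleDependency. It shows that the weak 
reference is only enqueued once a collection actually reclaims the referent, 
so cleanup timing is up to the GC.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakCleanupSketch {
    // Hypothetical stand-in for a tracked object such as a ShuffleDependency.
    static final class Tracked {
        final int id;
        Tracked(int id) { this.id = id; }
    }

    // Weak reference that keeps the id, so cleanup can still identify
    // the resource after the referent itself has been collected.
    static final class CleanupRef extends WeakReference<Tracked> {
        final int id;
        CleanupRef(Tracked t, ReferenceQueue<Tracked> q) {
            super(t, q);
            this.id = t.id;
        }
    }

    // Requests GC and polls the queue until timeoutMs elapses.
    // Returns the id of an enqueued reference, or -1 if none arrived.
    static int awaitCleanup(ReferenceQueue<Tracked> queue, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            System.gc(); // only a hint; the JVM decides when to collect
            Reference<? extends Tracked> r = queue.remove(100);
            if (r != null) {
                return ((CleanupRef) r).id;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Tracked> queue = new ReferenceQueue<>();
        Tracked dep = new Tracked(63);
        CleanupRef ref = new CleanupRef(dep, queue);

        // While a strong reference exists, the referent cannot be cleared.
        System.gc();
        System.out.println("referent present: " + (ref.get() != null)
                + " (id " + dep.id + ")");

        // Drop the strong reference; cleanup now depends on the collector.
        dep = null;
        int cleaned = awaitCleanup(queue, 2000);
        System.out.println(cleaned >= 0
                ? "cleanup triggered for id " + cleaned
                : "not collected within the timeout");
    }
}
```

Whether the second message reports a cleanup depends on the JVM and its GC 
settings. In a long-running job, an object that has already been promoted to 
the old generation would typically only be reclaimed by a major collection, 
which fits files staying around for a long time before being removed.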

> Shuffle data not always be cleaned
> ----------------------------------
>
>                 Key: SPARK-3563
>                 URL: https://issues.apache.org/jira/browse/SPARK-3563
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.0.2
>            Reporter: shenhong
>
> In our cluster, when we run a Spark Streaming job for many hours, not all 
> of the shuffle data seems to be cleaned. Here is the remaining shuffle data:
> -rw-r----- 1 tdwadmin users 23948 Sep 17 13:21 shuffle_132_34_0
> -rw-r----- 1 tdwadmin users 18237 Sep 17 13:32 shuffle_143_22_1
> -rw-r----- 1 tdwadmin users 22934 Sep 17 13:35 shuffle_146_15_0
> -rw-r----- 1 tdwadmin users 27666 Sep 17 13:35 shuffle_146_36_1
> -rw-r----- 1 tdwadmin users 12864 Sep 17 14:05 shuffle_176_12_0
> -rw-r----- 1 tdwadmin users 22115 Sep 17 14:05 shuffle_176_33_1
> -rw-r----- 1 tdwadmin users 15666 Sep 17 14:21 shuffle_192_0_1
> -rw-r----- 1 tdwadmin users 13916 Sep 17 14:38 shuffle_209_53_0
> -rw-r----- 1 tdwadmin users 20031 Sep 17 14:41 shuffle_212_26_0
> -rw-r----- 1 tdwadmin users 15158 Sep 17 14:41 shuffle_212_47_1
> -rw-r----- 1 tdwadmin users 42880 Sep 17 12:12 shuffle_63_1_1
> -rw-r----- 1 tdwadmin users 32030 Sep 17 12:14 shuffle_65_40_0
> -rw-r----- 1 tdwadmin users 34477 Sep 17 12:33 shuffle_84_2_1
> The shuffle data of stages 63, 65, 84, 132... has not been cleaned.
> ContextCleaner maintains a weak reference for each RDD, ShuffleDependency, 
> and Broadcast of interest, to be processed when the associated object goes 
> out of scope of the application. The actual cleanup is performed in a 
> separate daemon thread. 
> Something must still hold a reference to the ShuffleDependency, and it is 
> hard to find.



