[ https://issues.apache.org/jira/browse/SPARK-26020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684200#comment-16684200 ]

Hidayat Teonadi commented on SPARK-26020:
-----------------------------------------

linking related ticket SPARK-17233

> shuffle data from spark streaming not cleaned up when External Shuffle 
> Service is enabled
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-26020
>                 URL: https://issues.apache.org/jira/browse/SPARK-26020
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Hidayat Teonadi
>            Priority: Major
>
> Hi, I'm running Spark Streaming on YARN with dynamic allocation and the 
> External Shuffle Service enabled. I'm noticing that over the lifetime of my 
> Spark Streaming application, the NodeManager appcache folder fills up with 
> blockmgr directories (full of shuffle_*.data files).
> I understand why the data is not immediately cleaned up under dynamic 
> executor allocation, but will these directories ever be cleaned up during 
> the lifetime of the streaming application? Some of this shuffle data belongs 
> to Spark jobs/stages that have already completed.
> I designed the application to run perpetually, but without any cleanup the 
> cluster will eventually run out of disk and crash the application.
> [https://stackoverflow.com/questions/52923386/spark-streaming-job-doesnt-delete-shuffle-files]
>  suggests a stop-gap solution of cleaning up via cron.
> YARN-8991 is the ticket I filed against YARN; they suggested I file a ticket 
> against Spark instead. Appreciate any help.
>  
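The cron-based stop gap mentioned in the description could be sketched roughly as below. This is only a hypothetical illustration, not a fix from this ticket: the local-dir path, the `shuffle_*` file patterns, and the 7-day age threshold are assumptions and must be adapted to your actual yarn.nodemanager.local-dirs setting. Deleting shuffle files that a still-running stage needs will cause fetch failures, so any age threshold must comfortably exceed the retention the application relies on.

```shell
#!/bin/sh
# Hypothetical stop-gap cleanup for shuffle files left behind under the YARN
# NodeManager local dirs. NM_LOCAL_DIR and the age threshold are assumptions;
# point NM_LOCAL_DIR at your yarn.nodemanager.local-dirs in a real deployment.
NM_LOCAL_DIR="${NM_LOCAL_DIR:-/tmp/demo-nm-local}"

# Demo setup so this sketch is self-contained: one fresh and one stale file
# inside a fake blockmgr directory (layout mimics the appcache structure).
mkdir -p "$NM_LOCAL_DIR/usercache/app/blockmgr-demo"
touch "$NM_LOCAL_DIR/usercache/app/blockmgr-demo/shuffle_1_0_0.data"
touch -d '10 days ago' "$NM_LOCAL_DIR/usercache/app/blockmgr-demo/shuffle_0_0_0.data"

# The actual cleanup a cron job would run: remove shuffle data/index files
# not modified in the last 7 days (GNU find).
find "$NM_LOCAL_DIR" -name 'shuffle_*.data' -mtime +7 -delete
find "$NM_LOCAL_DIR" -name 'shuffle_*.index' -mtime +7 -delete

ls "$NM_LOCAL_DIR/usercache/app/blockmgr-demo"
```

Scheduled via crontab (e.g. `0 * * * * /path/to/cleanup.sh`), this keeps disk usage bounded until Spark/YARN clean up these directories themselves, at the cost of manually choosing a safe retention window.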



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
