[ https://issues.apache.org/jira/browse/SPARK-26020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684200#comment-16684200 ]
Hidayat Teonadi commented on SPARK-26020:
-----------------------------------------

linking related ticket SPARK-17233

> shuffle data from spark streaming not cleaned up when External Shuffle
> Service is enabled
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-26020
>                 URL: https://issues.apache.org/jira/browse/SPARK-26020
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Hidayat Teonadi
>            Priority: Major
>
> Hi, I'm running Spark Streaming on YARN with dynamic allocation and the
> External Spark Shuffle Service enabled. I'm noticing that during the
> lifetime of my Spark Streaming application, the NodeManager appcache
> folder is building up with blockmgr directories (filled with
> shuffle_*.data files).
> I understand why the data is not cleaned up immediately under dynamic
> executor allocation, but will these directories be cleaned up at all
> during the lifetime of the Spark Streaming application? Some of this
> shuffle data was generated by Spark jobs/stages that have already
> completed.
> I initially designed the application to run perpetually, but without any
> cleanup the cluster will eventually run out of disk and the application
> will crash.
> [https://stackoverflow.com/questions/52923386/spark-streaming-job-doesnt-delete-shuffle-files]
> suggests a stop-gap solution of cleaning up via cron.
> YARN-8991 is the ticket I filed for YARN, where I was advised to file a
> ticket against Spark instead. Appreciate any help.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
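The cron-based stop-gap mentioned in the linked Stack Overflow thread could be sketched roughly as below. This is only an illustration, not an official workaround: the function name `cleanup_shuffle`, the NodeManager local-dir layout (`usercache/*/appcache/*/blockmgr-*`, per `yarn.nodemanager.local-dirs`), and the retention window are all assumptions that would need adjusting per cluster, and deleting shuffle files that a live executor still references can fail running stages, so a real deployment should restrict this to files old enough to be safely considered orphaned.

```shell
#!/usr/bin/env sh
# Hypothetical stop-gap cleanup of leftover external-shuffle-service files.
# ASSUMPTIONS: the default NodeManager local-dir layout
# (<local-dir>/usercache/<user>/appcache/<app>/blockmgr-*) and a
# caller-chosen retention window. Deleting files still in use by a running
# application can break its shuffle reads -- pick the retention accordingly.
cleanup_shuffle() {
  nm_local_dir="$1"    # e.g. /var/lib/hadoop-yarn/nm-local-dir (assumed path)
  retention_mins="$2"  # delete shuffle files last modified more than this many minutes ago
  find "$nm_local_dir" -path '*/appcache/*/blockmgr-*' \
       -name 'shuffle_*' -type f -mmin +"$retention_mins" -delete
}
```

A crontab entry invoking this hourly with, say, a multi-day retention would bound disk usage between runs while leaving recent shuffle output untouched.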