ZiyueGuan created SPARK-35570:
---------------------------------

             Summary: Shuffle file leak with external shuffle service enable
                 Key: SPARK-35570
                 URL: https://issues.apache.org/jira/browse/SPARK-35570
             Project: Spark
          Issue Type: Bug
          Components: Block Manager, Shuffle
    Affects Versions: 3.1.2
            Reporter: ZiyueGuan


Unlike for RDD blocks, the external shuffle service does not offer cleanup of 
shuffle files. Cleanup of shuffle files relies mainly on live executors 
responding to requests from the context cleaner. Once an executor exits, the 
shuffle files it left behind are not cleaned up until the application exits. 
For streaming or other long-running applications, the disk may fill up. 
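
To make the reported behavior concrete, here is a toy Python model of it. This 
is only a sketch of the description above, not Spark code: the names Executor, 
clean_shuffle, and simulate, and the file naming, are hypothetical and chosen 
purely for illustration. It assumes shuffle files stay on disk after their 
executor exits and that the cleaner can only ask live executors to delete them.

```python
# Toy model only: none of these names are Spark APIs.

class Executor:
    def __init__(self, executor_id, disk):
        self.executor_id = executor_id
        self.disk = disk      # shared dict standing in for the node's local disk
        self.alive = True

    def write_shuffle(self, shuffle_id):
        # Record a shuffle file owned by this executor.
        key = (self.executor_id, shuffle_id)
        self.disk[key] = f"shuffle_{shuffle_id}_{self.executor_id}.data"

    def remove_shuffle(self, shuffle_id):
        # Delete this executor's file for the given shuffle, if present.
        self.disk.pop((self.executor_id, shuffle_id), None)


def clean_shuffle(executors, shuffle_id):
    # Model of the cleaner: only live executors can respond to the request.
    for executor in executors:
        if executor.alive:
            executor.remove_shuffle(shuffle_id)


def simulate():
    disk = {}
    a = Executor("exec-1", disk)
    b = Executor("exec-2", disk)
    a.write_shuffle(0)
    b.write_shuffle(0)
    b.alive = False           # executor exits; its file stays on disk
    clean_shuffle([a, b], 0)
    return sorted(disk.values())
```

In this model, simulate() leaves the file written by the exited executor on 
disk, which is the leak described above.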

I'm confused about why shuffle files are left behind like this while the 
lifecycle of RDD blocks is handled properly. Is there a difference between the two? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)