[ https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327352#comment-16327352 ]

Jason Lowe commented on TEZ-3877:
---------------------------------

As I understand it, index files are only intended for files that will be
shuffled: the index lets the shuffle handler locate the specific partition
being requested during the shuffle transfer. I do not believe we can delete
shuffle files, since a fetch request can arrive long after the task completes.
A partition can even be requested again if a downstream task attempt fails and
a new attempt is launched.
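To make the index mechanism concrete, here is a minimal Python sketch. It assumes a hypothetical index layout of one fixed-size record per partition (start offset, raw length, partition length, each a big-endian 64-bit integer), loosely modeled on Hadoop-style spill indexes; `build_output` and `fetch_partition` are illustrative names, not Tez APIs. It shows why a fixed-size index lets the handler jump straight to one partition's byte range.

```python
import io
import struct

# Hypothetical layout: one fixed-size record per partition,
# three big-endian signed 64-bit fields.
RECORD = struct.Struct(">qqq")  # start_offset, raw_length, part_length


def build_output(partitions):
    """Concatenate per-partition payloads into one data file and build its index."""
    data = io.BytesIO()
    index = io.BytesIO()
    for payload in partitions:
        start = data.tell()
        data.write(payload)
        # raw_length == part_length because this sketch does no compression
        index.write(RECORD.pack(start, len(payload), len(payload)))
    return data.getvalue(), index.getvalue()


def fetch_partition(data, index, partition):
    """Serve one partition: read its index record, then slice the data file."""
    # Fixed-size records mean the lookup is a direct offset, not a scan.
    start, _raw_len, part_len = RECORD.unpack_from(index, partition * RECORD.size)
    return data[start:start + part_len]


data, index = build_output([b"aa", b"bbb", b""])
print(fetch_partition(data, index, 1))
```

Because every fetch needs both the data file and its index, and the same partition may be fetched again for a retried downstream attempt, neither file can safely be removed when the producing task finishes.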

> Delete unordered spill files once merge is done
> -----------------------------------------------
>
>                 Key: TEZ-3877
>                 URL: https://issues.apache.org/jira/browse/TEZ-3877
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Jason Lowe
>            Priority: Major
>         Attachments: TEZ-3877.001.patch
>
>
>   I see that spill files are not deleted right after the merge completes. We
> should do that, as they take up a lot of space, and we can't afford that waste
> when Tez already uses a lot of shuffle space with complex DAGs. [~jlowe] told me
> they are only cleaned up after the application completes, because they are
> written in the app directory and not the container directory. That also has to
> be changed so that the NodeManager cleans them up after task failures or container crashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
