[
https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327352#comment-16327352
]
Jason Lowe commented on TEZ-3877:
---------------------------------
As I understand it, index files are only intended for files that will be
shuffled, as the index allows the shuffle handler to locate the specific
partition being requested during the shuffle transfer. I do not believe we can
delete shuffle files, since a request can arrive long after the task completes.
A file can even be re-requested if the downstream task fails and starts another
attempt.
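
To illustrate how an index lets a shuffle handler find one partition inside a spill file, here is a minimal sketch. It is not the Tez implementation: the class name SpillIndexSketch, its methods, and the assumed layout (three 8-byte longs per partition: start offset, raw length, on-disk length) are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified model of a per-partition spill index.
// Not the actual Tez classes; the record layout below is an assumption.
final class SpillIndexSketch {
    // Assumed layout: three 8-byte longs per partition.
    static final int RECORD_BYTES = 3 * Long.BYTES;

    /** One index entry: where a partition starts in the spill file and how long it is. */
    static final class Entry {
        final long startOffset; // byte offset of the partition in the spill file
        final long rawLength;   // uncompressed length of the partition
        final long partLength;  // on-disk length of the partition

        Entry(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }
    }

    /** Serialize the entries in partition order, one fixed-size record each. */
    static byte[] write(Entry[] entries) {
        ByteBuffer buf = ByteBuffer.allocate(entries.length * RECORD_BYTES);
        for (Entry e : entries) {
            buf.putLong(e.startOffset).putLong(e.rawLength).putLong(e.partLength);
        }
        return buf.array();
    }

    /** Look up one partition by reading its fixed-size record directly. */
    static Entry lookup(byte[] index, int partition) {
        ByteBuffer buf = ByteBuffer.wrap(index);
        int pos = partition * RECORD_BYTES;
        return new Entry(
            buf.getLong(pos),
            buf.getLong(pos + Long.BYTES),
            buf.getLong(pos + 2 * Long.BYTES));
    }
}
```

Because every record has the same size, a handler in this sketch can serve a request for partition N by reading one record and then seeking to startOffset in the data file. That works whenever the request arrives, but only while both the index and the data file still exist on disk.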
> Delete unordered spill files once merge is done
> -----------------------------------------------
>
> Key: TEZ-3877
> URL: https://issues.apache.org/jira/browse/TEZ-3877
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Jason Lowe
> Priority: Major
> Attachments: TEZ-3877.001.patch
>
>
> I see that spill files are not deleted right after the merge completes. We
> should delete them then, as they take up a lot of space and we can't afford
> that waste when Tez uses a lot of shuffle space with complex DAGs. [~jlowe]
> told me they are only cleaned up after the application completes, as they are
> written in the app directory and not the container directory. That also has to
> be changed so that they are cleaned up by the node manager during task failures
> or container crashes.