[ 
https://issues.apache.org/jira/browse/TEZ-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108686#comment-17108686
 ] 

Jonathan Turner Eagles commented on TEZ-4129:
---------------------------------------------

Thanks for the patch. This has been a very busy time for me, but here are my thoughts.

Overall, I see this patch follows the DAG delete feature very closely, which I 
think is a good start.

- We could make taskAttemptFailed part of the ContainerLauncher or perhaps 
DagContainerLauncher to make this slightly more common so that local container 
launcher can also benefit. (perhaps not much benefit though)
- I notice you made the design decision to apply this to all failed tasks and 
not just retroactively failed tasks. Perhaps this is correct. 
- TezRuntimeUtils, taskIndentifier should be attemptIdentifier
- ShuffleHandler. taskattempt could just reuse map. Technically, ShuffleHandler 
doesn't know about attempts. This would allow for future improvements where 
multiple failures could send a single deletion request.
- I think it is not the best design that taskAttemptFailed takes a nodeId, 
though I think it is a reasonable concession so that the attempt <-> node 
mapping doesn't have to be stored and tracked separately.
- I think it may be better to copy the DagDeleteRunnable action so that attempt 
deletion is done in a separate thread.
- It would be better to use FileSystem instead of FileContext, though I can 
see this was copied from DagDelete, which should not have done this either.
- One question I have left is the state of the shuffle handler. An attempt that 
is cleaned up could still be in the Index Cache. I wonder if that should be 
handled or not. This goes for DagComplete as well.
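
To illustrate the separate-thread suggestion above, here is a minimal sketch, 
not the actual patch. AttemptDeleteRunnable and AttemptDeleteDemo are 
hypothetical names, and java.nio.file stands in for Hadoop's FileSystem so 
that the example runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical attempt-level analogue of DagDeleteRunnable (assumed name).
// Uses java.nio.file as a stand-in; the real code would use Hadoop's FileSystem.
class AttemptDeleteRunnable implements Runnable {
    private final Path attemptDir;

    AttemptDeleteRunnable(Path attemptDir) {
        this.attemptDir = attemptDir;
    }

    @Override
    public void run() {
        try {
            deleteRecursively(attemptDir);
        } catch (IOException e) {
            // Deletion is best effort; a failure here is only reported.
            System.err.println("Could not delete " + attemptDir + ": " + e);
        }
    }

    // Deletes files first, then each directory once it is empty.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // nothing to clean up
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}

class AttemptDeleteDemo {
    public static void main(String[] args) throws Exception {
        // Fake attempt output directory containing a single file.
        Path attemptDir = Files.createTempDirectory("attempt_sketch");
        Files.write(attemptDir.resolve("file.out"), new byte[] {1, 2, 3});

        // The deletion runs on its own thread, off the caller's thread.
        ExecutorService deleter = Executors.newSingleThreadExecutor();
        Future<?> done = deleter.submit(new AttemptDeleteRunnable(attemptDir));
        done.get(); // the demo waits; a launcher would not need to block
        deleter.shutdown();

        System.out.println("deleted: " + !Files.exists(attemptDir));
    }
}
```

The point of the design is that the thread reporting the failed attempt only 
enqueues the work, so slow deletion does not hold it up.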

> Delete intermediate attempt data for failed attempts for Shuffle Handler
> ------------------------------------------------------------------------
>
>                 Key: TEZ-4129
>                 URL: https://issues.apache.org/jira/browse/TEZ-4129
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Turner Eagles
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: ShuffleHandler
>         Attachments: TEZ-4129.01.patch, TEZ-4129.02.patch, TEZ-4129.03.patch, 
> TEZ-4129.04.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
