[
https://issues.apache.org/jira/browse/TEZ-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108686#comment-17108686
]
Jonathan Turner Eagles commented on TEZ-4129:
---------------------------------------------
Thanks for the patch. This has been a very busy time for me, but here are my
thoughts. Overall I see this patch follows the DAG delete feature very closely,
which I think is a good start.
- We could make taskAttemptFailed part of the ContainerLauncher or perhaps
DagContainerLauncher to make this slightly more common so that the local
container launcher can also benefit. (perhaps not much benefit though)
- I notice you made the design decision to apply this to all failed tasks and
not just retroactively failed tasks. Perhaps this is correct.
- TezRuntimeUtils, taskIndentifier should be attemptIdentifier
- ShuffleHandler. taskattempt could just reuse map. Technically, ShuffleHandler
doesn't know about attempts. This would allow for future improvements where
multiple failures could send a single deletion request.
- I think it is not the best design that taskAttemptFailed takes a nodeId,
though I think it is a reasonable concession so that the attempt <-> node
mapping doesn't have to be stored and tracked separately.
- I think it may be better to copy the DagDeleteRunnable action so that attempt
deletion is done in a separate thread.
- It will be better to not use FileContext but instead FileSystem. Though I can
see this was copied from DagDelete. It should not have done this though.
- One question I have left is the state of the shuffle handler. An attempt that
is cleaned up could still be in the Index Cache. I wonder if that should be
handled or not. This goes for DagComplete as well.
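
A minimal sketch of the separate-thread suggestion above, using only the JDK.
Every name here other than taskAttemptFailed and nodeId is hypothetical
(AttemptDeleteSketch, AttemptDeleteRunnable, DeletionClient), and the sketch
assumes the deletion request can be hidden behind a small interface. It is not
the code from the patch or from DagDeleteRunnable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; none of these class names come from the TEZ-4129 patch. */
class AttemptDeleteSketch {

  /** Assumed stand-in for whatever sends the deletion request to a node. */
  interface DeletionClient {
    void deleteAttemptData(String nodeId, String attemptId) throws Exception;
  }

  /** One unit of deletion work, meant to run off the caller's thread. */
  static final class AttemptDeleteRunnable implements Runnable {
    private final DeletionClient client;
    private final String nodeId;
    private final String attemptId;

    AttemptDeleteRunnable(DeletionClient client, String nodeId, String attemptId) {
      this.client = client;
      this.nodeId = nodeId;
      this.attemptId = attemptId;
    }

    @Override
    public void run() {
      try {
        client.deleteAttemptData(nodeId, attemptId);
      } catch (Exception e) {
        // Deletion is best effort here: report the failure and move on.
        System.err.println("Could not delete data for " + attemptId
            + " on " + nodeId + ": " + e);
      }
    }
  }

  private final DeletionClient client;
  private final ExecutorService deletionPool;

  AttemptDeleteSketch(DeletionClient client, int threads) {
    this.client = client;
    this.deletionPool = Executors.newFixedThreadPool(threads);
  }

  /** Queues the deletion and returns immediately; the work runs on the pool. */
  void taskAttemptFailed(String nodeId, String attemptId) {
    deletionPool.execute(new AttemptDeleteRunnable(client, nodeId, attemptId));
  }

  /** Stops accepting work and waits for queued deletions; false on timeout or interrupt. */
  boolean shutdown(long timeout, TimeUnit unit) {
    deletionPool.shutdown();
    try {
      return deletionPool.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The caller of taskAttemptFailed never blocks on the deletion request, so a slow
or unreachable node only delays the pool thread.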
> Delete intermediate attempt data for failed attempts for Shuffle Handler
> ------------------------------------------------------------------------
>
> Key: TEZ-4129
> URL: https://issues.apache.org/jira/browse/TEZ-4129
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Turner Eagles
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: ShuffleHandler
> Attachments: TEZ-4129.01.patch, TEZ-4129.02.patch, TEZ-4129.03.patch,
> TEZ-4129.04.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>