[ https://issues.apache.org/jira/browse/MAPREDUCE-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112497#comment-15112497 ]
Kuhu Shukla commented on MAPREDUCE-6603:
----------------------------------------
At the Task level, {{selectBestAttempt()}} picks the best attempt and returns
null if all attempts have failed. That null return is checked during counter,
progress and report updates so that failed tasks/attempts are excluded.

To address the issue in this JIRA, when there are no successful attempts we
can set a flag and return a counters object with all values set to 0. The
task counters page would then show all-zero counters instead of 'Sorry it
looks like task_1_2_r_3 has no counters.', and clicking an individual counter
would take you to the attempt counters, which hold the actual values.

Pros: this opens up a way to see failed attempts without making task-level
counters incorrect. What was null before is now an all-zero counter. The same
effect would cascade to job-level counters, which would be all zeros for a
fully failed job.

Cons: we depart from the design that failed (and killed) tasks should not be
considered, and the change may interfere with other services that read these
counters, for example the JHS and curl queries. That needs validation after
this change.

I would appreciate any comments, suggestions and insights on this change.
Thanks a lot!
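
To make the proposal concrete, here is a minimal sketch of the idea in plain
Java. It does not use the real Hadoop classes: {{Attempt}},
{{TaskCounterView}}, {{taskCounters()}} and the {{noSuccessfulAttempt}} flag
are hypothetical names made up for illustration, and this
{{selectBestAttempt()}} is simplified to return the first successful attempt.
Which counter names appear in the all-zero object is also an assumption; the
sketch uses the names seen in the failed attempts.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaskCountersSketch {

    /** Hypothetical stand-in for a task attempt; not the Hadoop class. */
    static final class Attempt {
        final String id;
        final boolean succeeded;
        final Map<String, Long> counters;

        Attempt(String id, boolean succeeded, Map<String, Long> counters) {
            this.id = id;
            this.succeeded = succeeded;
            this.counters = counters;
        }
    }

    /** Hypothetical task-level result: counter values plus the proposed flag. */
    static final class TaskCounterView {
        final Map<String, Long> counters;
        final boolean noSuccessfulAttempt;

        TaskCounterView(Map<String, Long> counters, boolean noSuccessfulAttempt) {
            this.counters = counters;
            this.noSuccessfulAttempt = noSuccessfulAttempt;
        }
    }

    /** Simplified: first successful attempt, or null if every attempt failed. */
    static Attempt selectBestAttempt(List<Attempt> attempts) {
        for (Attempt a : attempts) {
            if (a.succeeded) {
                return a;
            }
        }
        return null;
    }

    /** Proposed behaviour: all-zero counters and a flag instead of null. */
    static TaskCounterView taskCounters(List<Attempt> attempts) {
        Attempt best = selectBestAttempt(attempts);
        if (best != null) {
            return new TaskCounterView(best.counters, false);
        }
        // Assumption: zero out every counter name seen in the failed attempts.
        Map<String, Long> zeros = new LinkedHashMap<>();
        for (Attempt a : attempts) {
            for (String name : a.counters.keySet()) {
                zeros.put(name, 0L);
            }
        }
        return new TaskCounterView(zeros, true);
    }
}
```

In this sketch the failed attempts keep their own counter maps untouched, so
an attempt-level page can still show the actual values while the task-level
view reports zeros and sets the flag.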
> Add counters for failed task attempts
> -------------------------------------
>
> Key: MAPREDUCE-6603
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6603
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.7.1, 2.6.3
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Minor
>
> The counters for failed task attempts are currently unavailable and would be
> nice to have for troubleshooting whilst not including them in the aggregate
> counters at task or job level. One should be able to view them at attempt
> level.
> {code}
> Sorry it looks like task_1_2_r_3 has no counters.
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)