[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111628#comment-16111628
 ] 

Haibo Chen commented on MAPREDUCE-6892:
---------------------------------------

bq. 3. Looking at the toTimelineEvent() methods, those are a bit messed up too. 
As I mentioned above, it has prop. keys like NUM_MAPS and FINISHED_MAPS, but 
both contain the same value. So either we rename NUM_MAPS to SUCCEEDED_MAPS and 
delete FINISHED_MAPS or NUM_MAPS should be succeeded+failed+killed and rename 
FINISHED_MAPS to SUCCEEDED_MAPS.
TimelineServiceV2-related changes (new fields and renames) are fine as long as 
we make them before 3.0 beta1, after which we'll need to consider compatibility. 
So let's also include the changes needed for ATSv2 here.
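
To make the second naming option concrete, here is a minimal sketch, assuming a 
hypothetical helper class (MapTaskCounts is not part of Hadoop). It treats 
NUM_MAPS as succeeded+failed+killed and reports the successful count under 
SUCCEEDED_MAPS. The FAILED_MAPS and KILLED_MAPS keys are assumptions added only 
for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Hadoop code: illustrates NUM_MAPS as a true total.
final class MapTaskCounts {
    private final int succeeded;
    private final int failed;
    private final int killed;

    MapTaskCounts(int succeeded, int failed, int killed) {
        this.succeeded = succeeded;
        this.failed = failed;
        this.killed = killed;
    }

    // Total number of map tasks across all terminal states.
    int total() {
        return succeeded + failed + killed;
    }

    // Builds the property map; each key has its own meaning.
    // FAILED_MAPS and KILLED_MAPS are assumed key names for this sketch.
    Map<String, Integer> toProperties() {
        Map<String, Integer> props = new LinkedHashMap<>();
        props.put("NUM_MAPS", total());
        props.put("SUCCEEDED_MAPS", succeeded);
        props.put("FAILED_MAPS", failed);
        props.put("KILLED_MAPS", killed);
        return props;
    }
}
```

With 7 succeeded, 2 failed and 1 killed map, NUM_MAPS would be 10 and 
SUCCEEDED_MAPS would be 7, so the two keys no longer duplicate each other.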

bq. 1. I can add new fields to the Avro schema and keep the old ones, but it 
adds to the complexity, because we have to support both pairs
bq. 2. We probably also have to check whether "finished" or "succeeded" values 
are defined in the jhist file, but not both
We can keep the old names and only add new fields, so we continue to treat 
'finished' as succeeded. In that case, there is no need for a new 'succeeded' 
field.
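
As a rough illustration of that approach, the sketch below uses a plain Map as a 
stand-in for a parsed jhist record. The class JhistCountsReader, the field names 
failedMaps and killedMaps, and the -1 sentinel for counters missing from older 
files are all assumptions of this sketch, not the actual patch.

```java
import java.util.Map;

// Hypothetical reader, not Hadoop code: a Map stands in for a parsed record.
final class JhistCountsReader {
    // Sentinel for counters that an older file did not record (assumption).
    static final int UNKNOWN = -1;

    // The existing "finishedMaps" field keeps its meaning: succeeded maps.
    static int succeededMaps(Map<String, Integer> record) {
        return record.getOrDefault("finishedMaps", 0);
    }

    // New field (assumed name); absent in files written before the change.
    static int failedMaps(Map<String, Integer> record) {
        return record.getOrDefault("failedMaps", UNKNOWN);
    }

    // New field (assumed name); absent in files written before the change.
    static int killedMaps(Map<String, Integer> record) {
        return record.getOrDefault("killedMaps", UNKNOWN);
    }
}
```

Because no field is renamed, a reader never has to decide between a 
"finished" and a "succeeded" value; it only has to tolerate the new 
counters being absent.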

A few more minor comments on the 2nd patch:
1) In JobHistoryEventHandler, when we handle the Job_KILL event, can we also add 
killedMappers + killedReducers to the summary?
2) There is one line above JobImpl.unsuccessfulFinish() which is unnecessary.
3) In JobImpl.InternalTerminationTransition, can we include all killed/failed 
counters instead of zeroes?
4) JobImpl#L1959 is longer than 80 characters.
5) In Job20LineHistoryEventEmitter, can we also parse the killed + failed 
counters instead of passing -1s?
6) CompletedJob: instead of returning 0s for the new counters, we can return 
jobInfo.get*() that we get from parsing .jhist files.
7) In TestJobHistoryParser.testHistoryParsingForKilledAndFailedAttempts(), 
the verification of noOffailedAttempts seems unnecessary, since the total 
number depends on the job configuration. Can we remove the related code?


> Issues with the count of failed/killed tasks in the jhist file
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-6892
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, jobhistoryserver
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. The number of failed (or killed) tasks is 
> not stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
