[
https://issues.apache.org/jira/browse/MAPREDUCE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chang Li updated MAPREDUCE-5003:
--------------------------------
Attachment: MAPREDUCE-5003.9.patch
Thanks for the review, [~jlowe]! I have uploaded the .9 patch to address your
concerns. About the nit you mentioned: when a previously running task is
recovered, its state is null, which is why I do the null check. It is null
because the jobhistory server only records a task's state in the completion
events, so recovery gets no state value for tasks that were still running.
The .9 patch deals with this problem by also recording the task's state in the
task start event.
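A minimal sketch of the null check described above. The names RecoveredAttempt, RecoverySketch and resolveState are hypothetical and are not the classes touched by the patch. It assumes that an attempt recovered without a recorded state should be reported as killed, as the issue description suggests:

```java
// Hypothetical stand-in for a task attempt record rebuilt from job history.
final class RecoveredAttempt {
    final String id;
    final String state; // null if no history event recorded a state

    RecoveredAttempt(String id, String state) {
        this.id = id;
        this.state = state;
    }
}

final class RecoverySketch {
    // Assumed fallback label for attempts that never reached a completion event.
    static final String FALLBACK_STATE = "KILLED";

    // Returns the recorded state, or the fallback when recovery found none.
    static String resolveState(RecoveredAttempt attempt) {
        return attempt.state != null ? attempt.state : FALLBACK_STATE;
    }

    public static void main(String[] args) {
        RecoveredAttempt finished = new RecoveredAttempt("attempt_a", "SUCCEEDED");
        RecoveredAttempt running = new RecoveredAttempt("attempt_b", null);
        System.out.println(finished.id + " -> " + resolveState(finished));
        System.out.println(running.id + " -> " + resolveState(running));
    }
}
```

Once the state is also written in the start event, the fallback branch should only be reached for history files produced before that change.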
I checked backward compatibility by first verifying that old jobhistory files
can be parsed by my modified jobhistory server. I started a single-node
cluster without my patch applied and ran some jobs. Then I shut down the
jobhistory server, applied my patch, compiled the new code, and started the
jobhistory server with my changes. I checked that the new jobhistory server
could load and parse the old jobhistory files, and verified that I could view
all of the old job history in the UI after the restart.
I also checked the reverse: whether the old history server is compatible with
jobhistory files generated with my change. I followed the steps above, except
that I first ran jobs with my patch applied, then shut down the history
server, removed my patch, recompiled the code, and started the jobhistory
server without my patch. I verified that the jobhistory files generated by the
new jobhistory server could be parsed by the old jobhistory server.
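The two checks above come down to two properties: a new reader must tolerate records that lack the added field, and an old reader must ignore a field it does not know. A toy illustration with a map-based record follows. The field names taskId and state, and both reader methods, are hypothetical and do not represent the actual job history format or parser:

```java
import java.util.HashMap;
import java.util.Map;

final class HistoryCompatSketch {
    // "Old" reader: knows only the pre-change field and ignores anything else.
    static String readOld(Map<String, String> event) {
        return event.get("taskId");
    }

    // "New" reader: also reads the optional "state" field, tolerating its absence.
    static String readNew(Map<String, String> event) {
        String state = event.get("state"); // null for records written without it
        return event.get("taskId") + ":" + (state == null ? "UNKNOWN" : state);
    }

    public static void main(String[] args) {
        // Record as an unpatched writer would produce it (no state field).
        Map<String, String> oldRecord = new HashMap<>();
        oldRecord.put("taskId", "task_1");

        // Record as a patched writer would produce it (state included).
        Map<String, String> newRecord = new HashMap<>();
        newRecord.put("taskId", "task_2");
        newRecord.put("state", "RUNNING");

        System.out.println(readNew(oldRecord)); // new reader, old file
        System.out.println(readOld(newRecord)); // old reader, new file
    }
}
```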
> AM recovery should recreate records for attempts that were incomplete
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-5003
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5003
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am
> Reporter: Jason Lowe
> Assignee: Chang Li
> Attachments: MAPREDUCE-5003.1.patch, MAPREDUCE-5003.2.patch,
> MAPREDUCE-5003.3.patch, MAPREDUCE-5003.4.patch, MAPREDUCE-5003.5.patch,
> MAPREDUCE-5003.5.patch, MAPREDUCE-5003.6.patch, MAPREDUCE-5003.7.patch,
> MAPREDUCE-5003.8.patch, MAPREDUCE-5003.9.patch
>
>
> As discussed in MAPREDUCE-4992, it would be nice if the AM recovered task
> attempt entries for *all* task attempts launched by the prior app attempt
> even if those task attempts did not complete. The attempts would have to be
> marked as killed or something similar to indicate they are no longer running.
> Having records for the task attempts enables the user to see what nodes were
> associated with the attempts and potentially access their logs.