[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-5003:
--------------------------------
    Attachment: MAPREDUCE-5003.9.patch

Thanks for the review, [~jlowe]! I have uploaded the .9 patch to address your 
concerns. About the nit you mentioned: when a previously running task is 
recovered, its state is null, which is why I do the null check. It is null 
because the jobhistory server only records a task's state in the completion 
events, so recovery gets no state value for tasks that were still running. 
The .9 patch deals with this by also recording the task's state in the task 
start event. 
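
To illustrate the null check described above, here is a minimal sketch, not 
the code in the patch. The class names, the field layout, and the choice of 
"KILLED" as the fallback are assumptions made for illustration; the fallback 
follows the issue description's suggestion to mark such attempts as killed or 
something similar.

```java
// Hypothetical stand-in for the attempt info read back from a history file.
final class RecoveredAttempt {
    final String attemptId;
    // Null when the history holds only a start event that carried no state,
    // i.e. the attempt was still running when the previous AM went away.
    final String state;

    RecoveredAttempt(String attemptId, String state) {
        this.attemptId = attemptId;
        this.state = state;
    }
}

final class RecoverySketch {
    // Assumed fallback: an attempt with no recorded state is reported as
    // KILLED, since it cannot still be running after the AM restart.
    static String effectiveState(RecoveredAttempt attempt) {
        return attempt.state == null ? "KILLED" : attempt.state;
    }

    public static void main(String[] args) {
        RecoveredAttempt finished = new RecoveredAttempt("attempt_0", "SUCCEEDED");
        RecoveredAttempt wasRunning = new RecoveredAttempt("attempt_1", null);
        System.out.println(finished.attemptId + " -> " + effectiveState(finished));
        System.out.println(wasRunning.attemptId + " -> " + effectiveState(wasRunning));
    }
}
```
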
To check backward compatibility, I first verified that old jobhistory files 
can be parsed by my modified jobhistory server. I started a single-node 
cluster without my patch and ran some jobs. Then I shut down the jobhistory 
server, applied my patch, compiled the new code, and started the jobhistory 
server with my changes. I checked that the new jobhistory server could load 
and parse the old jobhistory files, and verified that I could view all the 
old job history in the UI after the restart.
I also checked the reverse: that the old history server is compatible with 
jobhistory files generated with my change. I followed the same steps, except 
that I first ran jobs with my patch applied, then shut down the history 
server, removed my patch, recompiled the code, and started the jobhistory 
server without my patch. I verified that the jobhistory files generated by 
the new jobhistory server could be parsed by the old jobhistory server.
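
The two directions of that check can be pictured with a toy model. This is 
only a sketch under stated assumptions: the "key=value;key=value" line format, 
the field names, and both reader methods are hypothetical and are not the real 
jobhistory file format or parser. It shows the property being tested: a new 
reader tolerates a missing state field, and an old reader ignores an extra one.

```java
import java.util.HashMap;
import java.util.Map;

final class CompatSketch {
    // Parse a toy "key=value;key=value" event line into a map.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String part : line.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return fields;
    }

    // "Old" reader: knows only attemptId and ignores any other field.
    static String oldReader(String line) {
        return parse(line).get("attemptId");
    }

    // "New" reader: uses the state field if present, else a placeholder.
    static String newReader(String line) {
        Map<String, String> fields = parse(line);
        return fields.get("attemptId") + ":" + fields.getOrDefault("state", "UNKNOWN");
    }

    public static void main(String[] args) {
        String oldStartEvent = "attemptId=attempt_0";               // written without the state field
        String newStartEvent = "attemptId=attempt_1;state=RUNNING"; // written with the state field
        System.out.println(newReader(oldStartEvent)); // new reader, old file
        System.out.println(oldReader(newStartEvent)); // old reader, new file
    }
}
```
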

> AM recovery should recreate records for attempts that were incomplete
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5003
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5003
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>            Reporter: Jason Lowe
>            Assignee: Chang Li
>         Attachments: MAPREDUCE-5003.1.patch, MAPREDUCE-5003.2.patch, 
> MAPREDUCE-5003.3.patch, MAPREDUCE-5003.4.patch, MAPREDUCE-5003.5.patch, 
> MAPREDUCE-5003.5.patch, MAPREDUCE-5003.6.patch, MAPREDUCE-5003.7.patch, 
> MAPREDUCE-5003.8.patch, MAPREDUCE-5003.9.patch
>
>
> As discussed in MAPREDUCE-4992, it would be nice if the AM recovered task 
> attempt entries for *all* task attempts launched by the prior app attempt 
> even if those task attempts did not complete.  The attempts would have to be 
> marked as killed or something similar to indicate it is no longer running.  
> Having records for the task attempts enables the user to see what nodes were 
> associated with the attempts and potentially access their logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
