[ 
https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838382#comment-13838382
 ] 

Vinod Kumar Vavilapalli commented on HBASE-9485:
------------------------------------------------

bq. That's it?
Things have changed with MRv2 in a way.
 - In Hadoop 1, if the JobTracker went down, users were responsible for cleaning 
up any temporary data left behind and for resubmitting jobs afresh. This also 
prevented multiple incarnations of a job from running at the same time.
 - With Hadoop 2, the ResourceManager automatically restarts the per-job 
ApplicationMaster (AM) after node/cluster failures and also makes it possible 
to keep completed work from the previous incarnation of the job. So two 
things need to happen: promote outputs from the previous incarnation, and make 
sure multiple ApplicationMasters of the same job don't conflict. We designed the 
recoverTask() API for that reason: the second AM invokes this API for every 
taskAttempt that succeeded, and the implementation can choose to promote output 
from the previous AM in an implementation-specific manner.
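
To make the recovery flow above concrete, here is a minimal, self-contained 
sketch of what a restarted AM does with a committer. It does not use the Hadoop 
classes: the Committer interface, DirectWriteCommitter, and restart() are 
hypothetical stand-ins, and only the method names isRecoverySupported() and 
recoverTask() are meant to mirror the OutputCommitter hooks discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RecoverySketch {
    // Hypothetical committer contract, loosely modeled on the recovery hooks
    // discussed above. This is NOT the Hadoop OutputCommitter class.
    interface Committer {
        boolean isRecoverySupported();
        void recoverTask(String taskAttemptId);
    }

    // Committer for a sink with no staging area: the output of a succeeded
    // attempt is already in its final place, so recovery only records the
    // attempt and promotes nothing.
    static final class DirectWriteCommitter implements Committer {
        final List<String> recovered = new ArrayList<>();

        @Override
        public boolean isRecoverySupported() {
            return true;
        }

        @Override
        public void recoverTask(String taskAttemptId) {
            recovered.add(taskAttemptId);
        }
    }

    // What a second AM incarnation does in this sketch: recover every attempt
    // that succeeded under the previous AM, and return the ones to rerun.
    static List<String> restart(Committer committer,
                                Map<String, Boolean> attemptSucceeded) {
        List<String> rerun = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : attemptSucceeded.entrySet()) {
            if (committer.isRecoverySupported() && e.getValue()) {
                committer.recoverTask(e.getKey());
            } else {
                rerun.add(e.getKey());
            }
        }
        return rerun;
    }
}
```

If isRecoverySupported() returned false instead, every attempt would land in 
the rerun list, which is the "job starts from scratch" behaviour the issue 
describes.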

Seems like with this patch, all the old work that was already 'committed' into 
HBase is automatically retained and any redone work will automatically replace 
old outputs because of HBase put-idempotency.
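
The idempotency argument can be illustrated with a toy model. This is only a 
sketch under stated assumptions: the table is a plain in-memory map rather than 
an HBase table, runTask() is a hypothetical deterministic task, and cell 
timestamps/versions are ignored.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentPutSketch {
    // Hypothetical task: deterministically writes three rows keyed by task id.
    // A plain Map stands in for the table; real puts also carry timestamps,
    // which this model ignores.
    static void runTask(Map<String, String> table, int taskId) {
        for (int i = 0; i < 3; i++) {
            table.put("row-" + taskId + "-" + i, "value-" + i);
        }
    }

    // Runs the task once, then again as if a restarted AM had redone the work,
    // and reports whether the table state is unchanged by the rerun.
    static boolean rerunLeavesTableUnchanged(int taskId) {
        Map<String, String> table = new HashMap<>();
        runTask(table, taskId);
        Map<String, String> afterFirstRun = new HashMap<>(table);
        runTask(table, taskId);
        return table.equals(afterFirstRun);
    }
}
```

The property holds here only because the task writes the same keys and values 
on every run; a task with non-deterministic output would not get it for free.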

It's this easy apparently because HBase OutputCommitter doesn't have a staging 
table to account for job failures. So, if a job fails half-way through, the 
table is 'corrupted' and users depend on external mechanisms to clean it up?


> TableOutputCommitter should implement recovery if we don't want jobs to start 
> from 0 on RM restart
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9485
>                 URL: https://issues.apache.org/jira/browse/HBASE-9485
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 9485-v2.txt
>
>
> HBase extends OutputCommitter which turns recovery off. Meaning all completed 
> maps are lost on RM restart and job starts from scratch. FileOutputCommitter 
> implements recovery so we should look at that to see what is potentially 
> needed for recovery.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
