[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart

Ashutosh Chauhan (JIRA) Wed, 12 Mar 2014 11:41:33 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932152#comment-13932152
 ]


Ashutosh Chauhan commented on HIVE-6638:
----------------------------------------

{color:blue}
h3. How MR AM restart works?
{color}
If MR AM crashes, RM restarts AM (by default number of retrials = 2). Before 
another AM restarts job, it tries to recover work already done by previous AM. 
This is done in conjunction with help from OutputFormat. FileOutputFormat e.g., 
writes to a temp dir which uses AM attempt id to construct its destination 
path. When AM comes up, it has a new id. As a part of recovery, completed work 
of previous AM is moved to new paths which has attempt id of new AM. Than tasks 
which didn’t complete in previous attempt are restarted and begins to write 
into new paths which contains new AM attempt id. 

{color:blue}
h3. What Hive needs to do to adapt to AM restart?
{color}
For Hive to support recovery for AM restart case, following changes are 
required:

a) Hive always set HiveOutputFormatImpl as its OutputFormat. In this OF, we 
will implement newly introduced isRecoverySupported() and recoverTask() methods.
b) FileSink operator is passed in specPath where it does all its writing. It 
first writes to a tmp location alongside it and at the end of task promotes 
output to specPath location. This tmp location within specPath needs to be 
modified to take into account AM attempt id.
c) in recoverTask() method of OF, all the committed output of previous AM needs 
to be moved under new JobAttemptId. This is very similar to 
FileOutputCommitter::recoverTask.
d) In commitJob(), we need to promote from AM attempt id dir to dir under 
specPath.

*Note* testing needs to be done for following areas for this feature
* Insertion into Dynamic partition
* Insertion into bucketed table
* Secure setup


> Hive needs to implement recovery for Application Master restart 
> ----------------------------------------------------------------
>
>                 Key: HIVE-6638
>                 URL: https://issues.apache.org/jira/browse/HIVE-6638
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0
>            Reporter: Ashutosh Chauhan
>
> Currently, if AM restarts, whole job is restarted. Although, job and 
> subsequently query would still finish to completion, it would be nice if Hive 
> don't need to redo all the work done under previous AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart

Reply via email to