[ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629080#action_12629080 ]
Amar Kamat commented on HADOOP-3245:
------------------------------------

One comment on the patch.

_Approach:_ The way history renaming is done in this patch is as follows:
- Given the job-id, job-name and the user-name, try to find a file in the history folder that matches the pattern jt-hostname_[0-9]*_jobid_jobname_username.
- If any file matches the pattern, say file _f_, use _f.recover_ as the new history file. If _f.recover_ is the file being recovered, rename _f.recover_ to _f_ and use _f.recover_ as the new history file.
- On successful recovery, delete _f_.
- On job completion, rename _f.recover_ to _f_.
- If the jobtracker restarts in between, use the older file as the file for recovery.

_Problem:_ With trunk, only one DFS access is made while starting the history logging for a job. With this patch there will be four DFS accesses:
- Check if the job has a history _file_ _[false for new jobs]_
- Check if _file_ exists _[false for new jobs]_
- Check if _file.recover_ exists _[false for new jobs]_
- Open _file_ for logging

I think it makes more sense to create a new history file upon every restart. Before starting the recovery process, delete all the history files related to the job except the oldest one. Note that the history filename has a timestamp in it, so detecting the oldest file is easy.

_Example:_ Say the job started with timestamp t1. The job history filename would be _hostname_t1_jobid_jobname_username_. Upon restart, delete all the files related to the job except the oldest one. The new filename would be _hostname_t2_jobid_jobname_username_, and _hostname_t1_jobid_jobname_username_ would be used as the source for recovery. If the jobtracker dies while recovering, there will be two history files for the job; on the next restart, delete _hostname_t2_jobid_jobname_username_ and again use _hostname_t1_jobid_jobname_username_ for recovery. If the recovery is successful, delete _hostname_t1_jobid_jobname_username_ just to make sure that the latest history file will be used upon the next restart. There is no renaming and no temp file involved in this approach. Note that at any given time there will be at most two history files per job. A rough sketch of this file-selection logic is given after the issue details below.

> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch,
> HADOOP-3245-v2.6.9.patch, HADOOP-3245-v4.1.patch, HADOOP-3245-v5.13.patch,
> HADOOP-3245-v5.14.patch, HADOOP-3245-v5.26.patch,
> HADOOP-3245-v5.30-nolog.patch, HADOOP-3245-v5.31.3-nolog.patch,
> HADOOP-3245-v5.33.1.patch, HADOOP-3245-v5.35.3-no-log.patch
>
> This could probably extend the work done in HADOOP-1876. This feature can be applied for things like jobs being able to survive jobtracker restarts.
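For illustration only, here is a minimal Java sketch of the timestamp-based selection described above: pick the oldest history file as the recovery source, delete everything newer, and open a brand-new file for the current run. The class name (HistoryFileSelection), hostname, job-id, job-name, user-name and timestamps below are made-up placeholders, not part of the patch; the sketch only assumes filenames of the form hostname_timestamp_jobid_jobname_username.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not part of the patch: decide which history file to
// recover from, which files to delete, and what the new file should be called,
// assuming filenames of the form hostname_timestamp_jobid_jobname_username.
public class HistoryFileSelection {

  // The timestamp is the second underscore-separated field of the filename.
  static long timestampOf(String fileName) {
    return Long.parseLong(fileName.split("_")[1]);
  }

  public static void main(String[] args) {
    String hostname = "jt-host";                        // made-up jobtracker hostname
    String suffix = "job_200809080001_0001_sort_amar";  // made-up jobid_jobname_username
    long now = System.currentTimeMillis();              // timestamp (t2, t3, ...) for this run

    // History files left over from previous runs (at most two per the scheme above).
    List<String> existing = new ArrayList<String>();
    existing.add(hostname + "_1220000500000_" + suffix); // newer, partially written
    existing.add(hostname + "_1220000000000_" + suffix); // oldest, i.e. t1

    // Sort by timestamp: the oldest file is the source for recovery,
    // everything newer is deleted before recovery starts.
    Collections.sort(existing, new Comparator<String>() {
      public int compare(String a, String b) {
        long diff = timestampOf(a) - timestampOf(b);
        return diff < 0 ? -1 : (diff > 0 ? 1 : 0);
      }
    });
    String recoverFrom = existing.get(0);
    List<String> toDelete = existing.subList(1, existing.size());

    // A brand-new file is opened for this run: no rename, no temp file.
    String newHistoryFile = hostname + "_" + now + "_" + suffix;

    System.out.println("recover from : " + recoverFrom);
    System.out.println("delete       : " + toDelete);
    System.out.println("log to       : " + newHistoryFile);
    // On successful recovery the jobtracker would also delete recoverFrom,
    // leaving newHistoryFile as the only history file for the job.
  }
}
{code}

The sketch only shows the filename arithmetic; in the jobtracker the listing, deletes and file creation would presumably go through the Hadoop FileSystem API (listStatus/delete/create) against the history folder.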