[jira] Commented: (HADOOP-239) job tracker WI drops jobs after 24 hours

Sanjay Dahiya (JIRA) Tue, 15 Aug 2006 12:45:10 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-239?page=comments#action_12428201 ]
Sanjay Dahiya commented on HADOOP-239:
--------------------------------------


For the format of the history file on disk, here is a proposal.

Since we want the file to be flushed on every write so that it survives 
jobtracker restarts, it makes sense to use a simple record-oriented structure 
in the file. Each log statement appends one record to the file. Since multiple 
jobs can be running at any time, records from different jobs can be intermixed 
in the log file (unless we use one history file per job).
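
A minimal Java sketch of the append-and-flush idea, assuming a plain local file. The class name HistoryLogWriter is hypothetical, not an existing Hadoop API. Each call appends exactly one line and flushes it, and append() is synchronized so that records written through the same writer by concurrently running jobs interleave only at line boundaries. Note that flush() only hands the data to the operating system: that is enough to survive a jobtracker process restart, but surviving a machine crash would additionally need something like FileDescriptor.sync().

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Path;

/**
 * Hypothetical append-only history writer: one line per record,
 * flushed after every write so nothing stays in Java-side buffers.
 */
class HistoryLogWriter implements AutoCloseable {
    private final PrintWriter out;

    HistoryLogWriter(Path file) {
        try {
            // 'true' opens the file in append mode, so a restarted
            // jobtracker continues the existing log instead of truncating it.
            this.out = new PrintWriter(new FileWriter(file.toFile(), true));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one complete record line and flushes it immediately. */
    synchronized void append(String record) {
        out.println(record);
        out.flush();
        // PrintWriter swallows IOExceptions; surface them explicitly.
        if (out.checkError()) {
            throw new UncheckedIOException(new IOException("failed to write history record"));
        }
    }

    @Override
    public void close() {
        out.close();
    }
}
```
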
Using one history file per job is also a viable option, in which case we can 
keep the log files for different days in separate directories and delete old 
files.
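For the one-file-per-job variant, a sketch of the per-day layout and its cleanup. It assumes a hypothetical directory scheme <root>/<yyyy-MM-dd>/<jobId>.log; the proposal does not fix any naming, and PerDayHistoryLayout is an illustrative name only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical layout: <root>/<yyyy-MM-dd>/<jobId>.log */
class PerDayHistoryLayout {
    private final Path root;

    PerDayHistoryLayout(Path root) {
        this.root = root;
    }

    /** Returns the history file for a job, creating that day's directory if needed. */
    Path fileFor(String jobId, LocalDate day) {
        Path dir = root.resolve(day.toString()); // LocalDate.toString() is ISO yyyy-MM-dd
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir.resolve(jobId + ".log");
    }

    /** Deletes every day directory strictly older than cutoff; returns how many were removed. */
    int deleteDaysBefore(LocalDate cutoff) {
        List<Path> expired = new ArrayList<>();
        try {
            // Collect first, delete afterwards, so we never modify the directory mid-iteration.
            try (DirectoryStream<Path> days = Files.newDirectoryStream(root)) {
                for (Path dir : days) {
                    if (!Files.isDirectory(dir)) {
                        continue;
                    }
                    try {
                        if (LocalDate.parse(dir.getFileName().toString()).isBefore(cutoff)) {
                            expired.add(dir);
                        }
                    } catch (DateTimeParseException notADayDir) {
                        // Not a day directory; leave it alone.
                    }
                }
            }
            for (Path dir : expired) {
                deleteRecursively(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return expired.size();
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```
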

In both cases, the following simple file format can be used to log history 
and to parse and display it in JSPs:
<recordType> <key=value> <key=value> ...

where recordType = {JobInfo, Task, MapAttempt, ReduceTask, ReduceAttempt, ...}
and the keys depend on the recordType; e.g. for JobInfo, keys = {jobId, jobName, 
submitTime, launchTime, ...}.
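A sketch of formatting and parsing one record line in this format. It assumes that keys and values never contain whitespace and that keys never contain '='. The proposal does not specify an escaping scheme, so this sketch simply rejects such input (a job name with spaces would need one). HistoryRecord is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** One history line: "<recordType> <key=value> <key=value> ..." */
final class HistoryRecord {
    final String type;
    final Map<String, String> fields; // insertion order preserved for readable output

    HistoryRecord(String type, Map<String, String> fields) {
        this.type = type;
        this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
    }

    /** Serializes to a single line; rejects whitespace in keys/values and '=' in keys. */
    String format() {
        StringBuilder sb = new StringBuilder(type);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String k = e.getKey();
            String v = e.getValue();
            if (k.isEmpty() || k.contains("=") || hasWhitespace(k) || hasWhitespace(v)) {
                throw new IllegalArgumentException("unsupported key/value: " + k + "=" + v);
            }
            sb.append(' ').append(k).append('=').append(v);
        }
        return sb.toString();
    }

    /** Parses a line produced by format(); splits each field at its first '='. */
    static HistoryRecord parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        String[] tokens = trimmed.split("\\s+");
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("malformed field: " + tokens[i]);
            }
            fields.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
        }
        return new HistoryRecord(tokens[0], fields);
    }

    private static boolean hasWhitespace(String s) {
        return s.chars().anyMatch(Character::isWhitespace);
    }
}
```
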

For example, the log entries for a single job may look like:

JobInfo jobId=job_001 jobName=wordCount submitTime=0001
JobInfo jobId=job_001 launchTime=0002
...
JobInfo jobId=job_001 finishTime=0002
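Because a job's information arrives as several partial JobInfo lines, possibly interleaved with lines for other jobs, a reader such as the JSP has to fold them back together. A sketch, assuming later records simply add or overwrite keys for the same jobId; JobInfoMerger is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Folds intermixed "JobInfo" lines into one key/value view per jobId. */
final class JobInfoMerger {
    static Map<String, Map<String, String>> merge(List<String> lines) {
        Map<String, Map<String, String>> jobs = new LinkedHashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (!tokens[0].equals("JobInfo")) {
                continue; // Task and attempt records would be handled separately
            }
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 1; i < tokens.length; i++) {
                int eq = tokens[i].indexOf('=');
                if (eq > 0) {
                    fields.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
                }
            }
            String jobId = fields.get("jobId");
            if (jobId == null) {
                continue; // a record without a jobId cannot be attributed to a job
            }
            // Later records for the same job add or overwrite keys (e.g. finishTime).
            jobs.computeIfAbsent(jobId, id -> new LinkedHashMap<>()).putAll(fields);
        }
        return jobs;
    }
}
```

Fed the three JobInfo lines above, the entry for job_001 ends up holding jobName, submitTime, launchTime and finishTime in one map, regardless of what other jobs logged in between.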

We can provide a proxy class, JobHistory, which exposes specific methods for 
logging the different events and takes care of formatting in one central 
place.
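A sketch of what the JobHistory proxy could look like. The method names (logSubmitted, logLaunched, logFinished), the Consumer<String> sink and the use of long timestamps are assumptions for illustration, not a settled API.

```java
import java.util.function.Consumer;

/**
 * Sketch of the proposed JobHistory proxy: callers use event-specific
 * methods, and the "<recordType> key=value ..." formatting lives in one place.
 */
final class JobHistory {
    private final Consumer<String> sink; // e.g. an append-and-flush file writer

    JobHistory(Consumer<String> sink) {
        this.sink = sink;
    }

    void logSubmitted(String jobId, String jobName, long submitTime) {
        log("JobInfo", "jobId", jobId, "jobName", jobName, "submitTime", String.valueOf(submitTime));
    }

    void logLaunched(String jobId, long launchTime) {
        log("JobInfo", "jobId", jobId, "launchTime", String.valueOf(launchTime));
    }

    void logFinished(String jobId, long finishTime) {
        log("JobInfo", "jobId", jobId, "finishTime", String.valueOf(finishTime));
    }

    /** The single place that knows the on-disk record format. */
    private void log(String recordType, String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("keys and values must come in pairs");
        }
        StringBuilder sb = new StringBuilder(recordType);
        for (int i = 0; i < keyValues.length; i += 2) {
            sb.append(' ').append(keyValues[i]).append('=').append(keyValues[i + 1]);
        }
        sink.accept(sb.toString());
    }
}
```

Callers never build record strings themselves, so a later change to the on-disk format only touches log(...).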

Comments?


> job tracker WI drops jobs after 24 hours
> ----------------------------------------
>
>                 Key: HADOOP-239
>                 URL: http://issues.apache.org/jira/browse/HADOOP-239
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Yoram Arnon
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>
> The jobtracker's WI keeps track of jobs executed in the past 24 hours.
> If the cluster was idle for a day (say Sunday), it drops all its history.
> Monday morning, the page is empty.
> It would be better to store a fixed number of jobs (say 10 each of succeeded 
> and failed jobs).
> Also, if the job tracker is restarted, it loses all its history.
> The history should be persistent, withstanding restarts and upgrades.
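
The count-based retention suggested in the issue could be sketched as below (hypothetical RecentJobs class, in-memory only). Since eviction depends on how many jobs have finished rather than on how old they are, an idle day no longer empties the list; surviving a restart would still depend on the persistent history file proposed above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Keeps only the N most recent succeeded and N most recent failed job ids. */
final class RecentJobs {
    private final int limit;
    private final Deque<String> succeededIds = new ArrayDeque<>();
    private final Deque<String> failedIds = new ArrayDeque<>();

    RecentJobs(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    synchronized void record(String jobId, boolean success) {
        Deque<String> q = success ? succeededIds : failedIds;
        q.addFirst(jobId); // newest first
        if (q.size() > limit) {
            q.removeLast(); // evict the oldest entry, however long ago it finished
        }
    }

    /** Most recent first. */
    synchronized List<String> succeeded() {
        return new ArrayList<>(succeededIds);
    }

    /** Most recent first. */
    synchronized List<String> failed() {
        return new ArrayList<>(failedIds);
    }
}
```

With the numbers from the issue this would be new RecentJobs(10).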
