[
https://issues.apache.org/jira/browse/MAPREDUCE-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737459#action_12737459
]
Sharad Agarwal commented on MAPREDUCE-817:
------------------------------------------
We can add an API in JobClient say:
String getCompletedJobHistoryURL(jobId) throws IOException
If the job is not completed, or the history file is not yet available in HDFS, it
will throw an exception with a proper message.
There is a concern that the history file name can't be inferred from the job id
alone. Currently the file name consists of the job id, username, timestamp and
other info, which are used by the history viewer UI and the CLI tool. So for this
API, the jobtracker would have to cache the file name for a given job id.
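To make the proposal concrete, here is a minimal sketch of the client-facing call backed by the jobtracker-side cache described above. The method name follows the proposal; the class, cache, and helper method are hypothetical illustrations, not actual JobClient/JobTracker code.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: illustrates the proposed API and the cache the
// jobtracker would need, since the history file name (job id + username +
// timestamp) cannot be derived from the job id alone.
public class HistoryUrlCache {
    // jobtracker-side cache of job id -> full history file URL
    private final Map<String, String> completedJobHistoryUrls =
        new ConcurrentHashMap<String, String>();

    // called by the jobtracker once the history file has been written to HDFS
    public void recordHistoryUrl(String jobId, String url) {
        completedJobHistoryUrls.put(jobId, url);
    }

    // the proposed client-facing lookup: throws with a clear message when the
    // job has not completed or the history file is not yet in HDFS
    public String getCompletedJobHistoryURL(String jobId) throws IOException {
        String url = completedJobHistoryUrls.get(jobId);
        if (url == null) {
            throw new IOException("History file for " + jobId
                + " not available: job not completed or file not yet in HDFS");
        }
        return url;
    }
}
```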
A related issue is retiring jobs from memory. Currently jobs get retired based on
"mapred.jobtracker.retirejob.interval.min" and
"mapred.jobtracker.completeuserjobs.maximum". Since the full job data structures
are huge, they can't stay in memory for long. I propose that the jobtracker knock
the job out of memory as soon as its history file is available in HDFS
(MAPREDUCE-814), keeping only a bare-minimum completed-job report (status,
#failedmaps, #failedreduces, ..) of around a hundred bytes in memory.
Assuming 100 bytes are stored for each completed job, 10,000 such tiny job
reports in memory would take only about 1 MB.
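A sketch of what such a bare-minimum report might look like, with the back-of-the-envelope arithmetic from above; the class and field names are hypothetical, not actual jobtracker code.

```java
// Hypothetical sketch of the bare-minimum completed-job report kept in memory
// after the full job data structures are knocked out.
public class CompletedJobReport {
    final String jobId;       // e.g. "job_200907151030_0001"
    final int runState;       // terminal job status (succeeded/failed/killed)
    final int failedMaps;
    final int failedReduces;

    CompletedJobReport(String jobId, int runState,
                       int failedMaps, int failedReduces) {
        this.jobId = jobId;
        this.runState = runState;
        this.failedMaps = failedMaps;
        this.failedReduces = failedReduces;
    }

    // back-of-the-envelope estimate used in the comment: ~100 bytes per report
    static final int APPROX_BYTES_PER_REPORT = 100;

    // 10,000 reports * 100 bytes = 1,000,000 bytes, i.e. about 1 MB
    static long approxMemoryBytes(int numCompletedJobs) {
        return (long) numCompletedJobs * APPROX_BYTES_PER_REPORT;
    }
}
```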
> Add a JobClient API to get job history file url
> -----------------------------------------------
>
> Key: MAPREDUCE-817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-817
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: client, jobtracker
> Reporter: Sharad Agarwal
> Assignee: Sharad Agarwal
>
> MAPREDUCE-814 will provide a way to keep the job history files in HDFS. There
> should be a way to get the URL for the completed job history file.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.