[
https://issues.apache.org/jira/browse/MAPREDUCE-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737459#action_12737459
]
Sharad Agarwal commented on MAPREDUCE-817:
------------------------------------------
We can add an API in JobClient say:
String getCompletedJobHistoryURL(jobId) throws IOException
If the job is not completed, or the history file is not yet available in HDFS, it
will throw an exception with a proper message.
There is a concern that the history file name can't be inferred from the job id
alone. Currently the file name consists of the job id, username, timestamp and
other info, which are used by the history viewer UI and the CLI tool. So for this
API, the jobtracker would have to cache the file name for a given job id.
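To make the proposal concrete, here is a minimal sketch of the client-facing call backed by the jobtracker-side cache described above. The method name follows the proposal; the class, cache, and helper method are hypothetical illustrations, not actual JobClient/JobTracker code.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: illustrates the proposed API and the cache the
// jobtracker would need, since the history file name (job id + username +
// timestamp) cannot be derived from the job id alone.
public class HistoryUrlCache {
    // jobtracker-side cache of job id -> full history file URL
    private final Map<String, String> completedJobHistoryUrls =
        new ConcurrentHashMap<String, String>();

    // called by the jobtracker once the history file has been written to HDFS
    public void recordHistoryUrl(String jobId, String url) {
        completedJobHistoryUrls.put(jobId, url);
    }

    // the proposed client-facing lookup: throws with a clear message when the
    // job has not completed or the history file is not yet in HDFS
    public String getCompletedJobHistoryURL(String jobId) throws IOException {
        String url = completedJobHistoryUrls.get(jobId);
        if (url == null) {
            throw new IOException("History file for " + jobId
                + " not available: job not completed or file not yet in HDFS");
        }
        return url;
    }
}
```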
A related issue is retiring jobs from memory. Currently jobs get retired based on
"mapred.jobtracker.retirejob.interval.min" and
"mapred.jobtracker.completeuserjobs.maximum". Since the full job data structures
are huge, they can't stay in memory for long. I propose that the jobtracker knock
the job out of memory as soon as its history file is available in HDFS
(MAPREDUCE-814), keeping only a bare-minimum completed-job report (status,
#failedmaps, #failedreduces, ..) of around a hundred bytes in memory.
Assuming 100 bytes are stored for each completed job, 10,000 such tiny job
reports in memory would take only about 1 MB.
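A sketch of what such a bare-minimum report might look like, with the back-of-the-envelope arithmetic from above; the class and field names are hypothetical, not actual jobtracker code.

```java
// Hypothetical sketch of the bare-minimum completed-job report kept in memory
// after the full job data structures are knocked out.
public class CompletedJobReport {
    final String jobId;       // e.g. "job_200907151030_0001"
    final int runState;       // terminal job status (succeeded/failed/killed)
    final int failedMaps;
    final int failedReduces;

    CompletedJobReport(String jobId, int runState,
                       int failedMaps, int failedReduces) {
        this.jobId = jobId;
        this.runState = runState;
        this.failedMaps = failedMaps;
        this.failedReduces = failedReduces;
    }

    // back-of-the-envelope estimate used in the comment: ~100 bytes per report
    static final int APPROX_BYTES_PER_REPORT = 100;

    // 10,000 reports * 100 bytes = 1,000,000 bytes, i.e. about 1 MB
    static long approxMemoryBytes(int numCompletedJobs) {
        return (long) numCompletedJobs * APPROX_BYTES_PER_REPORT;
    }
}
```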
> Add a JobClient API to get job history file url
> -----------------------------------------------
>
> Key: MAPREDUCE-817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-817
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: client, jobtracker
> Reporter: Sharad Agarwal
> Assignee: Sharad Agarwal
>
> MAPREDUCE-814 will provide a way to keep the job history files in HDFS. There
> should be a way to get the URL for the completed job history file.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.