[ https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556500#action_12556500 ]
Alejandro Abdelnur commented on HADOOP-1876:
--------------------------------------------

Yes, you can do all that, but it involves many more changes. The JobHistory writes all job info to a LOG file, which has the following issues:

* the write/read methods of the RunningJob elements cannot be leveraged; special writing/parsing code has to be written for each of them (and would have to be added for counters).
* the log file would have to be traversed for every job whose info is not found in memory; with thousands of jobs in it this would certainly slow down to a crawl.
* if the JobHistory LOG is moved to DFS, appending becomes an issue.

> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt
>
>
> Currently the JobTracker keeps information about completed jobs in memory. This information is flushed from the cache when it has outlived #RETIRE_JOB_INTERVAL or when the limit on completed jobs held in memory (#MAX_COMPLETE_USER_JOBS_IN_MEMORY) has been reached.
> Also, if the JobTracker is restarted (because it was recycled or because it crashed), information about completed jobs is lost.
> If any of the above happens before the job information is queried by a Hadoop client (normally the job submitter or a monitoring component), there is no way to obtain it.
> A way to avoid this is for the JobTracker to persist the completed job's information in DFS upon job completion. This would be done at the time the job is moved to the completed jobs queue. When the JobTracker is then queried for a completed job that is not found in the memory queue, a lookup in DFS would be done to retrieve the completed job information.
> A directory in DFS (under mapred/system) would be used to persist completed job information; for each completed job there would be a directory named after the job ID, containing all the information about the job: status, jobprofile, counters and completion events.
> A configuration property will indicate for how long persisted job information should be kept in DFS. After that period it will be cleaned up automatically.
> This improvement would not introduce API changes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
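For illustration only, a minimal sketch of the persist-on-completion / lookup-on-miss pattern described above. This is not the attached patch: the CompletedJobStore class name, the per-job file layout and the base directory are assumptions; it simply reuses each job element's own Writable serialization (the point made in the comment) against the standard Hadoop FileSystem API.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Writable;

/** Hypothetical helper: persists a completed job's Writable elements
 *  (status, profile, counters, completion events) under a per-job
 *  directory in DFS, and reads them back when the JobTracker no longer
 *  holds the job in memory. */
public class CompletedJobStore {

  private final FileSystem fs;
  private final Path baseDir;  // e.g. a "completed-jobs" dir under mapred/system (assumed layout)

  public CompletedJobStore(Configuration conf, Path baseDir) throws IOException {
    this.fs = FileSystem.get(conf);
    this.baseDir = baseDir;
  }

  /** Write one piece of job info (e.g. "status", "counters") into the job's directory. */
  public void store(String jobId, String name, Writable info) throws IOException {
    Path file = new Path(new Path(baseDir, jobId), name);
    FSDataOutputStream out = fs.create(file);
    try {
      info.write(out);  // reuse the element's own Writable serialization
    } finally {
      out.close();
    }
  }

  /** Read job info back into a caller-supplied Writable; returns false if never persisted
   *  or already cleaned up. */
  public boolean load(String jobId, String name, Writable info) throws IOException {
    Path file = new Path(new Path(baseDir, jobId), name);
    if (!fs.exists(file)) {
      return false;
    }
    FSDataInputStream in = fs.open(file);
    try {
      info.readFields(in);
      return true;
    } finally {
      in.close();
    }
  }
}
{code}

In this sketch the JobTracker would call store() for each job element at the moment the job is moved to the completed jobs queue, and load() only when a client queries a job that is no longer cached, so no log traversal and no DFS appends are needed.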