[ 
https://issues.apache.org/jira/browse/HADOOP-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648565#action_12648565
 ] 

Amar Kamat commented on HADOOP-4670:
------------------------------------

Doug, the search is w.r.t. job-recovery. Given a _jobtracker-hostname, job-id, 
username and job-name_, recovery has to find the matching job-history file. 
The way we do it now is:
- construct a regex using _jobtracker-hostname, job-id, username and job-name_
- construct a path filter that accepts files that match the pattern and rejects 
all others
- use the DFS listing API with that filter to find the matching files
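
The three steps above can be sketched roughly as follows. This is only an 
illustration, not the actual JobTracker code: a plain list of strings stands in 
for the DFS listing, the class and method names are hypothetical, and the 
file-name layout (hostname, a numeric field, job-id, username, job-name joined 
by underscores) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: strings stand in for files returned by a DFS listing.
class FlatHistorySearch {

    // Assumed layout: <jobtracker-hostname>_<digits>_<job-id>_<username>_<job-name>
    static Pattern buildPattern(String host, String jobId, String user, String jobName) {
        return Pattern.compile(
            Pattern.quote(host) + "_[0-9]+_" + Pattern.quote(jobId) + "_"
            + Pattern.quote(user) + "_" + Pattern.quote(jobName));
    }

    // Plays the role of the path filter: every entry in the single history
    // folder is tested, so the cost grows with the total number of files.
    static List<String> search(List<String> allHistoryFiles, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        for (String name : allHistoryFiles) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Note that every file name is examined even when only one can match, which is 
the linear scan this comment is concerned about.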

This is a costly operation because every file in the history folder is scanned 
linearly. Over time the history folder can grow large, which increases the 
search time. The problem is that every user is hit by this, regardless of how 
many of the files are theirs. With the above-mentioned optimization (grouping 
history files by user) we can reduce the search time for most users.
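
As a rough sketch of why the per-user grouping helps, the example below models 
_history-folder/username/_ as a map from user to file names. The class and 
method names are hypothetical, and an in-memory map is only a stand-in for real 
DFS directories; the point is that a lookup examines one user's entries rather 
than the whole folder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: a map stands in for one sub-folder per user.
class PerUserHistorySearch {

    // user -> file names stored under history-folder/<user>/
    private final Map<String, List<String>> byUser = new HashMap<>();

    void add(String user, String fileName) {
        byUser.computeIfAbsent(user, u -> new ArrayList<>()).add(fileName);
    }

    // Scans only the given user's folder instead of every history file.
    List<String> search(String user, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        for (String name : byUser.getOrDefault(user, Collections.emptyList())) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    // Number of entries a search for this user has to examine.
    int scanCost(String user) {
        return byUser.getOrDefault(user, Collections.emptyList()).size();
    }
}
```

With this layout, a user who owns a handful of files pays only for those files. 
A user with a very large number of jobs still sees a long scan, which is why 
the improvement applies to most users rather than all of them.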

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: HADOOP-4670
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4670
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.20.0
>
>
> Today all the job-history files are dumped in one _job-history_ folder. This 
> can cause problems when there is a need to search the history folder 
> (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ 
> folder, so all the jobs for user _amar_ would go in _history-folder/amar/_. 
> Jobs can be categorized using various features like _jobid, date, jobname_ 
> etc., but using _username_ will make the search much more efficient and also 
> will not result in namespace explosion. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
