[ https://issues.apache.org/jira/browse/HADOOP-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678270#action_12678270 ]

Amar Kamat commented on HADOOP-4670:
------------------------------------

I had an offline discussion with Devaraj, Hemanth and Sharad. It seems the
following structure should solve this issue:
# old history files: path-to-job-history/
# history files for the jobtracker on host hostname: path-to-job-history/hostname
# history files for user username using the jobtracker running on hostname:
path-to-job-history/hostname/username
# job history file format: <start-time>_<jobid>_<jobname>
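
As an illustration of the proposed layout, here is a minimal sketch (not the JobHistory code) that builds a history file path and parses a file name back into its fields. The function names are hypothetical, and it assumes <start-time> is epoch milliseconds and that job ids follow the usual job_<jobtracker-id>_<seq> form, which is why the name is split with maxsplit=4.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class HistoryFileName(NamedTuple):
    start_time: int  # assumption: job start time in epoch milliseconds
    job_id: str      # e.g. "job_200903051234_0001"
    job_name: str


def history_file_path(history_root: str, hostname: str, username: str,
                      start_time: int, job_id: str, job_name: str) -> PurePosixPath:
    """Hypothetical helper: path-to-job-history/hostname/username/<start-time>_<jobid>_<jobname>."""
    if "/" in job_name:
        # A '/' would add an extra directory level, so this sketch rejects it.
        raise ValueError("job name must not contain '/'")
    file_name = f"{start_time}_{job_id}_{job_name}"
    return PurePosixPath(history_root) / hostname / username / file_name


def parse_history_file_name(name: str) -> HistoryFileName:
    """Hypothetical helper: split '<start-time>_<jobid>_<jobname>' into fields.

    The job id spans three '_'-separated tokens (job, jobtracker id, sequence),
    so maxsplit=4 leaves any '_' inside the job name untouched.
    """
    start, prefix, tracker, seq, job_name = name.split("_", 4)
    if prefix != "job":
        raise ValueError(f"not a job history file name: {name!r}")
    return HistoryFileName(int(start), f"{prefix}_{tracker}_{seq}", job_name)
```

For example, a job named word_count submitted by amar to the jobtracker on jt1.example.com would land in /mapred/history/jt1.example.com/amar/ (the root directory here is made up for the example).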

Structuring it further by year, month and day might prove useful, but for now it
looks like a premature step; we can add it later if needed. Without it, users who
submit jobs at a very high rate will be affected more than users who submit jobs
less frequently, since all of a user's files go into one directory. Searching
will be easier per user.
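
To show why a per-user directory narrows the search, here is a hedged sketch of looking up a job's history file on a local filesystem. The helper name is hypothetical, and a real implementation would go through Hadoop's FileSystem API rather than pathlib; it only lists hostname/username instead of the whole history folder, and it matches the job id as the token that follows <start-time>.

```python
from pathlib import Path
from typing import List


def find_job_history(history_root: str, hostname: str, username: str,
                     job_id: str) -> List[Path]:
    """Hypothetical helper: history files for job_id under one user's directory."""
    user_dir = Path(history_root) / hostname / username
    if not user_dir.is_dir():
        return []
    matches = []
    for entry in user_dir.iterdir():
        # Name is <start-time>_<jobid>_<jobname>: drop the start time, then
        # require the exact job id followed by '_' (so ..._0001 != ..._00010).
        rest = entry.name.partition("_")[2]
        if entry.is_file() and rest.startswith(job_id + "_"):
            matches.append(entry)
    return sorted(matches)
```

The cost of the lookup grows with the number of files one user has submitted to that jobtracker, not with the total number of jobs, which is also why very frequent submitters remain the worst case under this layout.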

Future steps:
1) Add date-level info to the structure, or at least to the display
2) Add indexing info for faster access/display
3) Provide various views, such as the most recent jobs, sorting by
day/week/month/year, or by job name (sorting and structuring), etc.
4) Secure access
5) Faster access and analysis (involves changes/tweaks to JobHistory and
parsing)
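
Some of the views in step 3 could be derived from the file names alone, since every name starts with <start-time>. The sketch below is an assumption-laden illustration (hypothetical helper names, start time assumed to be epoch milliseconds, days taken in UTC) of a "most recent" view and a per-day grouping; it does not reflect any existing JobHistory code.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, Iterable, List


def start_time_ms(file_name: str) -> int:
    # <start-time> is the token before the first '_' (assumed epoch millis).
    return int(file_name.split("_", 1)[0])


def most_recent(file_names: Iterable[str], limit: int = 10) -> List[str]:
    """Newest-first view; compares start times numerically, not as strings."""
    return sorted(file_names, key=start_time_ms, reverse=True)[:limit]


def group_by_day(file_names: Iterable[str]) -> Dict[date, List[str]]:
    """Bucket history files by the UTC calendar day of their start time."""
    groups: Dict[date, List[str]] = defaultdict(list)
    for name in file_names:
        day = datetime.fromtimestamp(start_time_ms(name) / 1000, tz=timezone.utc).date()
        groups[day].append(name)
    return dict(groups)
```

Week, month and year views would follow the same pattern with a different bucket key; an index (step 2) would avoid re-listing the directory for every view.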

Thoughts?

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: HADOOP-4670
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4670
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>
> Today all the job history files are dumped in one _job-history_ folder. This
> can cause problems when there is a need to search the history folder
> (job recovery, etc.). It would be nice to group all the jobs under a _user_
> folder, so that all the jobs for user _amar_ go in _history-folder/amar/_.
> Jobs could be categorized by various features like _jobid, date, jobname_,
> etc., but using _username_ will make searching much more efficient and will
> not result in a namespace explosion.

-- 
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.