[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463143#comment-13463143
 ] 

Sandy Ryza commented on MAPREDUCE-4680:
---------------------------------------

I looked at the code again and I think I misunderstood it the first time, so I 
want to make sure we're on the same page.  Currently, all of the yyyy/mm/dd 
directories are gathered and sorted in ascending order by time.  We then go 
through them, deleting files, until we reach a directory that is young enough, 
and halt there.  I had thought that job history files inside dd/ directories 
that were too young were being examined, but they are not.
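
To make the behavior described above concrete, here is a minimal sketch in 
plain Java.  It is not the actual cleaner code and uses no Hadoop APIs: the 
class name, method names, and the use of day granularity for the max age are 
all assumptions made for illustration.  It only shows the "sort ascending, 
stop at the first young enough directory" idea.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the real job history cleaner.
class DatedDirScan {
    // Parses a relative "yyyy/mm/dd" path into a date.
    static LocalDate parse(String dir) {
        String[] p = dir.split("/");
        return LocalDate.of(Integer.parseInt(p[0]),
                            Integer.parseInt(p[1]),
                            Integer.parseInt(p[2]));
    }

    // Returns the directories whose files would be examined, oldest first.
    // Assumption: maxAgeDays expresses the max history age in whole days.
    static List<String> dirsToClean(List<String> dirs, LocalDate today, int maxAgeDays) {
        LocalDate cutoff = today.minusDays(maxAgeDays);
        List<String> sorted = new ArrayList<>(dirs);
        // Zero-padded yyyy/mm/dd strings sort chronologically as plain text.
        Collections.sort(sorted);
        List<String> result = new ArrayList<>();
        for (String d : sorted) {
            // First directory dated after the cutoff is young enough: halt.
            if (parse(d).isAfter(cutoff)) {
                break;
            }
            result.add(d);
        }
        return result;
    }
}
```

A directory dated exactly on the cutoff day is still scanned in this sketch, 
since it may hold files written earlier that day that are already too old.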

The load on HDFS could be decreased further by pruning at the year and month 
levels as well.  For example, if the max age is 2 years and it's 2012, we would 
not look at anything deeper in the 2011 dir (and the same goes for months).  
But would this be worthwhile?  It would make a difference only if the max 
history age were greater than a month (the default is a week), in which case it 
could save a listStatus for each month of age.
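
As a rough sketch of the year/month pruning idea, again in plain Java with 
hypothetical names and no Hadoop APIs: a yyyy or yyyy/mm directory only needs 
to be listed if its earliest possible day is on or before the cutoff date.  
How the cutoff is computed is an assumption here.

```java
import java.time.LocalDate;

// Hypothetical illustration only; not the real job history cleaner.
class DirPruning {
    // A year dir can contain old enough files only if Jan 1 of that year
    // is not after the cutoff.
    static boolean shouldListYear(int year, LocalDate cutoff) {
        return !LocalDate.of(year, 1, 1).isAfter(cutoff);
    }

    // A month dir can contain old enough files only if the first day of
    // that month is not after the cutoff.
    static boolean shouldListMonth(int year, int month, LocalDate cutoff) {
        return !LocalDate.of(year, month, 1).isAfter(cutoff);
    }
}
```

With a max age of 2 years and a current date in 2012, the cutoff falls in 
2010, so shouldListYear(2011, cutoff) is false and the 2011 dir is never 
listed, while 2010 is listed and its later months are skipped.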

If not, I could still make it delete the old folders.
                
> Job history cleaner should only check timestamps of files in old enough 
> directories
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4680
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4680
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.0.0-alpha
>            Reporter: Sandy Ryza
>
> Job history files are stored in yyyy/mm/dd folders.  Currently, the job 
> history cleaner checks the modification date of each file in every one of 
> these folders to see whether it's past the maximum age.  The load on HDFS 
> could be reduced by only checking the ages of files in directories that are 
> old enough, as determined by their name.

