[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
-------------------------------------------

    Release Note: Added config option 
mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  It is the 
target percentage of the local distributed cache that should be kept in between 
garbage collection runs.  In practice it will delete unused distributed cache 
entries in LRU order until the size of the cache is less than 
mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size.  This is 
a floating point value between 0.0 and 1.0.  The default is 0.95.  (was: Added 
config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker.  
It is the minimum percentage of the local distributed cache that should be kept 
in between garbage collection runs.  This is a floating point value between 0.0 
and 1.0.  The default is 0.95.)
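The eviction rule described in the release note can be illustrated with a small sketch. It is a hypothetical model and not the actual TaskTracker code: the names `CacheEntry` and `LruCacheCleaner`, and the idea of tracking a last-use timestamp and reference count per entry, are assumptions made for illustration. It also assumes the caller has already decided that a cleanup run is due. Entries still in use are never deleted but still count toward the total size. Unused entries are deleted oldest-first until the total drops below keepPct times the maximum cache size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of one localized distributed cache entry. */
final class CacheEntry {
    final String name;
    final long sizeBytes;
    final long lastUsedMillis; // when a task last referenced this entry (assumed bookkeeping)
    final int refCount;        // running tasks currently using this entry (assumed bookkeeping)

    CacheEntry(String name, long sizeBytes, long lastUsedMillis, int refCount) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.lastUsedMillis = lastUsedMillis;
        this.refCount = refCount;
    }
}

/** Hypothetical LRU cleanup pass; a sketch, not the actual TaskTracker code. */
final class LruCacheCleaner {
    /**
     * Deletes unused entries, least recently used first, until the total
     * cache size drops below keepPct * maxSizeBytes. Entries still in use
     * are never deleted, so the target may not always be reachable.
     *
     * @return names of the deleted entries, in deletion order
     */
    static List<String> clean(List<CacheEntry> entries, long maxSizeBytes, double keepPct) {
        if (keepPct < 0.0 || keepPct > 1.0) {
            throw new IllegalArgumentException("keepPct must be in [0.0, 1.0]: " + keepPct);
        }
        double target = keepPct * maxSizeBytes;

        long total = 0;
        List<CacheEntry> unused = new ArrayList<>();
        for (CacheEntry e : entries) {
            total += e.sizeBytes;          // in-use entries still count toward the size
            if (e.refCount == 0) {
                unused.add(e);             // only unused entries are eviction candidates
            }
        }

        // Oldest last-use time first, i.e. LRU order.
        unused.sort(Comparator.comparingLong(e -> e.lastUsedMillis));

        List<String> deleted = new ArrayList<>();
        for (CacheEntry e : unused) {
            if (total < target) {
                break;                     // retention goal reached; keep the rest
            }
            total -= e.sizeBytes;          // a real implementation would remove files here
            deleted.add(e.name);
        }
        return deleted;
    }
}
```

For example, with the default of 0.95 and a hypothetical 10 GB maximum, a cleanup run would stop deleting once the cache falls below about 9.5 GB. Deletion does not continue until every unused entry is gone, which was the behavior before this change.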

> Make the distributed cache delete entries using LRU priority
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2494
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.20.205.0, 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2494-20.20X-V1.patch, 
> MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, 
> MAPREDUCE-2494-V2.patch
>
>
> Currently the distributed cache waits until a cache directory is above a
> preconfigured threshold, at which point it deletes all entries that are not
> currently being used.  We would likely get far fewer cache misses if we kept
> some of them around, even when they are not being used.  We should add a
> configurable percentage that sets a goal for how much of the cache to keep
> after cleanup, and select entries to delete based on how recently they were
> used, and possibly also on how large they are or how difficult it is to
> download them again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira