[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072988#comment-13072988 ]

Robert Joseph Evans commented on MAPREDUCE-2494:
------------------------------------------------

I would really prefer to have this in 205; I originally wanted to put it into
204.  I have a customer that has been pushing hard for this to help them
bring down the run time of some of their jobs so that they have comfortable
headroom on their SLAs.  We have also looked at distributed cache usage on some
of our grids, and there are a handful of blocks that are read so often that each
one consistently accounts for over 1 TB of network traffic per day.  Every day
4000 nodes read each of these blocks and then throw them away.  If we can avoid
those reads because the blocks are still in the cache, it is a huge win for us.

What more testing do you want before you consider this acceptable?

> Make the distributed cache delete entries using LRU priority
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2494
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.20.205.0, 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2494-20.20X-V1.patch, 
> MAPREDUCE-2494-20.20X-V3.patch, MAPREDUCE-2494-V1.patch, 
> MAPREDUCE-2494-V2.patch
>
>
> Currently the distributed cache waits until a cache directory is above a
> preconfigured threshold, at which point it deletes all entries that are
> not currently being used.  It seems like we would get far fewer cache misses
> if we kept some of them around, even when they are not being used.  We should
> add a configurable percentage as a goal for how much of the cache should
> remain clear when not in use, and select objects to delete based on how
> recently they were used, and possibly also on how large they are or how
> difficult it is to download them again.
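The policy proposed above can be sketched as a small model. This is a hypothetical Python illustration, not the code in the attached patches. `CacheEntry`, `purge_all_unused`, `evict_lru` and `clear_fraction` are made-up names, entry names are assumed to be unique, and recency is tracked with a simple logical `last_used` timestamp. The sketch compares the current behavior (delete every unused entry once the threshold is crossed) with LRU eviction that stops once usage falls to a configurable goal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheEntry:
    """Hypothetical record for one localized distributed-cache entry."""
    name: str        # assumed unique within the cache directory
    size: int        # bytes used on local disk
    last_used: int   # logical timestamp of the most recent use
    ref_count: int   # running tasks currently using this entry


def purge_all_unused(entries: List[CacheEntry], limit: int) -> List[CacheEntry]:
    """Current behavior: above the limit, delete every entry not in use."""
    total = sum(e.size for e in entries)
    if total <= limit:
        return list(entries)
    return [e for e in entries if e.ref_count > 0]


def evict_lru(entries: List[CacheEntry], limit: int,
              clear_fraction: float) -> List[CacheEntry]:
    """Proposed behavior: above the limit, delete unused entries in
    least-recently-used order until usage is at or below the goal
    limit * (1 - clear_fraction). Entries in use are never deleted,
    so the goal may not always be reached."""
    total = sum(e.size for e in entries)
    if total <= limit:
        return list(entries)
    goal = limit * (1 - clear_fraction)
    evicted = set()
    # Oldest unused entries come first.
    unused = sorted((e for e in entries if e.ref_count == 0),
                    key=lambda e: e.last_used)
    for e in unused:
        if total <= goal:
            break
        evicted.add(e.name)
        total -= e.size
    return [e for e in entries if e.name not in evicted]


entries = [
    CacheEntry("a.jar", size=40, last_used=1, ref_count=0),
    CacheEntry("b.jar", size=30, last_used=5, ref_count=0),
    CacheEntry("c.jar", size=20, last_used=3, ref_count=1),
    CacheEntry("d.jar", size=30, last_used=4, ref_count=0),
]

# 120 bytes cached against a 100-byte limit.
print([e.name for e in purge_all_unused(entries, 100)])
# ['c.jar']
print([e.name for e in evict_lru(entries, 100, clear_fraction=0.25)])
# ['b.jar', 'c.jar']  (a.jar and d.jar evicted; usage 50 <= goal 75)
```

Under `purge_all_unused`, every idle entry is lost as soon as the limit is crossed. Under `evict_lru`, `b.jar` is the most recently used idle entry, so it survives for the next job that needs it. This sketch does not model the weighting by size or by re-download cost that the description mentions as a possibility.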

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
