[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124148#comment-15124148 ]

Karthik Kambatla commented on MAPREDUCE-6622:
---------------------------------------------

I have used the Guava cache before (can't remember where), and have found it to 
be very useful. Aside from the compatibility concerns with Guava, I don't mind 
us using it on the server side at all. The client is a different story, but I 
don't think we need to worry about that here. [~jlowe], [~rchiang] - did you 
have any other specific concerns with using Guava that I am discounting? 

bq. I ended up having to call cleanUp() in order to get the unit tests to pass, 
but those admittedly run in a very short amount of time.
I don't see the need to do cleanups to ensure the unit tests pass. 

On how often to clean up: the cache considers eviction on every load, so if 
this is going to be a cache with frequent loads, we don't need another thread 
doing the cleanup. [~rchiang] - in your test, did you try continuously loading 
new jobs? If no new jobs are loaded for a while, the cache is unlikely to clean 
up; even if it does, it will be on read. But do we need to evict jobs if we are 
not loading any new ones? 
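
For reference, a minimal sketch of the Guava behavior in question (the cache 
contents and timings are illustrative only, not from the patch). Guava caches 
perform eviction maintenance during writes and occasionally during reads; 
there is no background cleanup thread by default, so an idle cache can keep 
logically expired entries around until cleanUp() forces the pending work - 
which is likely why the short-running unit tests above needed it:

{code:java}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaCleanupDemo {
  public static void main(String[] args) throws InterruptedException {
    // Illustrative cache: entries expire 10ms after write.
    Cache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(10, TimeUnit.MILLISECONDS)
        .build();

    cache.put("job_1", "job details");
    Thread.sleep(50);

    // The entry is now logically expired, but Guava only does eviction
    // maintenance during cache activity (writes, occasionally reads), so on
    // an idle cache size() may still count the stale entry.
    System.out.println("before cleanUp: " + cache.size()); // likely 1

    // cleanUp() forces the pending maintenance immediately - the reason a
    // short-lived unit test may need to call it before asserting on size.
    cache.cleanUp();
    System.out.println("after cleanUp: " + cache.size());  // 0
  }
}
{code}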

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the 
> jobs in the cache can be of varying size.  This is generally not a problem 
> when the job sizes are uniform or small, but when job sizes can be very 
> large (say, greater than 250k tasks), the JHS heap size can grow 
> tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend 
> all its time in GC.  However, since the cache is holding on to all the jobs, 
> not much heap space can be freed up.
> Since the total number of tasks loaded is directly proportional to the 
> amount of heap used, setting a property that caps the number of tasks 
> allowed in the cache should help prevent the JHS from locking up.
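
A minimal sketch of how such a task-weighted cap could look with a Guava 
LoadingCache (the JobId and CompletedJob types and the loadJob() helper are 
hypothetical stand-ins for the JHS types, not the actual patch):

{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.Weigher;

public class TaskWeightedJobCache {
  // Hypothetical stand-ins for the JHS job types.
  static class JobId {}
  static class CompletedJob {
    private final int totalTasks; // maps + reduces for the job
    CompletedJob(int totalTasks) { this.totalTasks = totalTasks; }
    int getTotalTasks() { return totalTasks; }
  }

  static LoadingCache<JobId, CompletedJob> build(long maxTotalTasks) {
    return CacheBuilder.newBuilder()
        // Cap the cache by total task count rather than job count, since
        // heap usage scales with tasks loaded, not jobs loaded.
        .maximumWeight(maxTotalTasks)
        .weigher(new Weigher<JobId, CompletedJob>() {
          @Override
          public int weigh(JobId id, CompletedJob job) {
            // Each job "weighs" as many units as it has tasks.
            return job.getTotalTasks();
          }
        })
        .build(new CacheLoader<JobId, CompletedJob>() {
          @Override
          public CompletedJob load(JobId id) {
            return loadJob(id); // hypothetical: parse the history file
          }
        });
  }

  static CompletedJob loadJob(JobId id) {
    return new CompletedJob(0); // placeholder for history-file parsing
  }
}
{code}

With a weigher like this, a single 250k-task job consumes 250k units of the 
budget, so the cache sheds older jobs based on how much heap they actually 
represent rather than on a flat job count.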



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
