[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158438#comment-15158438 ]

Karthik Kambatla commented on MAPREDUCE-6622:
---------------------------------------------

Comments on the latest patch:
# Unused variable {{cleanupThread}} left over from previous iterations.
# The description for the loadedjobs.cache.size config says it is ignored if 
loadedtasks.cache.size is set. But with a default value of -1, 
loadedtasks.cache.size is effectively always set. I prefer the previous 
approach of not giving it a default at all. 
# Also, when reading the value for loadedtasks.cache.size, don't we want to 
enforce a minimum value for it? Maybe 1? 
# I am not sure I understand how entries are loaded into the cache. The 
CacheBuilder calls getFullJob, but getFullJob itself checks whether the entry 
is present in the cache. If I understand it correctly, the public 
getFullJob() should just call {{loadedJobCache.get(jobID)}}, the CacheLoader 
should call a private method {{loadJob(jobID)}}, and the current contents of 
{{getFullJob}} (getting the fileInfo, etc.) should move to {{loadJob}} (see 
the sketch below). 
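
A minimal sketch of what items 3 and 4 suggest, assuming Guava's 
CacheBuilder/LoadingCache (the {{Job}} type, the configuration plumbing, and 
the {{loadJob}} body are hypothetical stand-ins, not the patch's actual 
code):

{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class JobCacheSketch {
  // Hypothetical stand-in for the real Job type.
  static class Job {}

  private final LoadingCache<String, Job> loadedJobCache;

  JobCacheSketch(int configuredCacheSize) {
    // Item 3: clamp the configured value (e.g. loadedtasks.cache.size) so
    // the cache can never be sized below one.
    long cacheSize = Math.max(1, configuredCacheSize);
    loadedJobCache = CacheBuilder.newBuilder()
        .maximumSize(cacheSize)
        .build(new CacheLoader<String, Job>() {
          @Override
          public Job load(String jobId) {
            // Item 4: the loader is the only path that reads job history.
            return loadJob(jobId);
          }
        });
  }

  // Public lookup delegates entirely to the cache; no containment check.
  public Job getFullJob(String jobId) {
    return loadedJobCache.getUnchecked(jobId);
  }

  // The old getFullJob body (fileInfo lookup, parsing, etc.) moves here.
  private Job loadJob(String jobId) {
    return new Job(); // placeholder for real job-history loading
  }
}
{code}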

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch
>
>
> The property mapreduce.jobhistory.loadedjobs.cache.size caps the cache by 
> job count, but jobs can be of varying size.  This is generally not a 
> problem when the job sizes are uniform or small, but when the job sizes 
> can be very large (say greater than 250k tasks), the JHS heap size can 
> grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and spend 
> all its time in GC.  However, since the cache is holding on to all the 
> jobs, not much heap space can be freed up.
> Since the total number of tasks loaded is directly proportional to the 
> amount of heap used, a property that caps the number of tasks allowed in 
> the cache should help prevent the JHS from locking up (see the sketch 
> after this description).
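
For illustration, a minimal sketch of the task-based cap described above, 
assuming a Guava cache whose weight function is the job's task count (the 
{{Job}} type and {{buildCache}} method are hypothetical stand-ins, not the 
patch's actual code):

{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.Weigher;

public class TaskWeightedCacheSketch {
  // Hypothetical stand-in for the real Job type; only the task count matters.
  static class Job {
    final int totalTasks;
    Job(int totalTasks) { this.totalTasks = totalTasks; }
  }

  // Cap the cache by total tasks rather than job count: each job "weighs"
  // as many units as it has tasks, so a single 250k-task job consumes 250k
  // units of the budget and is evicted accordingly.
  static LoadingCache<String, Job> buildCache(long maxTotalTasks) {
    return CacheBuilder.newBuilder()
        .maximumWeight(maxTotalTasks)
        .weigher(new Weigher<String, Job>() {
          @Override
          public int weigh(String jobId, Job job) {
            return Math.max(1, job.totalTasks);
          }
        })
        .build(new CacheLoader<String, Job>() {
          @Override
          public Job load(String jobId) {
            return new Job(0); // placeholder for real history loading
          }
        });
  }
}
{code}

With a weigher in place, Guava evicts entries once the summed task counts 
exceed {{maximumWeight}}, so a handful of 250k-task jobs can no longer pin 
the entire heap.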



