[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158975#comment-15158975 ]

Jason Lowe commented on MAPREDUCE-6622:
---------------------------------------

Sorry for the delay in responding, as I was out on vacation.

bq.  Jason Lowe, Ray Chiang - did you guys have any other specific concerns 
with using guava that I am discounting?

My main concern was that we were adding a separate thread and extra overhead 
for something that didn't seem worth the trouble.  Now that the cache code has 
been significantly cleaned up, I'm more comfortable going the Guava route.
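
For concreteness, here is a minimal sketch of what the Guava route could look 
like with the cache weighted by total task count instead of job count. The 
weigher logic, the loader, and the helper types below are assumptions for 
illustration, not the actual patch (the real code would use the Hadoop MR v2 
JobId/Job types):

{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.Weigher;

public class TaskWeightedJobCacheSketch {
  // Hypothetical stand-ins for the Hadoop MR v2 types, to keep the
  // sketch self-contained.
  interface JobId {}
  interface Job {
    int getTotalMaps();
    int getTotalReduces();
  }

  static LoadingCache<JobId, Job> buildCache(final long maxTotalTasks) {
    return CacheBuilder.newBuilder()
        // Cap the cache by total tasks across all cached jobs, so a
        // single 250k-task job uses most of the budget and evicts others.
        .maximumWeight(maxTotalTasks)
        .weigher(new Weigher<JobId, Job>() {
          @Override
          public int weigh(JobId key, Job job) {
            // A job's weight is its task count; never return 0 so even
            // an empty job counts against the limit.
            return Math.max(1, job.getTotalMaps() + job.getTotalReduces());
          }
        })
        .build(new CacheLoader<JobId, Job>() {
          @Override
          public Job load(JobId key) throws Exception {
            throw new UnsupportedOperationException(
                "sketch: would parse the job history file here");
          }
        });
  }
}
{code}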

bq. Jason Lowe - do you remember if the number of jobs was being used as a 
proxy for the memory usage?

Yes, it was essentially a proxy for capping memory usage.  It's also a 
performance tunable, since caching more jobs can improve performance depending 
on client access patterns.  However, I think it is OK to override the old 
value with the new one: the behavior is clearly documented, and we're not 
providing a default that would implicitly override users' old settings when 
they upgrade.  Users would have to go out of their way to set this value, and 
in doing so they should encounter the documentation explaining its semantics.
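
Put differently, the precedence could look something like this sketch (the 
task-based property name below is hypothetical; only 
mapreduce.jobhistory.loadedjobs.cache.size is the existing key, and the 
default shown is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CacheLimitPrecedenceSketch {
  static void chooseLimit(Configuration conf) {
    // Hypothetical new key: unset by default, so upgrades are unaffected.
    int loadedTasksMax =
        conf.getInt("mapreduce.jobhistory.loadedtasks.cache.size", -1);
    if (loadedTasksMax > 0) {
      // Task-based cap explicitly set by the user: it wins, and the
      // weighted cache (see the earlier sketch) is built with it.
    } else {
      // Otherwise the old job-count limit keeps working as before.
      int loadedJobsMax =
          conf.getInt("mapreduce.jobhistory.loadedjobs.cache.size", 5);
      // job-count cap: CacheBuilder.maximumSize(loadedJobsMax)
    }
  }
}
{code}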


> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch, 
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch
>
>
> When the property mapreduce.jobhistory.loadedjobs.cache.size is set, the 
> cached jobs can vary widely in size.  This is generally not a problem when 
> the job sizes are uniform or small, but when jobs are very large (say, 
> greater than 250k tasks), the JHS heap usage can grow tremendously.
> When multiple very large jobs are cached, the JHS can lock up and spend all 
> its time in GC.  Since the cache is holding on to all the jobs, not much 
> heap space can be freed.
> Because the total number of tasks loaded is directly proportional to the 
> amount of heap used, a property that caps the number of tasks allowed in 
> the cache should help prevent the JHS from locking up.
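
For a rough sense of scale (the per-task heap figure here is an assumed, 
illustrative number, not a measurement): if each loaded task retains on the 
order of 1 KB of heap, a cache holding five 250k-task jobs pins roughly 
5 x 250,000 x 1 KB, or about 1.2 GB, that the GC can never reclaim.  A 
task-based cap bounds that product directly, regardless of how the tasks are 
distributed across jobs.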



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
