[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868749#comment-15868749
 ] 

Weiwei Yang edited comment on MAPREDUCE-6847 at 2/16/17 3:32 AM:
-----------------------------------------------------------------

Hello [~jlowe]

Thanks for your comments, I appreciate them. What I want to resolve here is 
letting JHS remove outdated jobs from its cache. At present, the JHS cache 
works as follows (assuming, for example, that it is allowed to cache 5 jobs or 
an equivalent number of tasks):

# User clicks job1, job2 ... job5; JHS caches 5 jobs in memory
# JHS keeps all 5 jobs in the cache
# A long time passes
# Job1, 2 .. 5 are now quite outdated; the user clicks job6 and the JHS cache 
evicts one job, but the cache still contains 5 jobs: 1 new and the other 4 old

This is not a problem if the jobs are small, but if jobs are large, e.g. 100k 
tasks each at roughly 1.2G per job, 5 jobs in the cache will consume more than 
1.2 * 5 = 6G of memory. Is this really necessary? The patch simply expires 
jobs in the cache after a fixed duration, so that the cache holds recently 
accessed jobs rather than old ones that users are unlikely to access again. 
Does that make sense to you?
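
To illustrate the difference between size-only eviction and size plus idle-time 
expiry, here is a minimal sketch in Java. It is not the attached patch and not 
the actual JHS code: the class name ExpiringJobCache, its constructor 
parameters, and the explicit "now" timestamp argument (used instead of the 
system clock so the behaviour is easy to follow) are all hypothetical, and the 
sketch bounds the cache by job count only, ignoring the task-based limit.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

/**
 * Hypothetical illustration only; not the JHS implementation or the patch.
 * An LRU cache bounded by entry count that can also drop idle entries.
 */
class ExpiringJobCache<K, V> {
    private static final class Entry<V> {
        final V value;
        long lastAccessMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccessMillis = now;
        }
    }

    private final int maxEntries;
    private final long maxIdleMillis; // <= 0 disables time-based expiry
    // accessOrder=true keeps the least recently used entry first.
    private final LinkedHashMap<K, Entry<V>> map =
        new LinkedHashMap<>(16, 0.75f, true);

    ExpiringJobCache(int maxEntries, long maxIdleMillis) {
        this.maxEntries = maxEntries;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Caches a job; drops idle entries first, then evicts LRU if over size. */
    synchronized void put(K key, V value, long now) {
        expireIdle(now);
        map.put(key, new Entry<>(value, now));
        Iterator<K> eldestFirst = map.keySet().iterator();
        while (map.size() > maxEntries && eldestFirst.hasNext()) {
            eldestFirst.next();
            eldestFirst.remove(); // size-based eviction of the LRU entry
        }
    }

    /** Returns the cached job or null, refreshing its last-access time. */
    synchronized V get(K key, long now) {
        expireIdle(now);
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        e.lastAccessMillis = now;
        return e.value;
    }

    /** Number of jobs still cached at the given time. */
    synchronized int size(long now) {
        expireIdle(now);
        return map.size();
    }

    // Removes every entry that has not been accessed within maxIdleMillis.
    private void expireIdle(long now) {
        if (maxIdleMillis <= 0) {
            return;
        }
        map.values().removeIf(e -> now - e.lastAccessMillis > maxIdleMillis);
    }
}
```

With maxIdleMillis set to 0 the sketch behaves like the steps listed above: 
after job1 .. job5 are cached and job6 is requested much later, one old job is 
evicted and 5 jobs stay in memory. With a positive maxIdleMillis, the 5 stale 
jobs are released as soon as the cache is touched again, and only job6 remains.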


> Job history server should release jobs from cache after a fixed duration
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6847
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6847
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: MAPREDUCE-6847.01.patch
>
>
> We found that the history server consumes a lot of memory when there are 
> large jobs (more than 100k tasks in a single job). Currently the JHS cache 
> only evicts entries based on size; it would be better to add time-based 
> expiration as well, to reduce heap usage when no one has accessed a job for 
> some time.


