[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055461#comment-15055461 ]
zhihai xu commented on MAPREDUCE-6436:
--------------------------------------
Thanks for updating the patch, [~lewuathe]! The new patch looks good except
for the checkstyle issues.
{code}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:265: Line is longer than 80 characters (found 97).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:267: Line is longer than 80 characters (found 102).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:268: Line is longer than 80 characters (found 118).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:271: Line is longer than 80 characters (found 94).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:272: Line is longer than 80 characters (found 114).
{code}
Could you fix the above checkstyle issues?
> JobHistory cache issue
> ----------------------
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Ryu Kobayashi
> Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch,
> MAPREDUCE-6436.3.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem:
> HistoryFileManager.addIfAbsent produces a large amount of logs if the number
> of cached entries whose age is less than mapreduce.jobhistory.max-age-ms
> becomes far larger than mapreduce.jobhistory.joblist.cache.size.
> Example:
> If the cache contains 50,000 entries in total and 10,000 entries newer than
> mapreduce.jobhistory.max-age-ms, where
> mapreduce.jobhistory.joblist.cache.size is 20,000, the
> HistoryFileManager.addIfAbsent method produces 50,000 - 20,000 = 30,000 lines
> of the message "Waiting to remove <key> from JobListCache because it is not
> in done yet".
> I will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> for a long time and slows job execution down significantly, because getJob is
> called by RPCs such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This happens because HistoryFileManager.UserLogDir.scanIfNeeded eventually
> calls HistoryFileManager.addIfAbsent in a synchronized block. When multiple
> threads call scanIfNeeded simultaneously, one of them acquires the lock and
> the other threads are blocked until the first thread completes its
> long-running HistoryFileManager.addIfAbsent call.
> Solution:
> * Reduce the amount of logging so that HistoryFileManager.addIfAbsent doesn't
> take too long.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
> scanning if another thread is already scanning. This changes the semantics of
> some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
> because scanIfNeeded would keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls
> are not blocked by a loop at the scale of tens of thousands.
>
> This patch implemented the first item.
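
For illustration only, here is a minimal sketch of what the first item could look like. This is not the code in the attached patches. BoundedJobCache, Entry, the done flag, and the in-memory logLines list are all hypothetical stand-ins, and the real HistoryFileManager eviction logic is more involved. The idea is to count the entries that cannot be evicted yet and emit one summary line per addIfAbsent call, instead of one "Waiting to remove" line per entry.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the job list cache; not the real HistoryFileManager. */
class BoundedJobCache {
  /** Hypothetical entry: only records whether the history file is in "done". */
  static final class Entry {
    final boolean done;
    Entry(boolean done) { this.done = done; }
  }

  private final TreeMap<Long, Entry> cache = new TreeMap<>();
  private final int maxSize;
  // Stands in for a real logger so the number of emitted lines can be inspected.
  private final List<String> logLines = new ArrayList<>();

  BoundedJobCache(int maxSize) { this.maxSize = maxSize; }

  /**
   * Adds the entry if the id is new, then walks the cache from the oldest id
   * and evicts removable entries until the cache is back within maxSize.
   * Returns the existing entry if the id was already cached, otherwise null.
   */
  synchronized Entry addIfAbsent(long jobId, Entry entry) {
    Entry old = cache.putIfAbsent(jobId, entry);
    if (old != null) {
      return old;
    }
    int excess = cache.size() - maxSize;
    int notDone = 0;
    Iterator<Map.Entry<Long, Entry>> it = cache.entrySet().iterator();
    while (excess > 0 && it.hasNext()) {
      if (it.next().getValue().done) {
        it.remove();   // safe to evict
        excess--;
      } else {
        notDone++;     // count instead of logging once per entry
      }
    }
    if (notDone > 0) {
      // One summary line per call, however many entries were skipped.
      logLines.add("Waiting to remove " + notDone
          + " entries from JobListCache because they are not in done yet");
    }
    return null;
  }

  synchronized int size() { return cache.size(); }

  synchronized List<String> logLines() { return new ArrayList<>(logLines); }
}
```

With a limit of 2 and five entries that are all still pending, the three calls that exceed the limit log one line each, rather than one line per skipped entry. The time spent logging inside the synchronized block then no longer grows with the number of entries that cannot be removed.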