Prabhu Joseph created MAPREDUCE-6797:
----------------------------------------

             Summary: Improvement in the fix of Mapreduce-6684
                 Key: MAPREDUCE-6797
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobhistoryserver
    Affects Versions: 2.4.0, 2.8.0
            Reporter: Prabhu Joseph
            Priority: Critical


Description:

HistoryFileManager contains one more piece of code where the synchronized 
block on HistoryFileInfo needs to be removed. We hit the JobHistoryServer 
contention issue in our environment: the attached stack trace shows 
HistoryFileManager$JobListCache.addIfAbsent waiting unnecessarily to lock 
HistoryFileInfo.

The synchronized keyword on isMovePending and didMoveFail was already removed 
by MAPREDUCE-6684.

{code}
HistoryFileInfo firstValue = cache.get(key);
synchronized (firstValue) {  // <-- synchronized is not needed here
  if (firstValue.isMovePending()) {
    if (firstValue.didMoveFail() &&
        firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
      cache.remove(key);
      // Now let's try to delete it
      try {
        firstValue.delete();
      } catch (IOException e) {
        LOG.error("Error while trying to delete history files" +
            " that could not be moved to done.", e);
      }
    } else {
      LOG.warn("Waiting to remove " + key
          + " from JobListCache because it is not in done yet.");
    }
  } else {
    cache.remove(key);
  }
}
{code}
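
The effect described above can be reproduced outside Hadoop. The following is 
a minimal sketch, not the HistoryFileManager code: FileInfo, slowMove and 
readerCompletesWhileLocked are hypothetical names, and it assumes (as 
MAPREDUCE-6684 is described here) that the state read by isMovePending is 
safe to read without holding the object's monitor, modelled below with a 
volatile field. One thread holds the monitor during a slow move while a 
second thread checks the state, either inside synchronized(info) or without it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class VolatileStateSketch {

  // Hypothetical stand-in for HistoryFileInfo; not the Hadoop class.
  static class FileInfo {
    enum State { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

    // Assumption: volatile state, so unsynchronized reads see current values.
    private volatile State state = State.IN_INTERMEDIATE;

    boolean isMovePending() {
      return state == State.IN_INTERMEDIATE || state == State.MOVE_FAILED;
    }

    // Simulates a slow move that holds this object's monitor until released.
    synchronized void slowMove(CountDownLatch started, CountDownLatch release)
        throws InterruptedException {
      started.countDown();
      release.await();
      state = State.IN_DONE;
    }
  }

  // Returns true if the reader finished its check within 500 ms while the
  // mover thread was still holding the FileInfo monitor.
  static boolean readerCompletesWhileLocked(boolean lockOnInfo) {
    FileInfo info = new FileInfo();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch readerDone = new CountDownLatch(1);

    Thread mover = new Thread(() -> {
      try {
        info.slowMove(started, release);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    Thread reader = new Thread(() -> {
      if (lockOnInfo) {
        synchronized (info) {   // blocks until the mover leaves slowMove
          info.isMovePending();
        }
      } else {
        info.isMovePending();   // plain volatile read, no monitor needed
      }
      readerDone.countDown();
    });

    try {
      mover.start();
      started.await();          // the mover now holds the monitor
      reader.start();
      boolean finished = readerDone.await(500, TimeUnit.MILLISECONDS);
      release.countDown();      // let the mover finish and free the monitor
      mover.join();
      reader.join();
      return finished;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("with synchronized:    " + readerCompletesWhileLocked(true));
    System.out.println("without synchronized: " + readerCompletesWhileLocked(false));
  }
}
```

With the lock, the reader stays BLOCKED until the mover finishes, so the first 
call is expected to return false and the second true. Note that dropping the 
lock also makes the check-then-remove sequence non-atomic; whether that is 
acceptable in addIfAbsent is a question for the patch review.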


Note: stacktrace is from hadoop-2.4.0 version and the problem exists in latest 
hadoop as well

{code}
"2144820863@qtp-313351300-38156" daemon prio=10 tid=0x0000000001e13800 nid=0xf133 waiting for monitor entry [0x00007f7c1d8dd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
        - waiting to lock <0x000000040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
        - locked <0x0000000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
{code}



