Xiaoqiao He created HDFS-15273:
----------------------------------

             Summary: CacheReplicationMonitor hold lock for long time and lead 
to NN out of service
                 Key: HDFS-15273
                 URL: https://issues.apache.org/jira/browse/HDFS-15273
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: caching, namenode
            Reporter: Xiaoqiao He
            Assignee: Xiaoqiao He


CacheReplicationMonitor periodically scans the cache directives and the cached 
block map. As more cache directives are added, each rescan of all cache 
directives and cached blocks takes longer. The scan holds the global write 
lock for its entire duration, so the NameNode cannot process any other 
requests while it runs.
I think we should warn end users who enable the CacheManager feature about 
this risk until the implementation is improved.
{code:java}
  private void rescan() throws InterruptedException {
    scannedDirectives = 0;
    scannedBlocks = 0;
    try {
      // Global namesystem write lock: held until the outer finally block.
      namesystem.writeLock();
      try {
        // The monitor's own lock only guards the shutdown check and scan count.
        lock.lock();
        if (shutdown) {
          throw new InterruptedException("CacheReplicationMonitor was " +
              "shut down.");
        }
        curScanCount = completedScanCount + 1;
      } finally {
        lock.unlock();
      }

      // Both full scans below run while the global write lock is still held.
      resetStatistics();
      rescanCacheDirectives();
      rescanCachedBlockMap();
      blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
    } finally {
      namesystem.writeUnlock();
    }
  }
{code}
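
One possible direction for a later improvement, shown only as a rough sketch 
and not as a proposed patch: scan in bounded batches and release the write 
lock between batches, so that other requests can get in. Everything in it is 
hypothetical (the class BatchedRescanSketch, the batchSize parameter, and the 
ReentrantReadWriteLock standing in for the namesystem lock); none of it is 
existing Hadoop code. A real change would also have to cope with directives 
being added or removed while the lock is released, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not part of Hadoop.
class BatchedRescanSketch {
  // Stand-in for the global namesystem lock (assumption).
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock(true);
  private final int batchSize;
  private int lockAcquisitions = 0;
  private long longestHoldNanos = 0;

  BatchedRescanSketch(int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  /** Scans every directive, taking the write lock once per batch. */
  int rescan(List<String> directives) {
    int scanned = 0;
    int i = 0;
    while (i < directives.size()) {
      nsLock.writeLock().lock();
      long start = System.nanoTime();
      lockAcquisitions++;
      try {
        int end = Math.min(i + batchSize, directives.size());
        for (; i < end; i++) {
          scanOne(directives.get(i));
          scanned++;
        }
      } finally {
        // Record how long this batch kept the lock, then release it.
        longestHoldNanos = Math.max(longestHoldNanos, System.nanoTime() - start);
        nsLock.writeLock().unlock();
      }
      // The lock is free here, so waiting readers and writers can proceed.
    }
    return scanned;
  }

  // Placeholder for the per-directive work done during a real rescan.
  private void scanOne(String directive) {
  }

  int getLockAcquisitions() {
    return lockAcquisitions;
  }

  long getLongestHoldNanos() {
    return longestHoldNanos;
  }

  public static void main(String[] args) {
    BatchedRescanSketch sketch = new BatchedRescanSketch(3);
    int scanned = sketch.rescan(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
    System.out.println("scanned=" + scanned
        + " lockAcquisitions=" + sketch.getLockAcquisitions());
  }
}
```

The longest per-batch hold time recorded here could also feed the kind of 
warning suggested above, for example by logging when it exceeds a configured 
threshold.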


