Xiaoqiao He created HDFS-15273:
----------------------------------
Summary: CacheReplicationMonitor hold lock for long time and lead
to NN out of service
Key: HDFS-15273
URL: https://issues.apache.org/jira/browse/HDFS-15273
Project: Hadoop HDFS
Issue Type: Improvement
Components: caching, namenode
Reporter: Xiaoqiao He
Assignee: Xiaoqiao He
CacheReplicationMonitor periodically scans the cache directives and the cached
block map. As more and more cache directives are added, CacheReplicationMonitor
takes longer and longer to rescan all cache directives and cached blocks.
The scan holds the global write lock, so for the duration of the scan the
NameNode cannot process any other requests.
So I think we should warn end users who turn on the CacheManager feature about
this risk before we improve the implementation.
{code:java}
private void rescan() throws InterruptedException {
  scannedDirectives = 0;
  scannedBlocks = 0;
  try {
    namesystem.writeLock();
    try {
      lock.lock();
      if (shutdown) {
        throw new InterruptedException("CacheReplicationMonitor was " +
            "shut down.");
      }
      curScanCount = completedScanCount + 1;
    } finally {
      lock.unlock();
    }
    resetStatistics();
    rescanCacheDirectives();
    rescanCachedBlockMap();
    blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
  } finally {
    namesystem.writeUnlock();
  }
}
{code}
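To make the impact concrete, here is a minimal standalone sketch (not Hadoop code) of what a caller sees while a long scan holds a global write lock. Everything in it is an assumption for illustration: the class name RescanLockDemo, the method measureReaderWaitMillis, and the use of a plain ReentrantReadWriteLock as a stand-in for the namesystem lock. The scan itself is simulated with a sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RescanLockDemo {
    // Hypothetical stand-in for the NameNode's global namesystem lock.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock(true);

    /**
     * Starts a simulated rescan that holds the write lock for scanMillis,
     * then measures how long another thread waits to acquire the read lock.
     */
    static long measureReaderWaitMillis(long scanMillis) {
        CountDownLatch scanStarted = new CountDownLatch(1);
        Thread scanner = new Thread(() -> {
            LOCK.writeLock().lock();
            try {
                scanStarted.countDown();   // write lock is now held
                Thread.sleep(scanMillis);  // simulated rescan work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                LOCK.writeLock().unlock();
            }
        });
        scanner.start();
        try {
            scanStarted.await();           // wait until the scan holds the lock
            long start = System.nanoTime();
            LOCK.readLock().lock();        // blocks until the scan releases the write lock
            long waitedNanos = System.nanoTime() - start;
            LOCK.readLock().unlock();
            scanner.join();
            return TimeUnit.NANOSECONDS.toMillis(waitedNanos);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader waited about "
            + measureReaderWaitMillis(200) + " ms for a 200 ms scan");
    }
}
```

The reader's wait grows with the scan duration, which is the behaviour described above: the longer rescanCacheDirectives() and rescanCachedBlockMap() run, the longer every other request waits.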