[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780003#comment-17780003 ]
Srinivasu Majeti commented on HDFS-15273:
-----------------------------------------
CCing [~weichiu] to review this and approve.
> CacheReplicationMonitor hold lock for long time and lead to NN out of service
> -----------------------------------------------------------------------------
>
> Key: HDFS-15273
> URL: https://issues.apache.org/jira/browse/HDFS-15273
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: caching, namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Attachments: HDFS-15273.001.patch, HDFS-15273.002.patch,
> HDFS-15273.003.patch
>
>
> CacheReplicationMonitor periodically scans the cache directives and the
> cached block map. As more and more cache directives are added,
> CacheReplicationMonitor takes a very long time to rescan all cache
> directives and cached blocks. Meanwhile, the scan holds the global write
> lock, so the NameNode cannot process any other request during the scan.
> So I think we should warn end users who turn on the CacheManager feature
> about this risk until the implementation is improved.
> {code:java}
> private void rescan() throws InterruptedException {
>   scannedDirectives = 0;
>   scannedBlocks = 0;
>   try {
>     // Global namesystem write lock: held until the outer finally block,
>     // i.e. for the entire rescan.
>     namesystem.writeLock();
>     try {
>       lock.lock();
>       if (shutdown) {
>         throw new InterruptedException("CacheReplicationMonitor was " +
>             "shut down.");
>       }
>       curScanCount = completedScanCount + 1;
>     } finally {
>       lock.unlock();
>     }
>     resetStatistics();
>     // Full scans of all directives and cached blocks, still under the write lock.
>     rescanCacheDirectives();
>     rescanCachedBlockMap();
>     blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
>   } finally {
>     namesystem.writeUnlock();
>   }
> }
> {code}
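As an illustration of the lock-hold problem described above, here is a minimal sketch of one way to bound how long a scan holds a global write lock. It assumes a fair `java.util.concurrent.locks.ReentrantReadWriteLock` standing in for the namesystem lock. The class `ChunkedRescanSketch` and its parameter `maxItemsPerHold` are hypothetical names for this example; this is not Hadoop's code and not the approach of the attached patches. The scan visits items in chunks and releases the write lock between chunks, so requests already queued on the lock can run in between.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Hadoop's CacheReplicationMonitor API): visits all
 * items under a global write lock, but holds the lock for at most
 * maxItemsPerHold items at a time.
 */
final class ChunkedRescanSketch {
  // Fair mode: after the scanner unlocks, threads already queued on the lock
  // (e.g. waiting readers) acquire it before the scanner's next lock() succeeds.
  private final ReentrantReadWriteLock globalLock = new ReentrantReadWriteLock(true);
  private int lockHolds = 0;

  int getLockHolds() {
    return lockHolds;
  }

  ReentrantReadWriteLock getGlobalLock() {
    return globalLock;
  }

  /** Visits every item in order and returns the number of items scanned. */
  <T> int rescan(List<T> items, int maxItemsPerHold, Consumer<T> visitor) {
    if (maxItemsPerHold <= 0) {
      throw new IllegalArgumentException("maxItemsPerHold must be positive");
    }
    int scanned = 0;
    while (scanned < items.size()) {
      globalLock.writeLock().lock();
      lockHolds++;
      try {
        int end = Math.min(scanned + maxItemsPerHold, items.size());
        for (int i = scanned; i < end; i++) {
          visitor.accept(items.get(i));
        }
        scanned = end;
      } finally {
        // Release between chunks so other requests are not blocked for the
        // whole scan.
        globalLock.writeLock().unlock();
      }
    }
    return scanned;
  }
}
```

With 10 items and `maxItemsPerHold = 4`, the lock is acquired three times (4 + 4 + 2 items). The trade-off is that the protected state can change while the lock is released, so a real rescan would have to tolerate or revalidate anything modified between chunks.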
--
This message was sent by Atlassian Jira
(v8.20.10#820010)