[ https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080389#comment-14080389 ]
Yi Liu commented on HDFS-6784:
------------------------------

Thanks [~cmccabe]
{quote}
If a needsRescan arrives while a scan is currently in progress
{quote}
But I have checked the current implementation: all places that call {{setNeedsRescan}} are protected by the FSN lock, so while a scan is in progress no {{needsRescan}} can arrive.
{quote}
I think the solution is to minimize the number of times we call setNeedsRescan from these functions.
{quote}
Agreed, this is one approach. I thought it would be better if we could resolve this at the root.

> Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls setNeedsRescan multiple times.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6784
>                 URL: https://issues.apache.org/jira/browse/HDFS-6784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-6784.001.patch
>
>
> In HDFS CacheReplicationMonitor, a rescan is expensive. Sometimes
> {{setNeedsRescan}} is called multiple times for a single operation; for
> example, FSNamesystem#modifyCacheDirective calls it 3 times. In the monitor
> thread of CacheReplicationMonitor, if {{needsRescan}} is true a rescan
> happens, but {{needsRescan}} is set to false before the real scan starts.
> Meanwhile, the 2nd or 3rd call to {{setNeedsRescan}} may set {{needsRescan}}
> back to true. So after the scan finishes, a new rescan is triggered in the
> next loop iteration, which is unnecessary and makes the operation pay for
> two rescans.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
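
The interleaving in the issue description can be replayed deterministically in a single thread. The sketch below is only an illustration under stated assumptions, not the real CacheReplicationMonitor code: the class {{RescanSketch}} and the helpers {{checkAndClear}}, {{rescan}}, {{scansWithRepeatedCalls}} and {{scansWithSingleCall}} are hypothetical names, and only {{needsRescan}} and {{setNeedsRescan}} come from the issue. It compares an operation that sets the flag three times against one that sets it once at the end.

```java
// Hypothetical, single-threaded replay of the interleaving described in the issue.
// Not the real CacheReplicationMonitor: every name except needsRescan and
// setNeedsRescan is invented for illustration.
public class RescanSketch {
    private boolean needsRescan = false;
    private int scans = 0;

    void setNeedsRescan() {
        needsRescan = true;
    }

    // Monitor step 1: observe the flag and clear it before the real scan starts.
    boolean checkAndClear() {
        boolean observed = needsRescan;
        needsRescan = false;
        return observed;
    }

    // Monitor step 2: stands in for the expensive scan; here it only counts.
    void rescan() {
        scans++;
    }

    // One FS operation calls setNeedsRescan three times, and the monitor
    // clears the flag between the first and the second call.
    public static int scansWithRepeatedCalls() {
        RescanSketch m = new RescanSketch();
        m.setNeedsRescan();                   // 1st call from the operation
        boolean pending = m.checkAndClear();  // monitor wakes up and clears the flag
        m.setNeedsRescan();                   // 2nd call sets the flag again
        m.setNeedsRescan();                   // 3rd call, flag is already true
        if (pending) {
            m.rescan();                       // this scan already sees all changes
        }
        if (m.checkAndClear()) {
            m.rescan();                       // next loop iteration: redundant scan
        }
        return m.scans;
    }

    // Same operation, but it requests a rescan only once, after all changes.
    public static int scansWithSingleCall() {
        RescanSketch m = new RescanSketch();
        m.setNeedsRescan();                   // single call at the end of the operation
        if (m.checkAndClear()) {
            m.rescan();
        }
        if (m.checkAndClear()) {
            m.rescan();                       // not reached: the flag stayed false
        }
        return m.scans;
    }

    public static void main(String[] args) {
        System.out.println("repeated calls: " + scansWithRepeatedCalls() + " scans");
        System.out.println("single call:    " + scansWithSingleCall() + " scans");
    }
}
```

In this replay the repeated calls cost two scans and the single call costs one, which is the effect of the "minimize the number of calls" approach quoted above. It says nothing about how the attached patch addresses the problem.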