[ 
https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080389#comment-14080389
 ] 

Yi Liu commented on HDFS-6784:
------------------------------

Thanks [~cmccabe]

{quote}
If a needsRescan arrives while a scan is currently in progress
{quote}

But I have checked the current implementation: all places that call 
{{setNeedsRescan}} are protected by the FSN lock, so no {{needsRescan}} can 
arrive while a scan is in progress.

{quote}
I think the solution is to minimize the number of times we call setNeedsRescan 
from these functions.
{quote}
Agreed, this is one approach, but I think it would be better if we could 
resolve this at the root.

> Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls 
> setNeedsRescan multiple times.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6784
>                 URL: https://issues.apache.org/jira/browse/HDFS-6784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-6784.001.patch
>
>
> In the HDFS CacheReplicationMonitor, a rescan is expensive. Sometimes 
> {{setNeedsRescan}} is called multiple times for one operation; for example, 
> FSNamesystem#modifyCacheDirective calls it 3 times. When the monitor thread of 
> CacheReplicationMonitor sees that {{needsRescan}} is true, it rescans, but it 
> sets {{needsRescan}} to false before the real scan starts. Meanwhile, the 2nd 
> or 3rd {{setNeedsRescan}} call may set {{needsRescan}} back to true. So after 
> the scan finishes, the next loop triggers a new rescan, which is unnecessary 
> and makes the operation scan twice.
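
The sequence in the description can be sketched with a simplified, single-threaded model. This is not the Hadoop code: the class {{RescanSketch}}, the method {{monitorIteration}} and the callback that stands in for the window between clearing the flag and scanning are all hypothetical names chosen for illustration. Only {{setNeedsRescan}} and {{needsRescan}} come from the issue text.

```java
/**
 * Simplified model of the flag handling described in the issue.
 * Hypothetical illustration only; not the CacheReplicationMonitor source.
 */
public class RescanSketch {
    private boolean needsRescan = false;
    private int scanCount = 0;

    /** Called by a file system operation to request a rescan. */
    void setNeedsRescan() {
        needsRescan = true;
    }

    /**
     * One iteration of the monitor loop. The callback models anything that
     * happens after the flag is cleared but before the scan itself runs.
     */
    void monitorIteration(Runnable betweenClearAndScan) {
        if (!needsRescan) {
            return;                  // nothing requested, skip the scan
        }
        needsRescan = false;         // flag is cleared before the real scan
        betweenClearAndScan.run();   // later setNeedsRescan calls land here
        scanCount++;                 // the (expensive) scan
    }

    int getScanCount() {
        return scanCount;
    }

    /** Replays one operation that calls setNeedsRescan three times. */
    public static int simulate() {
        RescanSketch monitor = new RescanSketch();
        monitor.setNeedsRescan();    // 1st call wakes the monitor
        monitor.monitorIteration(() -> {
            monitor.setNeedsRescan();   // 2nd call, after the flag was cleared
            monitor.setNeedsRescan();   // 3rd call
        });
        monitor.monitorIteration(() -> { });  // next loop sees the flag again
        return monitor.getScanCount();
    }

    public static void main(String[] args) {
        System.out.println("scans for one FS op: " + simulate());
    }
}
```

Under this assumed interleaving the model scans twice for a single operation, which is the redundant rescan the issue wants to avoid. Whether that interleaving can actually occur depends on how the FSN lock is held, which the model does not represent.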



--
This message was sent by Atlassian JIRA
(v6.2#6252)
