[jira] [Commented] (HDFS-13904) ContentSummary does not always respect processing limit, resulting in long lock acquisitions

Uma Maheswara Rao G (Jira) Tue, 05 May 2020 17:56:36 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100366#comment-17100366
 ]


Uma Maheswara Rao G commented on HDFS-13904:
--------------------------------------------

Thanks [~xkrogen] for the reply. 
BTW, just for clarity: I am not sure that removing the conditions below fixes anything here.
{quote}
dir.getReadHoldCount() != 1 ||
        fsn.getReadHoldCount() != 1
{quote}
It looks like this check just ensures that the same thread has not taken the 
read lock multiple times. That should not happen, since we most likely take 
the read locks only once in this flow. getReadHoldCount returns the number of 
read holds on the current thread, so other read ops running at the same time 
should not change the count here, and fsn.getReadHoldCount() should return 1.
There is another method, getReadLockCount, that returns the total number of 
read locks held across all threads, but we are not using it here.
Am I missing something in your point? If the same ContentSummary flow takes 
the read lock a second time under some conditions, then the above check could 
have an impact, but I am not sure that is the case. Since a failure of that 
sanity check is not normal, it may be worth adding some logs there?
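
To illustrate the difference between the two counters, here is a minimal 
standalone sketch against java.util.concurrent.locks.ReentrantReadWriteLock. 
It assumes that the FSNamesystem and FSDirectory locks delegate 
getReadHoldCount to a lock of this type; the class and method names below are 
hypothetical and are not part of Hadoop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not Hadoop code. It only exercises the JDK lock
// that the NameNode locks are assumed to wrap.
public class ReadHoldCountDemo {

    /**
     * Takes the read lock once on the calling thread while a second thread
     * also holds it. Returns {getReadHoldCount(), getReadLockCount()}.
     */
    static int[] countsWithConcurrentReader() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch otherHasLock = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        // A concurrent reader, standing in for another short-lived read op.
        Thread other = new Thread(() -> {
            lock.readLock().lock();
            try {
                otherHasLock.countDown();
                done.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        other.start();

        lock.readLock().lock();
        try {
            // Wait until the other thread really holds the read lock too.
            otherHasLock.await();
            return new int[] {lock.getReadHoldCount(), lock.getReadLockCount()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
            done.countDown();
        }
    }

    /** Takes the read lock twice on the same thread (reentrant acquisition). */
    static int reentrantHoldCount() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().lock();
        try {
            return lock.getReadHoldCount();
        } finally {
            lock.readLock().unlock();
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        int[] c = countsWithConcurrentReader();
        System.out.println("concurrent reader: getReadHoldCount=" + c[0]
            + ", getReadLockCount=" + c[1]);
        System.out.println("reentrant: getReadHoldCount=" + reentrantHoldCount());
    }
}
```

With a concurrent reader on another thread, the per-thread hold count stays 
at 1 while the total read lock count is 2. Only a second acquisition on the 
same thread raises the hold count to 2, which is the case where the sanity 
check in yield() would refuse to relinquish the lock.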



> ContentSummary does not always respect processing limit, resulting in long 
> lock acquisitions
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13904
>                 URL: https://issues.apache.org/jira/browse/HDFS-13904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an 
> administrator to set a limit on the number of entries processed during a 
> single acquisition of the {{FSNamesystemLock}} during the creation of a 
> content summary. This is useful to prevent very long (multiple seconds) 
> pauses on the NameNode when {{getContentSummary}} is called on large 
> directories.
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read 
> lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large 
> directory on a heavily-loaded NameNode. Such long pauses completely destroy 
> the performance of the NameNode. We have the limit set to its default of 
> 5000; if it was respected, clearly there would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}} 
> looks like:
> {code}
>   public boolean yield() {
>     // Are we set up to do this?
>     if (limitPerRun <= 0 || dir == null || fsn == null) {
>       return false;
>     }
>     // Have we reached the limit?
>     long currentCount = counts.getFileCount() +
>         counts.getSymlinkCount() +
>         counts.getDirectoryCount() +
>         counts.getSnapshotableDirectoryCount();
>     if (currentCount <= nextCountLimit) {
>       return false;
>     }
>     // Update the next limit
>     nextCountLimit = currentCount + limitPerRun;
>     boolean hadDirReadLock = dir.hasReadLock();
>     boolean hadDirWriteLock = dir.hasWriteLock();
>     boolean hadFsnReadLock = fsn.hasReadLock();
>     boolean hadFsnWriteLock = fsn.hasWriteLock();
>     // sanity check.
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
>     // unlock
>     dir.readUnlock();
>     fsn.readUnlock("contentSummary");
>     try {
>       Thread.sleep(sleepMilliSec, sleepNanoSec);
>     } catch (InterruptedException ie) {
>     } finally {
>       // reacquire
>       fsn.readLock();
>       dir.readLock();
>     }
>     yieldCount++;
>     return true;
>   }
> {code}
> We believe that this check in particular is the culprit:
> {code}
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
> {code}
> The content summary computation will only relinquish the lock if it is 
> currently the _only_ holder of the lock. Given the high volume of read 
> requests on a heavily loaded NameNode, especially when unfair locking is 
> enabled, it is likely there may be another holder of the read lock performing 
> some short-lived operation. By refusing to give up the lock in this case, the 
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}. 
> This should alleviate the case described above by giving up the read lock and 
> allowing other short-lived operations to complete (while the content summary 
> thread sleeps) so that the lock can finally be given up completely. This has 
> the drawback that sometimes, the content summary may give up the lock 
> unnecessarily, if the read lock is never actually released by the time the 
> thread continues again. The only negative impact from this is to make some 
> large content summary operations slightly slower, with the tradeoff of 
> reducing NameNode-wide performance impact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
