[
https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607707#comment-16607707
]
Erik Krogen edited comment on HDFS-13904 at 9/7/18 9:57 PM:
------------------------------------------------------------
Thanks [~arpitagarwal] and [~kihwal] for taking a look!
{quote}
Do you have snapshots? We don't.
{quote}
No snapshots!
{quote}
I wonder what is actually causing the long lock in your case.
...
We have long getContentSummary calls made periodically and they can take close
to a minute. But we do not hit this kind of long locking problems.
{quote}
The directory in question had around 100M entries. But you make good points.
To describe more fully the situation under which we observed this behavior:
* There was a job periodically calling {{getContentSummary}} on this huge
directory. No long lock warnings were seen.
* The NameNode was restarted.
* Now, consistently, every {{contentSummary}} call on that directory caused the
lock hold warning.
* The NameNode was restarted again.
* The long lock holds were no longer seen.
So I agree that it is an odd case, and probably only triggered under certain
circumstances...
{quote}The change may help if the concurrent long reads are all
getContentSummary calls.{quote}
Are there other long read operations besides {{getContentSummary}}?
{{listStatus}} is already bounded by an even smaller limit.
I'll see if I can replicate this more deterministically to test the efficacy of
the patch, perhaps with Dynamometer...
> ContentSummary does not always respect processing limit, resulting in long
> lock acquisitions
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-13904
> URL: https://issues.apache.org/jira/browse/HDFS-13904
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, namenode
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
>
> HDFS-4995 added a config, {{dfs.content-summary.limit}}, which allows an
> administrator to cap the number of entries processed during a single
> acquisition of the {{FSNamesystemLock}} while computing a content summary.
> This is useful to prevent very long (multiple-second) pauses on the NameNode
> when {{getContentSummary}} is called on large directories.
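> For reference, a minimal hdfs-site.xml entry for this setting (the key name and the default of 5000 are as described above; treat the snippet as illustrative):
> {code}
> <property>
>   <name>dfs.content-summary.limit</name>
>   <!-- Max entries processed per FSNamesystemLock acquisition;
>        a value <= 0 means the computation never yields the lock. -->
>   <value>5000</value>
> </property>
> {code}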
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large
> directory on a heavily loaded NameNode. Such long pauses severely degrade
> overall NameNode performance. We have the limit set to its default of
> 5000; if it were respected, there clearly would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}}
> looks like:
> {code}
> public boolean yield() {
>   // Are we set up to do this?
>   if (limitPerRun <= 0 || dir == null || fsn == null) {
>     return false;
>   }
>
>   // Have we reached the limit?
>   long currentCount = counts.getFileCount() +
>       counts.getSymlinkCount() +
>       counts.getDirectoryCount() +
>       counts.getSnapshotableDirectoryCount();
>   if (currentCount <= nextCountLimit) {
>     return false;
>   }
>
>   // Update the next limit
>   nextCountLimit = currentCount + limitPerRun;
>
>   boolean hadDirReadLock = dir.hasReadLock();
>   boolean hadDirWriteLock = dir.hasWriteLock();
>   boolean hadFsnReadLock = fsn.hasReadLock();
>   boolean hadFsnWriteLock = fsn.hasWriteLock();
>
>   // sanity check.
>   if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>       hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>       fsn.getReadHoldCount() != 1) {
>     // cannot relinquish
>     return false;
>   }
>
>   // unlock
>   dir.readUnlock();
>   fsn.readUnlock("contentSummary");
>
>   try {
>     Thread.sleep(sleepMilliSec, sleepNanoSec);
>   } catch (InterruptedException ie) {
>   } finally {
>     // reacquire
>     fsn.readLock();
>     dir.readLock();
>   }
>
>   yieldCount++;
>   return true;
> }
> {code}
> We believe that this check in particular is the culprit:
> {code}
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>     hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>     fsn.getReadHoldCount() != 1) {
>   // cannot relinquish
>   return false;
> }
> {code}
> The content summary computation will only relinquish the lock if it is
> currently the _only_ holder of the lock. Given the high volume of read
> requests on a heavily loaded NameNode, especially when unfair locking is
> enabled, it is likely that another holder of the read lock is performing some
> short-lived operation at any given moment. By refusing to give up the lock in this case, the
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}.
> This should alleviate the case described above by giving up the read lock and
> allowing other short-lived operations to complete (while the content summary
> thread sleeps) so that the lock can finally be given up completely. This has
> the drawback that the content summary may sometimes give up the lock
> unnecessarily, if the read lock has not actually been released by the time the
> thread resumes. The only negative impact from this is to make some
> large content summary operations slightly slower, with the tradeoff of
> reducing NameNode-wide performance impact.
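> For illustration only, a minimal sketch of the proposed relaxation, based on the check quoted above (not the final patch; the rest of {{yield()}} is unchanged):
> {code}
> // sanity check: the read locks must still be held and no write locks taken,
> // but the getReadHoldCount() == 1 requirements are dropped.
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock || hadFsnWriteLock) {
>   // cannot relinquish
>   return false;
> }
> {code}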