[
https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607707#comment-16607707
]
Erik Krogen edited comment on HDFS-13904 at 9/7/18 9:57 PM:
------------------------------------------------------------
Thanks [~arpitagarwal] and [~kihwal] for taking a look!
{quote}
Do you have snapshots? We don't.
{quote}
No snapshots!
{quote}
I wonder what is actually causing the long lock in your case.
...
We have long getContentSummary calls made periodically and they can take close
to a minute. But we do not hit this kind of long locking problems.
{quote}
The directory in question had around 100M entries. But you make good points.
To describe more fully the situation under which we observed this behavior:
* There was a job periodically calling {{getContentSummary}} on this huge
directory. No long lock warnings were seen.
* The NameNode was restarted.
* Now, consistently, every {{contentSummary}} call on that directory caused the
lock hold warning.
* The NameNode was restarted again.
* The long lock holds were no longer seen.
So I agree that it is an odd case, and probably only triggered under certain
circumstances...
{quote}The change may help if the concurrent long reads are all
getContentSummary calls.{quote}
Are there other long read operations besides {{getContentSummary}}?
{{listStatus}} is already bounded by an even smaller limit.
I'll see if I can replicate this more deterministically to test the efficacy of
the patch, perhaps with Dynamometer...
> ContentSummary does not always respect processing limit, resulting in long
> lock acquisitions
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-13904
> URL: https://issues.apache.org/jira/browse/HDFS-13904
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, namenode
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
>
> HDFS-4995 added a config, {{dfs.content-summary.limit}}, which allows an
> administrator to cap the number of entries processed during a single
> acquisition of the {{FSNamesystemLock}} while computing a content summary.
> This is useful to prevent very long (multiple-second) pauses on the NameNode
> when {{getContentSummary}} is called on large directories.
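> For reference, a minimal hdfs-site.xml entry for this setting (the key name and the default of 5000 are as described above; treat the snippet as illustrative):
> {code}
> <property>
>   <name>dfs.content-summary.limit</name>
>   <!-- Max entries processed per FSNamesystemLock acquisition;
>        a value <= 0 means the computation never yields the lock. -->
>   <value>5000</value>
> </property>
> {code}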
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large
> directory on a heavily loaded NameNode. Such long pauses severely degrade
> overall NameNode performance. We have the limit set to its default of
> 5000; if it were respected, there clearly would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}}
> looks like:
> {code}
> public boolean yield() {
>   // Are we set up to do this?
>   if (limitPerRun <= 0 || dir == null || fsn == null) {
>     return false;
>   }
>
>   // Have we reached the limit?
>   long currentCount = counts.getFileCount() +
>       counts.getSymlinkCount() +
>       counts.getDirectoryCount() +
>       counts.getSnapshotableDirectoryCount();
>   if (currentCount <= nextCountLimit) {
>     return false;
>   }
>
>   // Update the next limit
>   nextCountLimit = currentCount + limitPerRun;
>
>   boolean hadDirReadLock = dir.hasReadLock();
>   boolean hadDirWriteLock = dir.hasWriteLock();
>   boolean hadFsnReadLock = fsn.hasReadLock();
>   boolean hadFsnWriteLock = fsn.hasWriteLock();
>
>   // sanity check.
>   if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>       hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>       fsn.getReadHoldCount() != 1) {
>     // cannot relinquish
>     return false;
>   }
>
>   // unlock
>   dir.readUnlock();
>   fsn.readUnlock("contentSummary");
>
>   try {
>     Thread.sleep(sleepMilliSec, sleepNanoSec);
>   } catch (InterruptedException ie) {
>   } finally {
>     // reacquire
>     fsn.readLock();
>     dir.readLock();
>   }
>
>   yieldCount++;
>   return true;
> }
> {code}
> We believe that this check in particular is the culprit:
> {code}
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>     hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>     fsn.getReadHoldCount() != 1) {
>   // cannot relinquish
>   return false;
> }
> {code}
> The content summary computation will only relinquish the lock if it is
> currently the _only_ holder of the lock. Given the high volume of read
> requests on a heavily loaded NameNode, especially when unfair locking is
> enabled, it is likely that another holder of the read lock is performing some
> short-lived operation at any given moment. By refusing to give up the lock in this case, the
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}.
> This should alleviate the case described above by giving up the read lock and
> allowing other short-lived operations to complete (while the content summary
> thread sleeps) so that the lock can finally be given up completely. This has
> the drawback that the content summary may sometimes give up the lock
> unnecessarily, if the read lock has not actually been released by the time the
> thread resumes. The only negative impact from this is to make some
> large content summary operations slightly slower, with the tradeoff of
> reducing NameNode-wide performance impact.
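> For illustration only, a minimal sketch of the proposed relaxation, based on the check quoted above (not the final patch; the rest of {{yield()}} is unchanged):
> {code}
> // sanity check: the read locks must still be held and no write locks taken,
> // but the getReadHoldCount() == 1 requirements are dropped.
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock || hadFsnWriteLock) {
>   // cannot relinquish
>   return false;
> }
> {code}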