[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361633#comment-16361633 ]
Xiaoyu Yao commented on HDFS-13136:
-----------------------------------
Attached an initial patch that moves the getPermissionChecker() call out of the FSN lock.
Thanks for the offline discussion with [~szetszwo].
This patch also removes the repeated group lookup from recursive calls such as
FSDirStatAndListingOp#getContentSummaryInt(), which should improve NN
performance.
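
To illustrate the second point, here is a minimal sketch of why building the checker once and passing it down a recursive traversal avoids repeated group lookups. This is not the code from the patch: ContentSummarySketch, Dir, Checker, countPerCall and countShared are hypothetical names, and the Checker constructor only stands in for the callerUgi.getGroups() call made by the FSPermissionChecker constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; none of these types exist in Hadoop. */
class ContentSummarySketch {

  /** A directory with child directories. */
  static final class Dir {
    final List<Dir> children = new ArrayList<>();

    Dir add(Dir child) {
      children.add(child);
      return this; // returns the parent so calls can be chained
    }
  }

  /** Counts how often the (possibly slow) group lookup ran. */
  static final AtomicInteger groupLookups = new AtomicInteger();

  /** Stand-in for a permission checker whose constructor looks up groups. */
  static final class Checker {
    Checker() {
      groupLookups.incrementAndGet(); // assumption: models callerUgi.getGroups()
    }

    void check(Dir d) {
      // permission check omitted in this sketch
    }
  }

  /** Repeated lookup: every recursive call builds its own checker. */
  static int countPerCall(Dir d) {
    new Checker().check(d);
    int n = 1;
    for (Dir c : d.children) {
      n += countPerCall(c);
    }
    return n;
  }

  /** Single lookup: the caller builds one checker and passes it down. */
  static int countShared(Dir d, Checker pc) {
    pc.check(d);
    int n = 1;
    for (Dir c : d.children) {
      n += countShared(c, pc);
    }
    return n;
  }
}
```

In this sketch countPerCall triggers one lookup per directory visited, while countShared triggers exactly one lookup no matter how deep the tree is.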
> Avoid taking FSN lock while doing group member lookup for FSD permission check
> ------------------------------------------------------------------------------
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Attachments: HDFS-13136.001.patch
>
>
> The namenode has an FSN lock and an FSD lock. Most namenode operations need to
> take the FSN lock first and then the FSD lock. The permission check is done via
> FSPermissionChecker at the FSD layer, assuming the FSN lock is already held.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can
> sometimes take seconds. There are external caching schemes such as SSSD and an
> internal caching scheme for group lookup. However, the delay can still occur
> during a cache refresh, which causes severe FSN lock contention and an
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) gets this right
> but some methods such as getFileInfo(..) and getContentSummary(..) do not.
> This ticket was opened to ensure that the group lookup for the permission
> checker happens outside the FSN lock.
>
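
The ordering problem described in the issue can be sketched as follows. This is a simplified illustration under stated assumptions, not the namenode code: NamesystemSketch, CheckerSketch and both method names are hypothetical, a ReentrantReadWriteLock stands in for the FSN lock, and the Supplier stands in for the callerUgi.getGroups() call.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical stand-in for a namesystem guarded by one coarse lock. */
class NamesystemSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  /** Hypothetical checker; its constructor performs the group lookup. */
  static final class CheckerSketch {
    final List<String> groups;
    final boolean builtUnderLock;

    CheckerSketch(Supplier<List<String>> groupLookup, boolean builtUnderLock) {
      this.groups = groupLookup.get(); // may block during a cache refresh
      this.builtUnderLock = builtUnderLock;
    }
  }

  /** Problematic ordering: the lookup runs while the lock is held. */
  CheckerSketch lookupInsideLock(Supplier<List<String>> groupLookup) {
    fsnLock.readLock().lock();
    try {
      return new CheckerSketch(groupLookup, fsnLock.getReadHoldCount() > 0);
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  /** Intended ordering: build the checker first, then take the lock. */
  CheckerSketch lookupOutsideLock(Supplier<List<String>> groupLookup) {
    CheckerSketch pc =
        new CheckerSketch(groupLookup, fsnLock.getReadHoldCount() > 0);
    fsnLock.readLock().lock();
    try {
      // the permission check and namespace read would use pc here
      return pc;
    } finally {
      fsnLock.readLock().unlock();
    }
  }
}
```

With the second ordering, a slow group lookup delays only the calling thread and does not extend the time the lock is held.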