Xiaoyu Yao created HDFS-13136:
---------------------------------
Summary: Avoid taking FSN lock while doing group member lookup for
FSD permission check
Key: HDFS-13136
URL: https://issues.apache.org/jira/browse/HDFS-13136
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
The Namenode has two locks: the FSN (FSNamesystem) lock and the FSD
(FSDirectory) lock. Most Namenode operations take the FSN lock first and then
the FSD lock. The permission check is done by FSPermissionChecker at the FSD
layer, on the assumption that the FSN lock is already held.
The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can
sometimes take seconds. There are external caching schemes such as SSSD, as
well as an internal cache for group lookups. However, the delay can still occur
during a cache refresh, and when it happens under the FSN lock it causes severe
lock contention and an unresponsive Namenode.
Checking the current code, we found that getBlockLocations(..) performs the
group lookup outside the FSN lock, but some methods, such as getFileInfo(..)
and getContentSummary(..), perform it while holding the lock. This ticket is
opened to ensure that the group lookup for the permission checker always
happens outside the FSN lock.
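
To illustrate the intended change, here is a minimal Java sketch of the two
orderings. It is not the HDFS code: LockScopeSketch, PermissionChecker,
checkInsideLock, checkOutsideLock and the fixed group list are hypothetical
stand-ins, and a plain ReentrantReadWriteLock stands in for the FSN lock. The
only point it shows is that a checker whose constructor resolves groups should
be constructed before the lock is taken.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical sketch; names do not correspond to real HDFS classes. */
public class LockScopeSketch {
    // Stand-in for the FSN lock.
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

    // Records whether the most recent group lookup ran while the lock was held.
    boolean lookupRanUnderLock;

    // Stand-in for callerUgi.getGroups(); a real lookup may block for seconds.
    private final Supplier<List<String>> groupLookup = () -> {
        lookupRanUnderLock = fsnLock.getReadHoldCount() > 0;
        return Arrays.asList("staff", "hdfs");
    };

    /** Stand-in for FSPermissionChecker: resolves groups in its constructor. */
    static final class PermissionChecker {
        private final List<String> groups;

        PermissionChecker(Supplier<List<String>> lookup) {
            this.groups = lookup.get();
        }

        boolean isMember(String group) {
            return groups.contains(group);
        }
    }

    /** Problematic ordering: the slow lookup happens while the lock is held. */
    boolean checkInsideLock(String group) {
        fsnLock.readLock().lock();
        try {
            PermissionChecker pc = new PermissionChecker(groupLookup);
            return pc.isMember(group);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    /** Intended ordering: build the checker first, then take the lock. */
    boolean checkOutsideLock(String group) {
        PermissionChecker pc = new PermissionChecker(groupLookup);
        fsnLock.readLock().lock();
        try {
            return pc.isMember(group);
        } finally {
            fsnLock.readLock().unlock();
        }
    }
}
```

Both methods return the same answer; they differ only in whether other threads
waiting on the lock are also forced to wait for the group lookup.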