Xiaoyu Yao commented on HDFS-13136:

Thanks [~szetszwo] for the review. I will commit the patch to trunk and 
branch-3.1 (clean cherry-pick) shortly. There are some conflicts on branch-3.0, 
so I have just submitted a resolved patch for the Jenkins check.

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> ------------------------------------------------------------------------------
>                 Key: HDFS-13136
>                 URL: https://issues.apache.org/jira/browse/HDFS-13136
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Major
>         Attachments: HDFS-13136-branch-3.0.001.patch, HDFS-13136.001.patch, 
> HDFS-13136.002.patch
> Namenode has an FSN lock and an FSD lock. Most namenode operations need to 
> take the FSN lock first and then the FSD lock. The permission check is done via 
> FSPermissionChecker at the FSD layer, assuming the FSN lock is taken. 
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external caching schemes such as SSSD and an 
> internal caching scheme for group lookup. However, the delay can still occur 
> during cache refresh, which causes severe FSN lock contention and 
> an unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) does it right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) do it wrong. 
> This ticket is opened to ensure the group lookup for the permission checker is 
> done outside the FSN lock.
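
The ordering problem described above can be illustrated with a minimal, self-contained sketch. This is not the actual patch or the real Hadoop classes: GroupLookupOutsideLock, fsnLock, lookupGroups, PermissionChecker, and the two getFileInfo* methods are hypothetical stand-ins, and the group list is hard-coded. The only point shown is that a checker whose constructor resolves groups should be built before the lock is taken.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GroupLookupOutsideLock {
    // Hypothetical stand-in for the namesystem (FSN) lock.
    static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

    // Records whether the most recent group lookup ran while this thread held the read lock.
    static volatile boolean lastLookupHeldLock = false;

    // Hypothetical stand-in for callerUgi.getGroups(); in a real deployment this
    // call may block for seconds during a cache refresh.
    static List<String> lookupGroups(String user) {
        lastLookupHeldLock = fsnLock.getReadHoldCount() > 0;
        return Arrays.asList("users", "hadoop");
    }

    // Simplified checker: like the constructor described above, it resolves groups eagerly.
    static final class PermissionChecker {
        final List<String> groups;

        PermissionChecker(String user) {
            this.groups = lookupGroups(user);
        }

        boolean hasGroup(String group) {
            return groups.contains(group);
        }
    }

    // Problematic ordering: the checker (and its group lookup) is created under the lock.
    static boolean getFileInfoInsideLock(String user, String requiredGroup) {
        fsnLock.readLock().lock();
        try {
            PermissionChecker pc = new PermissionChecker(user);
            return pc.hasGroup(requiredGroup);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    // Preferred ordering: resolve groups first, then take the lock only for the check.
    static boolean getFileInfoOutsideLock(String user, String requiredGroup) {
        PermissionChecker pc = new PermissionChecker(user);
        fsnLock.readLock().lock();
        try {
            return pc.hasGroup(requiredGroup);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        getFileInfoInsideLock("alice", "hadoop");
        System.out.println("inside-lock variant looked up groups under lock: " + lastLookupHeldLock);
        getFileInfoOutsideLock("alice", "hadoop");
        System.out.println("outside-lock variant looked up groups under lock: " + lastLookupHeldLock);
    }
}
```

In the second variant a slow lookup delays only the calling thread, since no lock is held while it runs.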
