[ https://issues.apache.org/jira/browse/HADOOP-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348581#comment-15348581 ]
Wei-Chiu Chuang commented on HADOOP-13263:
------------------------------------------

Thanks [~sodonnell], this is a good idea, and thanks [~arpiagariu] for the initial reviews. I have a few quick comments:

[~sodonnell]
What's the purpose of {{getBackgroundRefreshSuccess()}}, {{getBackgroundRefreshException()}}, {{getBackgroundRefreshQueued()}}, and {{getBackgroundRefreshRunning()}} in the Group class? If they are used by tests only, they should not be {{public}} (most likely package-private), and they should be annotated with {{@VisibleForTesting}}.

[~arpiagariu]
bq. We should add the settings to hdfs-default.xml at a minimum. I don't think we have any site documentation for setting up group mapping.
The new properties should go into core-default.xml. There is also a GroupsMapping.md under hadoop-common-project/hadoop-common/src/site/markdown. It would be really nice if we could get this groups mapping resolution feature described in that doc.

I also wonder if the new properties should be defined in {{CommonConfigurationKeys}} instead, because {{CommonConfigurationKeysPublic}} has a javadoc that says:
{code}
/**
 * This class contains constants for configuration keys used
 * in the common code.
 *
 * It includes all publicly documented configuration keys. In general
 * this class should not be used directly (use CommonConfigurationKeys
 * instead)
 *
 */
{code}

> Reload cached groups in background after expiry
> -----------------------------------------------
>
>                 Key: HADOOP-13263
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13263
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>         Attachments: HADOOP-13263.001.patch, HADOOP-13263.002.patch,
> HADOOP-13263.003.patch, HADOOP-13263.004.patch, HADOOP-13263.005.patch,
> HADOOP-13263.006.patch
>
>
> In HADOOP-11238 the Guava cache was introduced to allow refreshes on the
> Namenode group cache to run in the background, avoiding many slow group
> lookups.
> Even with this change, I have seen quite a few clusters with issues
> due to slow group lookups. The problem is most prevalent in HA clusters,
> where a slow group lookup on the hdfs user can fail to return for over 45
> seconds, causing the Failover Controller to kill the Namenode.
> The way the current Guava cache implementation works is approximately:
> 1) On initial load, the first thread to request groups for a given user
> blocks until it returns. Any subsequent threads requesting that user block
> until that first thread populates the cache.
> 2) When the key expires, the first thread to hit the cache after expiry
> blocks. While it is blocked, other threads will return the old value.
> I feel it is this blocking thread that still gives the Namenode issues on
> slow group lookups. If the call from the FC is the one that blocks and
> lookups are slow, it can cause the NN to be killed.
> Guava has the ability to refresh expired keys completely in the background,
> where the first thread that hits an expired key schedules a background cache
> reload, but still returns the old value. Then the cache is eventually
> updated. This patch introduces this background reload feature. There are two
> new parameters:
> 1) hadoop.security.groups.cache.background.reload - default false to keep the
> current behaviour. Set to true to enable a small thread pool and background
> refresh for expired keys
> 2) hadoop.security.groups.cache.background.reload.threads - only relevant if
> the above is set to true. Controls how many threads are in the background
> refresh pool. Default is 1, which is likely to be enough.
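
The background reload behaviour described in the issue (the first thread to hit an expired key queues a reload and still returns the old value) can be illustrated with a small sketch. This is not the code from the patch and it does not use Guava; it is a plain-JDK approximation of the semantics only. The class name {{BackgroundReloadSketch}}, the injected clock, the expiry of 1000 ms, and the fake lookup function are all hypothetical and exist purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch in plain Java: NOT the class from the patch and NOT Guava's API.
final class BackgroundReloadSketch {

    // One cached lookup result, when it was loaded, and whether a reload is already queued.
    private static final class Entry {
        final List<String> groups;
        final long loadedAtMillis;
        final AtomicBoolean reloadQueued = new AtomicBoolean(false);

        Entry(List<String> groups, long loadedAtMillis) {
            this.groups = groups;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> lookup; // stands in for the slow group lookup
    private final long expiryMillis;
    private final LongSupplier clock;                     // injected so expiry can be simulated
    private final ExecutorService reloadPool;

    BackgroundReloadSketch(Function<String, List<String>> lookup, long expiryMillis,
                           int reloadThreads, LongSupplier clock) {
        this.lookup = lookup;
        this.expiryMillis = expiryMillis;
        this.clock = clock;
        // Daemon threads so an idle pool never keeps the JVM alive.
        this.reloadPool = Executors.newFixedThreadPool(reloadThreads, r -> {
            Thread t = new Thread(r, "groups-reload");
            t.setDaemon(true);
            return t;
        });
    }

    List<String> getGroups(String user) {
        // Initial load: the caller blocks until the lookup returns.
        Entry entry = cache.computeIfAbsent(
            user, u -> new Entry(lookup.apply(u), clock.getAsLong()));
        boolean expired = clock.getAsLong() - entry.loadedAtMillis >= expiryMillis;
        // Only the first caller to see the expired entry queues a reload.
        if (expired && entry.reloadQueued.compareAndSet(false, true)) {
            reloadPool.execute(() -> {
                try {
                    cache.put(user, new Entry(lookup.apply(user), clock.getAsLong()));
                } catch (RuntimeException e) {
                    // Keep serving the old value and let a later caller retry.
                    entry.reloadQueued.set(false);
                }
            });
        }
        // Every caller, including the one that queued the reload, gets the cached value.
        return entry.groups;
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong(0);
        AtomicInteger lookups = new AtomicInteger();
        BackgroundReloadSketch cache = new BackgroundReloadSketch(
            user -> List.of(user + "-groups-v" + lookups.incrementAndGet()), 1000, 1, now::get);

        System.out.println(cache.getGroups("hdfs")); // blocking initial load
        now.set(2000);                               // entry is now past its expiry
        System.out.println(cache.getGroups("hdfs")); // old value is returned; reload is queued
        // Poll until the background reload has replaced the entry.
        while (cache.getGroups("hdfs").get(0).endsWith("v1")) {
            Thread.onSpinWait();
        }
        System.out.println(cache.getGroups("hdfs")); // value produced by the background reload
    }
}
```

The {{reloadThreads}} argument plays the role that hadoop.security.groups.cache.background.reload.threads plays in the issue description: it bounds how many lookups can run in the background at once. With background reload enabled, no caller blocks on an expired key; the cost is that callers may briefly see stale groups until the reload completes.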