[ 
https://issues.apache.org/jira/browse/AMBARI-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Onischuk updated AMBARI-14708:
-------------------------------------
    Attachment: AMBARI-14708.patch

> LDAP Requests Via nslcd Take Too Long In Some Organizations
> -----------------------------------------------------------
>
>                 Key: AMBARI-14708
>                 URL: https://issues.apache.org/jira/browse/AMBARI-14708
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>             Fix For: 2.2.1
>
>         Attachments: AMBARI-14708.patch
>
>
> When performing a restart of a large cluster where LDAP is being used
> indirectly by nslcd, the LDAP servers are put under heavy load. This is more
> evident in LDAP organizations that are large to begin with.
> connection from pid=12345 uid=0 gid=0  
> nslcd_group_all()  
> myldap_search(base="cn=groups,cn=accounts,dc=corp,dc=local",
> filter="(objectClass=posixGroup)")  
> ldap_result(): end of results
>     
>     
>     
>     
>     It turns out that these processes are running the before-ANY hook script,
> which runs whenever a service is started, like this one I was running locally
> to reproduce the query patterns.
>     
>     
> /usr/bin/python2.6 \
>   /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py ANY \
>   /var/lib/ambari-agent/data/command-5950.json \
>   /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY \
>   /var/lib/ambari-agent/data/structured-out-5950.json INFO \
>   /var/lib/ambari-agent/data/tmp
>     
>     
>     
>     
>     I tracked the issue down to this function in 
> {{resource_management/core/providers/accounts.py}}:
>     
>     
> @property
> def user_groups(self):
>     return [g.gr_name for g in grp.getgrall() if self.resource.username in g.gr_me
>     
>     
>     
>     
>     This property is referenced at least twice for each user, and each
> reference calls {{grp.getgrall()}}, which forces a complete enumeration of
> groups.
>
>     On a cluster where many processes restart across many nodes, many of
> these full enumeration searches therefore run at the same time. In an
> enterprise with a large directory this gets very expensive, especially since
> this type of call is not cached by nscd.
>     
>     I'm aware that the idiom used here to get the groups is common in Python,
> but it's actually pretty inefficient. Commands like `id` and `groups` have
> more efficient ways of discovering this. I'm not aware of an equivalent of
> these in Python.
>     
>     
> @property
> def user_groups(self):
>     ret = []
>     (rc, output) = shell.checked_call(['groups', self.resource.username],
>                                       sudo=True)
>     if rc == 0:
>         ret.extend(output.split(':')[1].lstrip().split())
>     return ret
>
> This converts the full LDAP scan for groups into more efficient queries
> targeted to the user. The lookups done by the `groups` command are also 100%
> cacheable. Since it's a checked call, the `rc == 0` check is probably not
> needed.
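
On the question of a Python equivalent to `id` and `groups`: Python 3.3 and later provide `os.getgrouplist()`, which wraps the C library's `getgrouplist(3)` and typically resolves membership for a single user instead of enumerating every group. It is not available on the Python 2.6 interpreter used by the hook above, so this is only a sketch of what the lookup could look like on a newer interpreter. The function name `user_group_names` is made up for illustration, and unlike the `gr_mem` scan it also returns the user's primary group.

```python
import grp
import os
import pwd


def user_group_names(username):
    """Return the names of the groups `username` belongs to (Python 3.3+)."""
    # One targeted passwd lookup to find the user's primary GID.
    primary_gid = pwd.getpwnam(username).pw_gid
    # Delegates to getgrouplist(3); the result includes the primary GID.
    gids = os.getgrouplist(username, primary_gid)
    names = []
    for gid in gids:
        try:
            # One by-GID lookup per group instead of a full enumeration.
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # GID without a matching group entry; skip it.
            continue
    return names
```

Calling `user_group_names("hdfs")` on a host where that account exists would return its group names; `pwd.getpwnam` raises `KeyError` for an unknown user, which a caller would need to handle.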
>
> An unfortunate effect of how usermod and friends work is that they always
> invalidate the nscd cache after they run. This means that Ambari could still
> be a lot more efficient than it is when LDAP is in play by being pickier about
> when it runs commands like useradd/usermod/groupadd/groupmod.
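
One way to be pickier about `usermod`, sketched under assumptions: compare the groups the user already has with the groups the configuration asks for, and build a command only when something is missing. `missing_groups` and `usermod_command` are hypothetical helpers, not part of Ambari; `usermod -a -G` is the usual form for appending supplementary groups.

```python
def missing_groups(current_groups, desired_groups):
    """Return the desired groups the user is not yet a member of, sorted."""
    return sorted(set(desired_groups) - set(current_groups))


def usermod_command(username, current_groups, desired_groups):
    """Build a usermod argv, or return None when no change is needed."""
    missing = missing_groups(current_groups, desired_groups)
    if not missing:
        # Nothing to add: skip usermod so the nscd cache is left alone.
        return None
    # -a appends to the existing supplementary groups instead of replacing them.
    return ["usermod", "-a", "-G", ",".join(missing), username]
```

A caller would run the returned command only when it is not `None`, so a restart that changes nothing never touches the account database.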
>
> We can also probably put a timed in-memory cache on the results from
> `grp.getgrall()` or `groups`, configurable through the agent config file.
> This way, we would only call it once every hour or so.
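
A minimal sketch of such a timed cache, assuming the time-to-live would be read from the agent config elsewhere and passed in as `ttl_seconds`. The class `TimedGroupCache` and its parameters are hypothetical names, not existing Ambari code. An in-memory cache like this only helps inside a single long-lived process; separately spawned hook scripts would each start with an empty cache.

```python
import grp
import time


class TimedGroupCache(object):
    """Caches the full group list and reloads it after ttl_seconds."""

    def __init__(self, loader=grp.getgrall, ttl_seconds=3600, clock=time.time):
        self._loader = loader      # callable returning group entries
        self._ttl = ttl_seconds    # hypothetical value from the agent config
        self._clock = clock        # injectable so expiry can be tested
        self._groups = None
        self._loaded_at = None

    def all_groups(self):
        now = self._clock()
        expired = self._groups is None or now - self._loaded_at >= self._ttl
        if expired:
            # The only place the expensive full enumeration happens.
            self._groups = list(self._loader())
            self._loaded_at = now
        return self._groups

    def groups_for(self, username):
        # Same membership test as the original property, on cached data.
        return [g.gr_name for g in self.all_groups() if username in g.gr_mem]
```

With a one-hour TTL, repeated `groups_for()` calls within the hour trigger a single enumeration, at the cost of not seeing group changes until the entry expires.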



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
