[
https://issues.apache.org/jira/browse/AMBARI-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Onischuk updated AMBARI-14708:
-------------------------------------
Attachment: AMBARI-14708.patch
> LDAP Requests Via nslcd Take Too Long In Some Organizations
> -----------------------------------------------------------
>
> Key: AMBARI-14708
> URL: https://issues.apache.org/jira/browse/AMBARI-14708
> Project: Ambari
> Issue Type: Bug
> Reporter: Andrew Onischuk
> Assignee: Andrew Onischuk
> Fix For: 2.2.1
>
> Attachments: AMBARI-14708.patch
>
>
> When restarting a large cluster where LDAP is used indirectly through
> nslcd, the LDAP servers are put under heavy load. The effect is most
> evident in organizations whose LDAP directories are already large.
>
>     connection from pid=12345 uid=0 gid=0
>     nslcd_group_all()
>     myldap_search(base="cn=groups,cn=accounts,dc=corp,dc=local",
>     filter="(objectClass=posixGroup)")
>     ldap_result(): end of results
>
>
>
>
> It turns out that these processes are instances of the before-ANY hook
> script, which runs whenever a service is started, like this one I ran
> locally to reproduce the query pattern.
>
>
>     /usr/bin/python2.6 \
>       /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py \
>       ANY \
>       /var/lib/ambari-agent/data/command-5950.json \
>       /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY \
>       /var/lib/ambari-agent/data/structured-out-5950.json \
>       INFO \
>       /var/lib/ambari-agent/data/tmp
>
>
>
>
> I tracked the issue down to this function in
> {{resource_management/core/providers/accounts.py}}:
>
>
>     @property
>     def user_groups(self):
>         return [g.gr_name for g in grp.getgrall() if self.resource.username in g.gr_me
>
>
>
>
> This property is referenced at least twice for each user, and every call
> to {{grp.getgrall()}} forces a complete enumeration of groups.
>
> On a cluster with many nodes, each restarting many processes, many of
> these full enumeration searches run at the same time. In an enterprise
> with a large directory this gets very expensive, especially since this
> type of call is not cached by nscd.
>
> I'm aware that the idiom used here to get the groups is common in Python,
> but it's actually pretty inefficient. Commands like id and groups have more
> efficient ways of discovering this. I'm not aware of an equivalent of these
> in Python.
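
On the point about id and groups: for what it's worth, Python 3.3 and later expose `os.getgrouplist()` on Unix, which wraps the same getgrouplist(3) call those commands rely on. The agent shown here runs under python2.6, so this is only a sketch of what a per-user lookup could look like on a newer interpreter, not something that would drop into the current code. The function name `groups_for_user` is made up for illustration. Unlike the `grp.getgrall()` idiom, the result also includes the user's primary group.

```python
import grp
import os
import pwd


def groups_for_user(username):
    """Return the names of the groups `username` belongs to (Python 3.3+, Unix)."""
    # pwd.getpwnam raises KeyError if the user does not exist.
    primary_gid = pwd.getpwnam(username).pw_gid
    names = []
    # os.getgrouplist asks the system for this user's groups only,
    # instead of enumerating every group like grp.getgrall() does.
    for gid in os.getgrouplist(username, primary_gid):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without a group entry has no name to report; skip it.
            continue
    return names
```

Whether this results in cheaper LDAP queries depends on how the NSS backend implements the per-user lookup, so it would need to be checked against nslcd before relying on it.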
>
>
>     @property
>     def user_groups(self):
>         ret = []
>         (rc, output) = shell.checked_call(['groups', self.resource.username], sudo=True)
>         if rc == 0:
>             ret.extend(output.split(':')[1].lstrip().split())
>         return ret
>
> This converts the full LDAP scan for groups into more efficient queries
> targeted at the user. The lookups done by the groups command are also 100%
> cacheable.
>
> Since it's a checked call, the `rc == 0` check is probably not needed.
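
The parsing of the `groups` output can also be pulled out into a small helper, which makes it easy to test without touching LDAP. This is a minimal sketch: `parse_groups_output` is a hypothetical name, and it assumes the output is either the GNU coreutils form (`username : group1 group2`) or a bare space-separated list, as some other implementations print.

```python
def parse_groups_output(output):
    """Return the group names from the output of `groups <username>`."""
    # GNU coreutils prints "username : g1 g2"; other implementations may
    # print only "g1 g2". Group names cannot contain ':', so splitting on
    # the first colon is safe in both cases.
    _, separator, after_colon = output.partition(':')
    names = after_colon if separator else output
    # split() with no argument also discards the trailing newline.
    return names.split()
```

With a helper like this, the property body reduces to passing the command output through it, and an output with no groups after the colon yields an empty list rather than an error.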
>
> An unfortunate effect of how usermod and friends work is that they always
> invalidate the nscd cache after they run. This means that Ambari could still
> be a lot more efficient when LDAP is in play by being pickier about when it
> runs commands like useradd/usermod/groupadd/groupmod.
>
> We can also probably put a timed in-memory cache on the results from
> `grp.getgrall()` or `groups`, configurable through the agent config file.
> This way, we would only call it once every hour or so.
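
A timed cache along those lines might look roughly like the following. This is a sketch only: the class name `TimedCache` is hypothetical, the TTL is passed in directly rather than read from the agent config file, and it only helps if the cache lives in a process that outlasts a single hook invocation.

```python
import time


class TimedCache(object):
    """Cache the result of a zero-argument loader for `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds, clock=time.time):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value = None
        self._expires_at = None  # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First use, or the cached value is stale: reload and restart the timer.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

For example, `TimedCache(grp.getgrall, ttl_seconds=3600)` would enumerate groups at most once an hour per process, with every `get()` in between returning the stored list.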
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)