[ https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950125#comment-13950125 ]
Colin Patrick McCabe commented on HADOOP-10442:
-----------------------------------------------
Thanks for this patch, Kihwal.
To be honest, I find the {{getgrouplist}} behavior you are seeing puzzling. The
man page doesn't describe any negative return code other than -1.
{code}
RETURN VALUE
If the number of groups of which user is a member is less than or equal
to *ngroups, then the value *ngroups is returned.
If the user is a member of more than *ngroups groups, then
getgrouplist() returns -1. In this case the value returned in *ngroups can be
used to resize the buffer passed to a further call getgrouplist().
{code}
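A minimal sketch of the resize protocol the man page describes, assuming glibc's signature ({{gid_t *groups}}; BSD and macOS declare {{int *groups}} instead). {{fetch_groups}}, the initial buffer size of 4, and the cap of 8 attempts are hypothetical illustration choices, not code from Hadoop or from the attached patch. The buffer is only grown after the documented -1, and only when {{*ngroups}} actually reports a larger size; any other negative value ends the call as a failure instead of triggering another realloc().

```c
#define _GNU_SOURCE /* exposes getgrouplist() in <grp.h> on glibc */
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: fetch the groups of `user` (including `primary`).
 * Returns a malloc'd array the caller must free and stores the number of
 * entries in *count_out; returns NULL on any failure. */
static gid_t *fetch_groups(const char *user, gid_t primary, int *count_out)
{
    int ngroups = 4; /* deliberately small so the resize path gets exercised */
    gid_t *groups = malloc((size_t)ngroups * sizeof(gid_t));
    if (groups == NULL)
        return NULL;

    /* Bounded retries: membership may change between calls. */
    for (int attempt = 0; attempt < 8; attempt++) {
        int requested = ngroups;
        int ret = getgrouplist(user, primary, groups, &ngroups);
        if (ret >= 0) {
            /* The list fit; *ngroups now holds the real count. */
            *count_out = ngroups;
            return groups;
        }
        if (ret != -1 || ngroups <= requested) {
            /* Undocumented return code, or -1 without a larger size
             * hint: fail instead of reallocating blindly. */
            break;
        }
        /* Documented case: grow to the size reported in *ngroups. */
        gid_t *bigger = realloc(groups, (size_t)ngroups * sizeof(gid_t));
        if (bigger == NULL)
            break;
        groups = bigger;
    }
    free(groups);
    return NULL;
}
```

The retry cap is arbitrary; its only purpose is to guarantee termination if the reported size keeps changing between calls.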
What negative return code did you see besides -1? I suspect what you're seeing
is undocumented behavior, possibly a bug in nslcd (or in the man page?).
Also, looking at this more closely, I believe we mishandle the case where the
user is a member of no groups. That would be a pretty odd configuration (I
wonder whether it's even possible), but to be safe, I think we should treat a
return value of 0 from {{getgrouplist}} as success.
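One way to express the return-code handling discussed here, written as a pure function so it can be unit-tested without nslcd or LDAP. {{classify_getgrouplist}} and the {{groups_result}} enum are hypothetical names, not taken from the patch. A return value of 0 counts as success, only the documented -1 (with a grown {{*ngroups}}) asks for a retry, and any other negative value, such as the undocumented ones reported here, is an error rather than a reason to call realloc().

```c
/* Hypothetical outcome classes for a single getgrouplist() call. */
typedef enum {
    GROUPS_OK,     /* ret >= 0: the list fit; 0 is accepted, not an error */
    GROUPS_RESIZE, /* ret == -1 and *ngroups grew: retry with a bigger buffer */
    GROUPS_ERROR   /* anything else, e.g. an undocumented ret < -1 */
} groups_result;

/* `requested` is the *ngroups value passed in; `reported` is *ngroups
 * after the call returns. */
static groups_result classify_getgrouplist(int ret, int requested, int reported)
{
    if (ret >= 0)
        return GROUPS_OK;
    if (ret == -1 && reported > requested)
        return GROUPS_RESIZE;
    return GROUPS_ERROR;
}
```

Keeping the decision in one function makes it easy to feed it the odd values seen in the field (0, -2, -1 without a size hint) in a test, without reproducing the nslcd setup.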
> Group look-up can cause segmentation fault when certain JNI-based mapping
> module is used.
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-10442
> URL: https://issues.apache.org/jira/browse/HADOOP-10442
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.3.0, 2.4.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Fix For: 3.0.0, 2.4.0, 2.5.0
>
> Attachments: HADOOP-10442.patch
>
>
> When JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is used,
> we get segmentation faults very often. The same system ran 2.2 for months
> without any problem, but as soon as it was upgraded to 2.3, it started
> crashing. This resulted in multiple NameNode crashes per day.
> The server was running nslcd (nss-pam-ldapd-0.7.5-15.el6_3.2). We did not see
> this problem on the servers running sssd.
> There was one change in the C code: it modified the return-code handling
> after the getgrouplist() call. If the function returns 0 or a negative value
> less than -1, the code now calls realloc() instead of returning failure.
--
This message was sent by Atlassian JIRA
(v6.2#6252)