[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950125#comment-13950125
 ] 

Colin Patrick McCabe commented on HADOOP-10442:
-----------------------------------------------

Thanks for this patch, Kihwal.

To be honest, I find the behavior of {{getgrouplist}} that you are seeing to be 
puzzling.  The man page doesn't describe any negative return codes other than 
-1.

{code}
RETURN VALUE
       If the number of groups of which user is a member is less than or
       equal to *ngroups, then the value *ngroups is returned.

       If the user is a member of more than *ngroups groups, then
       getgrouplist() returns -1.  In this case the value returned in
       *ngroups can be used to resize the buffer passed to a further call
       getgrouplist().
{code}

What negative return code did you see besides -1?  I guess what you're seeing 
is undocumented, and possibly a bug in nslcd (or in the man page?).

Also, looking at this more closely, I believe we mishandle the case where the 
user is a member of no groups.  This would be a pretty odd configuration (I 
wonder if it's possible?).  Just to be sure, I think we should consider 
getgrouplist returning 0 to be ok.
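
To make the suggestion concrete, here is a rough sketch of how the return value could be classified.  This is not the code in the patch; the enum and function names are hypothetical, and treating a -1 that does not report a larger {{*ngroups}} as a failure is my assumption, chosen so a caller can never loop forever.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from the Hadoop source. */
enum grouplist_action {
    GROUPLIST_OK,     /* list is complete (possibly empty) */
    GROUPLIST_RETRY,  /* buffer too small: grow it and call again */
    GROUPLIST_FAIL    /* undocumented or inconsistent result: give up */
};

/*
 * Decide what to do after one getgrouplist() call.
 *   ret           - value returned by getgrouplist()
 *   capacity      - value of *ngroups passed in (buffer size in entries)
 *   ngroups_after - value of *ngroups after the call
 */
static enum grouplist_action
classify_grouplist_result(int ret, int capacity, int ngroups_after)
{
    assert(capacity >= 0);

    /* Zero or more groups returned: success, including the 0 case. */
    if (ret >= 0)
        return GROUPLIST_OK;

    /* Documented "buffer too small" case.  Only retry when the call
     * actually reported a larger size (assumption: otherwise resizing
     * would not help and we could spin). */
    if (ret == -1 && ngroups_after > capacity)
        return GROUPLIST_RETRY;

    /* Anything else (e.g. a value below -1) is undocumented: fail
     * rather than realloc() based on a value we cannot trust. */
    return GROUPLIST_FAIL;
}
```

A caller would loop only while this returns GROUPLIST_RETRY, resizing the buffer to {{ngroups_after}} entries before the next call, and report an error on GROUPLIST_FAIL.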

> Group look-up can cause segmentation fault when certain JNI-based mapping 
> module is used.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10442
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10442
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.3.0, 2.4.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>             Fix For: 3.0.0, 2.4.0, 2.5.0
>
>         Attachments: HADOOP-10442.patch
>
>
> When JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is used, 
> we get segmentation fault very often. The same system ran 2.2 for months 
> without any problem, but as soon as upgrading to 2.3, it started crashing.  
> This resulted in multiple name node crashes per day.
> The server was running nslcd (nss-pam-ldapd-0.7.5-15.el6_3.2). We did not see 
> this problem on the servers running sssd. 
> There was one change in the C code and it modified the return code handling 
> after getgrouplist() call. If the function returns 0 or a negative value less 
> than -1, it will do realloc() instead of returning failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
