On Thu, 2009-12-03 at 09:53 +0100, Petter Reinholdtsen wrote:
> I backported nss-pam-ldapd version 0.7.1 to Lenny, and installed the
> nslcd and libnss-ldapd (pam would not work without newer pam-runtime)
> packages to see if I could get rid of a segfault in nscd.

Thanks for reporting this. Just to be clear, are you seeing this with both
libnss-ldapd 0.6.7.1 and 0.7.1?

> To trigger the problem, I try to log in using ssh. I suspect the cause
> is the huge amount of groups and the huge amount of group members we
> have here at the university.

Hmm, I've tried to reproduce this (I even installed SSH in a lenny chroot
in my test environment) but haven't been able to, using one group with
1000 members.

> This is the valgrind output when it crashes:
> 
> ==9529==
> ==9529== Process terminating with default action of signal 11 (SIGSEGV)
> ==9529==  Bad permissions for mapped region at address 0xAEA0000
> ==9529==    at 0xB5CB663: read_group (group.c:43)
> ==9529==    by 0xB5CB9E3: _nss_ldap_getgrent_r (group.c:155)
> ==9529==    by 0xB5A54D0: getgrent_next_nss (compat-initgroups.c:324)
> ==9529==    by 0xB5A596E: _nss_compat_initgroups_dyn (compat-initgroups.c:430)
> ==9529==    by 0x1232C: addinitgroupsX (in /usr/sbin/nscd)
> ==9529==    by 0x12E0E: readdinitgroups (in /usr/sbin/nscd)
> ==9529==    by 0xF302: prune_cache (in /usr/sbin/nscd)
> ==9529==    by 0x756A: nscd_run (in /usr/sbin/nscd)
> ==9529==    by 0x4836F3A: start_thread (pthread_create.c:297)
> ==9529==    by 0x492ABED: clone (in /usr/lib/debug/libc-2.7.so)
> ==9529==
> ==9529== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 1)
> ==9529== malloc/free: in use at exit: 276,148 bytes in 16 blocks.
> ==9529== malloc/free: 17,708 allocs, 17,692 frees, 1,159,719,814 bytes allocated.
> ==9529== For counts of detected errors, rerun with: -v
> ==9529== searching for pointers to 16 not-freed blocks.
> ==9529== checked 8,918,640 bytes.
> ==9529==
> ==9529== LEAK SUMMARY:
> ==9529==    definitely lost: 136 bytes in 1 blocks.
> ==9529==      possibly lost: 952 bytes in 7 blocks.
> ==9529==    still reachable: 275,060 bytes in 8 blocks.
> ==9529==         suppressed: 0 bytes in 0 blocks.
> ==9529== Rerun with --leak-check=full to see details of leaked memory.
> Killed

The segfault happens when reading a returned group entry (while listing
all groups) and specifically when reading the group members.

Can you also reproduce this with just 'getent group' (or id -a user)?
Does it make a difference if nscd is running or not? Does cleaning the
nscd cache make a difference (nscd -i passwd; nscd -i group)?
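
If it helps to narrow things down, here is a minimal sketch, assuming
Python is available on the affected machine. grp.getgrall() enumerates all
groups through NSS (roughly what 'getent group' does) and os.getgrouplist()
takes the initgroups path that 'id -a user' uses. The function names
summarize_groups and group_ids_for are made up for this illustration; they
are not part of nss-pam-ldapd.

```python
import grp
import os
import pwd


def summarize_groups():
    """Enumerate every group via NSS and return (count, largest group).

    The largest group is the one with the most members, or None when no
    groups are returned at all.
    """
    groups = grp.getgrall()
    largest = max(groups, key=lambda g: len(g.gr_mem), default=None)
    return len(groups), largest


def group_ids_for(user):
    """Return the group IDs for a user, resolved through initgroups."""
    entry = pwd.getpwnam(user)  # raises KeyError for an unknown user
    return os.getgrouplist(user, entry.pw_gid)


if __name__ == "__main__":
    count, largest = summarize_groups()
    print("groups:", count)
    if largest is not None:
        print("largest:", largest.gr_name, "members:", len(largest.gr_mem))
```

If the enumeration crashes the interpreter as well, the problem is probably
not specific to nscd. If it only fails while nscd is running, that points
at the communication between nscd and the NSS module.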

If this is a problem with the communication between nscd and the NSS
module, recompiling the NSS module with -DDEBUG_PROT (and maybe even
-DDEBUG_PROT_DUMP) could give a lot more details. Warning: this causes
every command that does NSS lookups (through LDAP) to output a lot of
debugging information.

> Changing the group entry to 'files ldap' avoid the crash, but will not
> work for me as we use +...@netgroup entries in /etc/group and
> /etc/passwd.

Does changing +...@netgroup to just + make a difference? (I haven't set
this up in my test environment.)

Anyway, thanks for pointing this out.

-- 
-- arthur - [email protected] - http://people.debian.org/~adejong --
