We recently set about upgrading to Cyrus SASL 2.1.21 and OpenLDAP 2.3.24. We use NSS LDAP 207 to provide user and group information on our LDAP servers. In our test environment OpenLDAP was segfaulting as soon as any client attempted to speak SASL with it.
And you know what? Our three-year-old version 207 of NSS LDAP may be to blame. According to the NSS LDAP changelog, version 210 fixed a SASL crasher (I can't find the 208 and 210 tarballs to confirm this):
210 Luke Howard
* initialize DBT structures
* fix SASL crasher
But you could argue that Cyrus SASL was also to blame.
The crash happened because NSS LDAP set up SASL, and OpenLDAP then replaced the SASL mutex functions and used SASL itself. The first use of SASL initialised gss_mutex to a dummy value of (void *)1, and the later use attempted to lock that mutex using the real mutex code. It could be argued that Cyrus SASL should take care of gss_mutex when sasl_set_mutex() installs proper mutex handlers, which is what my patch does (to some degree). I won't try to defend my patch as a proper fix.
The following sequence of events led to the segfault:
1. OpenLDAP calls initgroups(), which calls into NSS, which in turn calls Cyrus gssapiv2_client_plug_init().
2. Cyrus gssapiv2_client_plug_init() initialises gss_mutex to (void *)1 using the default mutex handlers.
   [OpenLDAP should not need to know that initgroups() resulted in SASL interaction.]
3. Later, OpenLDAP calls Cyrus sasl_set_mutex() to install custom mutex handlers within Cyrus SASL.
4. OpenLDAP calls Cyrus gssapiv2_server_plug_init(), which does not allocate gss_mutex because it is not NULL.
5. A client connects to the LDAP server and requests SASL authentication.
6. OpenLDAP calls Cyrus sasl_server_start(), which uses the newly installed mutex handler to lock gss_mutex.
7. Since gss_mutex still holds the (void *)1 set by the default handler, pthread_mutex_lock() segfaults.
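The sequence of events above can be modelled in a few lines of C. This is a hypothetical, self-contained sketch: set_mutex(), plug_init(), replay_unpatched() and the handler functions are illustrative stand-ins, not Cyrus SASL's actual code. Because dereferencing (void *)1 would genuinely crash, the model's real_lock() returns -1 at the point where pthread_mutex_lock() segfaults in the trace.

```c
#include <stdlib.h>

#define MAGIC 0x5A5Au

/* A "real" mutex as a custom handler might allocate it (illustrative). */
struct real_mutex { unsigned magic; int locked; };

/* Built-in do-nearly-nothing handlers, modelled on Cyrus's defaults. */
static void *dummy_alloc(void) { return (void *)0x1; }
static int dummy_lock(void *m) { (void)m; return 0; }

/* Stand-ins for the handlers OpenLDAP installs via sasl_set_mutex(). */
static void *real_alloc(void) {
    struct real_mutex *m = calloc(1, sizeof *m);
    if (m) m->magic = MAGIC;
    return m;
}
static int real_lock(void *m) {
    /* pthread_mutex_lock() would dereference (void *)1 and crash;
       the model reports -1 instead so the failure is observable. */
    if (m == (void *)0x1) return -1;
    struct real_mutex *rm = m;
    if (rm->magic != MAGIC) return -1;
    rm->locked = 1;
    return 0;
}

/* Process-wide handler table and the plugin's global mutex. */
static void *(*mutex_alloc)(void) = dummy_alloc;
static int (*mutex_lock)(void *) = dummy_lock;
static void *gss_mutex = NULL;

static void set_mutex(void *(*a)(void), int (*l)(void *)) {
    mutex_alloc = a;
    mutex_lock = l;
}

/* Unpatched plugin init: allocate only when gss_mutex is NULL. */
static int plug_init(void) {
    if (!gss_mutex) {
        gss_mutex = mutex_alloc();
        if (!gss_mutex) return -1;
    }
    return 0;
}

/* Replays the steps; returns the final lock result (-1 = would segfault). */
static int replay_unpatched(void) {
    gss_mutex = NULL;
    set_mutex(dummy_alloc, dummy_lock);
    plug_init();                      /* client init via NSS: gss_mutex = (void *)1 */
    set_mutex(real_alloc, real_lock); /* OpenLDAP installs real handlers */
    plug_init();                      /* server init: gss_mutex not NULL, kept */
    return mutex_lock(gss_mutex);     /* sasl_server_start() locks the stale dummy */
}
```

The key point is the second plug_init() call: it sees a non-NULL gss_mutex and keeps the dummy, so the real lock handler is later handed a pointer it never allocated.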
There is a stack trace below[1] that shows the frame that generated the segfault.
My attached patch works around the issue by calling the mutex allocation routine if gss_mutex is NULL (as before) or if it is (void *)1. This way the allocator is always called again if the built-in do-nearly-nothing allocator was used previously, in case sasl_set_mutex() has since replaced the mutex functions. The patch also skips mutex_free() when gss_mutex holds the dummy value.
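The patched checks can be exercised the same way. Again this is a hypothetical sketch: only the SASL_INTERNAL_MUTEX_DUMMY name and the two conditions come from the patch (the macro is parenthesised here for safety); the handler functions, plug_init_patched(), plug_dispose_patched() and replay_patched() are illustrative.

```c
#include <stdlib.h>

/* Sentinel returned by the built-in allocator (name taken from the patch). */
#define SASL_INTERNAL_MUTEX_DUMMY ((void *)0x01)

/* Built-in do-nearly-nothing handlers. */
static void *dummy_alloc(void) { return SASL_INTERNAL_MUTEX_DUMMY; }
static void dummy_free(void *m) { (void)m; }

/* Stand-ins for handlers installed via sasl_set_mutex(); counts allocations. */
static int real_allocs = 0;
static void *real_alloc(void) { real_allocs++; return malloc(1); }
static void real_free(void *m) { free(m); }

/* Current handler table and the plugin's global mutex. */
static void *(*mutex_alloc)(void) = dummy_alloc;
static void (*mutex_free)(void *) = dummy_free;
static void *gss_mutex = NULL;

/* Patched init: also reallocate when only the dummy value is present. */
static int plug_init_patched(void) {
    if (!gss_mutex || gss_mutex == SASL_INTERNAL_MUTEX_DUMMY) {
        gss_mutex = mutex_alloc();
        if (!gss_mutex) return -1;
    }
    return 0;
}

/* Patched dispose: never pass the dummy value to a free routine. */
static void plug_dispose_patched(void) {
    if (gss_mutex && gss_mutex != SASL_INTERNAL_MUTEX_DUMMY) {
        mutex_free(gss_mutex);
        gss_mutex = NULL;
    }
}

/* Replays the failing sequence; returns 1 if the server-side init
   replaced the dummy with a mutex from the real allocator. */
static int replay_patched(void) {
    gss_mutex = NULL;
    mutex_alloc = dummy_alloc;
    mutex_free = dummy_free;
    plug_init_patched();      /* client init via NSS: dummy value */
    mutex_alloc = real_alloc; /* handlers replaced, as by sasl_set_mutex() */
    mutex_free = real_free;
    plug_init_patched();      /* server init: dummy replaced */
    return gss_mutex != NULL && gss_mutex != SASL_INTERNAL_MUTEX_DUMMY;
}
```

Once a real mutex is in place, further init calls leave it alone, so the only extra work the check adds is re-running the trivial dummy allocator while the built-in handlers are still active.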
Thanks,
Sean Burford
[1]: Stack trace when client attempts SASL negotiation:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 32771 (LWP 16696)]
0x402ce092 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
* 4 Thread 32771 (LWP 16696) 0x402ce092 in pthread_mutex_lock ()
from /lib/i686/libpthread.so.0
3 Thread 16386 (LWP 16695) 0x403e6251 in select () from /lib/i686/libc.so.6
2 Thread 32769 (LWP 16694) 0x403e3f7a in poll () from /lib/i686/libc.so.6
1 Thread 16384 (LWP 16691) 0x402d00d4 in __pthread_sigsuspend ()
from /lib/i686/libpthread.so.0
Thread 4 (Thread 32771 (LWP 16696)):
#0 0x402ce092 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#1 0x081a01f5 in ldap_pvt_thread_mutex_lock (mutex=0x1)
at ../../../libraries/libldap_r/thr_posix.c:333
#2 0x081a90ee in ldap_pvt_sasl_mutex_lock (mutex=0x1) at cyrus.c:1288
#3 0x407c95e9 in gssapi_server_mech_step (conn_context=0x83912e8,
params=0x838fb20,
clientin=0x8391084 "...",
clientinlen=601, serverout=0x65e66ba4, serveroutlen=0x65e66b9c,
oparams=0x838fb20) at ../../plugins/gssapi.c:671
#4 0x401580b5 in sasl_server_step (conn=0x838ff00,
clientin=0x8391084 "...",
clientinlen=601, serverout=0x65e66ba4, serveroutlen=0x0)
at ../../lib/server.c:1411
#5 0x40157d27 in sasl_server_start (conn=0x838ff00, mech=0x0,
clientin=0x8391084 "...",
clientinlen=601, serverout=0x65e66ba4, serveroutlen=0x65e66b9c)
at ../../lib/server.c:1331
#6 0x080d4f13 in slap_sasl_bind (op=0x8390e88, rs=0x65e66cb0)
at ../../../servers/slapd/sasl.c:1393
#7 0x0809eaf2 in fe_op_bind (op=0x8390e88, rs=0x65e66cb0)
at ../../../servers/slapd/bind.c:275
#8 0x0809e6f0 in do_bind (op=0x8390e88, rs=0x65e66cb0)
at ../../../servers/slapd/bind.c:200
#9 0x0807ada2 in connection_operation (ctx=0x65e66d50, arg_v=0x8390e88)
at ../../../servers/slapd/connection.c:1307
#10 0x0819f15c in ldap_int_thread_pool_wrapper (xpool=0x82c9770)
at ../../../libraries/libldap_r/tpool.c:478
diff -c -r cyrus-sasl-2.1.22/include/saslplug.h cyrus-sasl-2.1.22.mod/include/saslplug.h
*** cyrus-sasl-2.1.22/include/saslplug.h	Tue Mar 14 06:23:21 2006
--- cyrus-sasl-2.1.22.mod/include/saslplug.h	Mon Aug 21 16:18:29 2006
***************
*** 21,26 ****
--- 21,31 ----
  extern "C" {
  #endif
  
+ /* Value to indicate that the dummy mutex handling functions have allocated
+  * gss_mutex, and that further calls to _alloc are required just in case
+  * sasl_set_mutex has been called since then. */
+ #define SASL_INTERNAL_MUTEX_DUMMY (void *)0x01
+ 
  /* callback to lookup a sasl_callback_t for a connection
   * input:
   *  conn -- the connection to lookup a callback for
diff -c -r cyrus-sasl-2.1.22/lib/common.c cyrus-sasl-2.1.22.mod/lib/common.c
*** cyrus-sasl-2.1.22/lib/common.c	Wed Apr 19 11:39:59 2006
--- cyrus-sasl-2.1.22.mod/lib/common.c	Mon Aug 21 16:10:50 2006
***************
*** 124,130 ****
  /* Intenal mutex functions do as little as possible (no thread protection) */
  static void *sasl_mutex_alloc(void)
  {
!   return (void *)0x1;
  }
  
  static int sasl_mutex_lock(void *mutex __attribute__((unused)))
--- 124,130 ----
  /* Intenal mutex functions do as little as possible (no thread protection) */
  static void *sasl_mutex_alloc(void)
  {
!   return SASL_INTERNAL_MUTEX_DUMMY;
  }
  
  static int sasl_mutex_lock(void *mutex __attribute__((unused)))
diff -c -r cyrus-sasl-2.1.22/plugins/gssapi.c cyrus-sasl-2.1.22.mod/plugins/gssapi.c
*** cyrus-sasl-2.1.22/plugins/gssapi.c	Wed Jul 21 07:39:06 2004
--- cyrus-sasl-2.1.22.mod/plugins/gssapi.c	Mon Aug 21 16:20:28 2006
***************
*** 589,595 ****
                             const sasl_utils_t *utils)
  {
  #ifdef GSS_USE_MUTEXES
!     if (gss_mutex) {
        utils->mutex_free(gss_mutex);
        gss_mutex=NULL;
      }
--- 589,595 ----
                             const sasl_utils_t *utils)
  {
  #ifdef GSS_USE_MUTEXES
!     if (gss_mutex && gss_mutex != SASL_INTERNAL_MUTEX_DUMMY) {
        utils->mutex_free(gss_mutex);
        gss_mutex=NULL;
      }
***************
*** 1246,1252 ****
      *plugcount = 1;
  
  #ifdef GSS_USE_MUTEXES
!     if (!gss_mutex) {
        gss_mutex = utils->mutex_alloc();
        if (!gss_mutex) {
          return SASL_FAIL;
--- 1246,1252 ----
      *plugcount = 1;
  
  #ifdef GSS_USE_MUTEXES
!     if (!gss_mutex || gss_mutex == SASL_INTERNAL_MUTEX_DUMMY) {
        gss_mutex = utils->mutex_alloc();
        if (!gss_mutex) {
          return SASL_FAIL;
***************
*** 1785,1791 ****
      *plugcount = 1;
  
  #ifdef GSS_USE_MUTEXES
!     if(!gss_mutex) {
        gss_mutex = utils->mutex_alloc();
        if(!gss_mutex) {
          return SASL_FAIL;
--- 1785,1791 ----
      *plugcount = 1;
  
  #ifdef GSS_USE_MUTEXES
!     if(!gss_mutex || gss_mutex == SASL_INTERNAL_MUTEX_DUMMY) {
        gss_mutex = utils->mutex_alloc();
        if(!gss_mutex) {
          return SASL_FAIL;