Hi list,
I've spotted an issue in gwlib, while using it in our own application,
which puzzles me.
It starts with this error message:
2007-03-31 14:48:14 [510] [46] INFO: SMPP: Accepted connection from: xx.xx.xx.xx
2007-03-31 14:48:14 [510] [-1] PANIC: /Users/afink/development/gwlib/src/gwlib/thread.c:142: mutex_lock_real: Managed to lock the mutex twice! (Called from /Users/afink/development/gwlib/src/gwlib/list.c:334:gwlist_lock.)
So a gwlist_lock call is going wrong. That sounds simple to fix, but it's not.
mutex_lock does this:
    ret = pthread_mutex_lock(&mutex->mutex);
    if (ret != 0)
        panic(0, "%s:%ld: %s: Mutex failure! (Called from %s:%ld:%s.)", \
              __FILE__, (long) __LINE__, __func__, file, (long) line, func);
    if (mutex->owner == gwthread_self())
        panic(0, "%s:%ld: %s: Managed to lock the mutex twice! (Called from %s:%ld:%s.)", \
              __FILE__, (long) __LINE__, __func__, file, (long) line, func);
If pthread_mutex_lock() returns an error, the first panic is called. That
message has not appeared in the log, so we can assume the mutex was
locked successfully and this is not the issue.
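
For reference, here is a minimal sketch of one case where pthread_mutex_lock()
returns non-zero: relocking an error-checking mutex from the thread that
already holds it. The helper name is made up, and whether gwlib creates its
mutexes with this type is an assumption that has not been checked here. With a
default mutex, a second locker in another thread simply blocks instead of
getting an error.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper, not part of gwlib: lock an error-checking mutex
 * twice from the same thread and return the result of the second call. */
int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int ret;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    ret = pthread_mutex_lock(&m);   /* first lock succeeds */
    assert(ret == 0);
    ret = pthread_mutex_lock(&m);   /* relock by the owner is reported as an error */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return ret;
}
```

The second call is expected to return EDEADLK, which is the kind of result
that would trigger the "Mutex failure!" panic rather than the "locked twice"
one.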
Then it checks whether the owner is ourselves (to disallow double locking
by the owner).
If the owner is us, then we panic. This is what's happening here.
The panic shows a thread id of -1. This means we are in thread -1,
which is invalid. If the mutex is not owned by anyone,
mutex->owner is set to -1. That's the clash we see. So the error comes
from the fact that gwthread_self returns -1:
    /* Return the thread id of this thread. */
    long gwthread_self(void)
    {
        struct threadinfo *threadinfo;
        threadinfo = pthread_getspecific(tsd_key);
        if (threadinfo)
            return threadinfo->number;
        else
            return -1;
    }
It returns -1 when pthread_getspecific(tsd_key) returns NULL.
This occurs when the thread-specific value has not been set yet, i.e.
the startup of the thread has not completed yet. The value is set in
new_thread() only. So it must panic before the thread has completed its startup.
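
The clash can be reproduced in isolation. The following is a minimal sketch
with simplified stand-ins for the gwlib pieces quoted above (everything with a
demo_ prefix is hypothetical, not gwlib code): a mutex whose owner field uses
-1 for "nobody", and a self() function that returns -1 when no thread-specific
value is set. Any thread that takes the lock without having called
pthread_setspecific() then looks like the owner of an unowned mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_mutex {
    pthread_mutex_t mutex;
    long owner;                     /* -1 means "not owned by anyone" */
};

struct demo_args {
    struct demo_mutex *m;
    long *number;                   /* NULL: leave thread-specific value unset */
    int fired;                      /* 1 if the "locked twice" check matched */
};

static pthread_key_t demo_tsd_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    pthread_key_create(&demo_tsd_key, NULL);
}

/* Same logic as the quoted gwthread_self(): -1 if nothing is registered. */
static long demo_self(void)
{
    long *number = pthread_getspecific(demo_tsd_key);
    return number ? *number : -1;
}

/* Lock, evaluate the ownership check, unlock. Returns 1 if it matched. */
static int demo_lock_would_panic(struct demo_mutex *m)
{
    int fired;

    pthread_mutex_lock(&m->mutex);
    fired = (m->owner == demo_self());
    pthread_mutex_unlock(&m->mutex);
    return fired;
}

static void *demo_worker(void *arg)
{
    struct demo_args *a = arg;

    if (a->number != NULL)
        pthread_setspecific(demo_tsd_key, a->number);
    a->fired = demo_lock_would_panic(a->m);
    return NULL;
}

/* Run one worker thread; registered == 0 skips pthread_setspecific(). */
int demo_check_fires(int registered)
{
    struct demo_mutex m;
    long number = 7;
    struct demo_args a = { &m, registered ? &number : NULL, 0 };
    pthread_t t;
    int ret;

    pthread_once(&demo_key_once, demo_make_key);
    pthread_mutex_init(&m.mutex, NULL);
    m.owner = -1;

    ret = pthread_create(&t, NULL, demo_worker, &a);
    assert(ret == 0);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&m.mutex);
    return a.fired;
}
```

Calling demo_check_fires(0) should report that the check matched, while
demo_check_fires(1) should not. This only shows that the sentinel value and the
"no thread info" value collide; it does not say which thread in the real
application ends up without thread info.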
I suspect a race condition in static void *new_thread(void *arg) in
gwthread_pthread.c. Note that this is a multi-CPU situation.
This is what is in my code triggering this:
    info(0, "SMPP: Accepted connection from: %s", octstr_get_cstr(pc->remote_host));
    gwthread_create(SMPP_Handler_Thread, (void*)pc);
gwthread_create calls spawn_thread to do the job.
spawn_thread should say "Started thread %ld (%s)" or "Failed to start
thread (%s)" at its end. But we don't see this!
The panic happens before either of those messages is printed, so we must
have a race condition here in spawn_thread / new_thread.
The error must be raised from within new_thread, as otherwise the
gwthread_self call would return the caller's real id.
spawn_thread allocates a memory structure to pass the parameters to
the new thread. Then it calls pthread_create.
Let's look at new_thread:
    static void *new_thread(void *arg)
    {
        int ret;
        struct new_thread_args *p = arg;

        /* Make sure we don't start until our parent has entered
         * our thread info in the thread table. */
        lock();
Here (before pthread_setspecific() has been called for tsd_key) we
lock the global thread list to sync with the parent thread. (Note that the
inline function lock calls pthread_mutex_lock directly, not mutex_lock.)
So new_thread waits until spawn_thread has filled in all fields, then
calls unlock() and pthread_setspecific() right after that.
So basically there is no way of getting into this situation...
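
The handshake described above can be sketched as follows. This is a simplified
model, not the gwlib source: all demo_ names are hypothetical, and it assumes
the parent holds the table lock across pthread_create() and the table update,
as the comment in new_thread suggests. The child records what a
gwthread_self()-style lookup would return just before and just after
pthread_setspecific().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_threadinfo {
    long number;        /* filled in by the parent under the table lock */
    long seen_before;   /* lookup result before pthread_setspecific() */
    long seen_after;    /* lookup result after pthread_setspecific() */
};

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    pthread_key_create(&demo_key, NULL);
}

static long demo_self(void)
{
    struct demo_threadinfo *ti = pthread_getspecific(demo_key);
    return ti ? ti->number : -1;
}

static void *demo_new_thread(void *arg)
{
    struct demo_threadinfo *ti = arg;

    /* Block until the parent has finished filling in *ti. */
    pthread_mutex_lock(&demo_table_lock);
    pthread_mutex_unlock(&demo_table_lock);

    ti->seen_before = demo_self();   /* no thread info registered yet */
    pthread_setspecific(demo_key, ti);
    ti->seen_after = demo_self();    /* thread info now visible */
    return NULL;
}

/* Parent side: create the child while holding the lock, then fill the table. */
int demo_spawn(struct demo_threadinfo *ti, long number)
{
    pthread_t t;
    int ret;

    pthread_once(&demo_once, demo_make_key);
    pthread_mutex_lock(&demo_table_lock);
    ret = pthread_create(&t, NULL, demo_new_thread, ti);
    if (ret == 0)
        ti->number = number;         /* child is still blocked on the lock */
    pthread_mutex_unlock(&demo_table_lock);

    if (ret == 0)
        pthread_join(t, NULL);
    return ret;
}
```

In this model the child always sees its number after pthread_setspecific(), so
the handshake itself holds. The only point where the lookup yields -1 inside
the new thread is between unlock and pthread_setspecific(), and nothing in the
sketch takes a checked mutex there.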
Can anyone see how this scenario could occur?
It's rare but severe. It happens on a dual-CPU machine under Mac OS X.
Andreas Fink
Fink Consulting GmbH
Global Networks Schweiz AG
BebbiCell AG