Why not use pthread_mutex_t instead of Mutex? Since the mutex is passed to pthread_cond_wait, wouldn't it be proper to replace Mutex and mutex_lock/unlock/create/destroy with those from pthreads?
-----Original Message----- From: Alexander Malysh [mailto:[EMAIL PROTECTED] Sent: 16 April 2007 22:05 To: [email protected] Subject: RE: Race condition in list_consume() Hi, yes, I know the story because I commited that patch :) The story is really simple, what we do: lock(mutex); # mutex here is gwlib mutex that sets owner to thread number list->single_operation_lock->owner = -1; # here we reset owner because pthread_cond_wait is not our wrapped mutex call and we don't want to panic in gwlib mutex pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex); list->single_operation_lock->owner = gwthread_self(); # here we own mutex again, so just make gwlib mutex happy again Raul Igrisan wrote: > "and gets locked up because the list's atomic lock is locked by B." > > The lock is not held by B since the wait releases the lock (and reacquires > it before returning). > > > > Do you have a logical explanation for the unlock/wait/lock approach? > > > > > > > > > > I've seen this pattern in kannel sources: lock/set mutex owner to > -1/wait/set mutex owner to current thread/unlock > > list->single_operation_lock->owner = -1; > > > > Does anyone know its story? > > > > > > > > _____ > > From: Andreas Fink [mailto:[EMAIL PROTECTED] > Sent: 16 April 2007 16:44 > To: devel Devel > Subject: Race condition in list_consume() > > > > We found a severe bug in gwlib. > > > > We have the following scenario: > > > > A calls debug("xxx",0,"xxxx") which does : > > > > gwlist_add_producer(writers); > > > > and continues but doesnt reach yet this line: > > > > gwlist_remove_producer(writers); > > > > > > at this point the list "writers" is empty but has writers->num_producers=1 > > > > B does: > > lock(list); /* atomic lock */ > > > > list->single_operation_lock->owner = -1; > > pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex); > > > > so it waits until A is calling gwlist_remove_producer() > > > > and wait until A completes. > > > > > > Now A is calling this: > > > > void gwlist_remove_producer(List *list) > > { > > lock(list); > > gw_assert(list->num_producers > 0); > > --list->num_producers; > > pthread_cond_broadcast(&list->nonempty); > > unlock(list); > > } > > > > and gets locked up because the list's atomic lock is locked by B. > > > > > > C now has a new debug message and gets stopped at gwlist_produce(). > > > > > > In other words, every process who wants to write to debug log gets stuck. > > > > Now there is different solutions to this. > > Our approach would be to do in gwlist_consume() to do this: > > > > > > unlock(list); > > pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex); > > lock(list); > > > > > > Any other ideas? > > maybe no atomic lock around gwlist_remove_producer() ? > > > > > > Andreas Fink > > > > Fink Consulting GmbH > > Global Networks Schweiz AG > > BebbiCell AG > > > > --------------------------------------------------------------- > > Tel: +41-61-6666330 Fax: +41-61-6666331 Mobile: +41-79-2457333 > > Address: Clarastrasse 3, 4058 Basel, Switzerland > > E-Mail: [EMAIL PROTECTED] > > www.finkconsulting.com www.global-networks.ch www.bebbicell.ch > > --------------------------------------------------------------- > > ICQ: 8239353 MSN: [EMAIL PROTECTED] AIM: smsrelay Skype: andreasfink > > Yahoo: finkconsulting SMS: +41792457333 -- Thanks, Alex
