Hi,
yes, I know the story because I commited that patch :)
The story is really simple, what we do:
lock(mutex); # mutex here is gwlib mutex that sets owner to thread number
list->single_operation_lock->owner = -1; # here we reset owner because
pthread_cond_wait is not our wrapped mutex call and we don't want to panic
in gwlib mutex
pthread_cond_wait(&list->nonempty,
&list->single_operation_lock->mutex);
list->single_operation_lock->owner = gwthread_self(); # here we own mutex
again, so just make gwlib mutex happy again
Raul Igrisan wrote:
> "and gets locked up because the list's atomic lock is locked by B."
>
> The lock is not held by B since the wait releases the lock (and reacquires
> it before returning).
>
>
>
> Do you have a logical explanation for the unlock/wait/lock approach?
>
>
>
>
>
>
>
>
>
> I've seen this pattern in kannel sources: lock/set mutex owner to
> -1/wait/set mutex owner to current thread/unlock
>
> list->single_operation_lock->owner = -1;
>
>
>
> Does anyone know its story?
>
>
>
>
>
>
>
> _____
>
> From: Andreas Fink [mailto:[EMAIL PROTECTED]
> Sent: 16 April 2007 16:44
> To: devel Devel
> Subject: Race condition in list_consume()
>
>
>
> We found a severe bug in gwlib.
>
>
>
> We have the following scenario:
>
>
>
> A calls debug("xxx",0,"xxxx") which does :
>
>
>
> gwlist_add_producer(writers);
>
>
>
> and continues but doesnt reach yet this line:
>
>
>
> gwlist_remove_producer(writers);
>
>
>
>
>
> at this point the list "writers" is empty but has writers->num_producers=1
>
>
>
> B does:
>
> lock(list); /* atomic lock */
>
>
>
> list->single_operation_lock->owner = -1;
>
> pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);
>
>
>
> so it waits until A is calling gwlist_remove_producer()
>
>
>
> and wait until A completes.
>
>
>
>
>
> Now A is calling this:
>
>
>
> void gwlist_remove_producer(List *list)
>
> {
>
> lock(list);
>
> gw_assert(list->num_producers > 0);
>
> --list->num_producers;
>
> pthread_cond_broadcast(&list->nonempty);
>
> unlock(list);
>
> }
>
>
>
> and gets locked up because the list's atomic lock is locked by B.
>
>
>
>
>
> C now has a new debug message and gets stopped at gwlist_produce().
>
>
>
>
>
> In other words, every process who wants to write to debug log gets stuck.
>
>
>
> Now there is different solutions to this.
>
> Our approach would be to do in gwlist_consume() to do this:
>
>
>
>
>
> unlock(list);
>
> pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);
>
> lock(list);
>
>
>
>
>
> Any other ideas?
>
> maybe no atomic lock around gwlist_remove_producer() ?
>
>
>
>
>
> Andreas Fink
>
>
>
> Fink Consulting GmbH
>
> Global Networks Schweiz AG
>
> BebbiCell AG
>
>
>
> ---------------------------------------------------------------
>
> Tel: +41-61-6666330 Fax: +41-61-6666331 Mobile: +41-79-2457333
>
> Address: Clarastrasse 3, 4058 Basel, Switzerland
>
> E-Mail: [EMAIL PROTECTED]
>
> www.finkconsulting.com www.global-networks.ch www.bebbicell.ch
>
> ---------------------------------------------------------------
>
> ICQ: 8239353 MSN: [EMAIL PROTECTED] AIM: smsrelay Skype: andreasfink
>
> Yahoo: finkconsulting SMS: +41792457333
--
Thanks,
Alex