Hi Andreas,
I don't see any problem in gwlist/log. pthread_cond_wait releases the mutex
passed to it while it waits. It seems you have a race somewhere else.
Andreas Fink wrote:
> We found a severe bug in gwlib.
>
> We have the following scenario:
>
> A calls debug("xxx",0,"xxxx") which does :
>
> gwlist_add_producer(writers);
>
> and continues, but doesn't reach this line yet:
>
> gwlist_remove_producer(writers);
>
>
> at this point the list "writers" is empty but has
> writers->num_producers=1
>
> B does:
> lock(list); /* atomic lock */
>
> list->single_operation_lock->owner = -1;
> pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);
>
> so it waits until A calls gwlist_remove_producer()
> and completes.
>
>
> Now A is calling this:
>
> void gwlist_remove_producer(List *list)
> {
> lock(list);
> gw_assert(list->num_producers > 0);
> --list->num_producers;
> pthread_cond_broadcast(&list->nonempty);
> unlock(list);
> }
>
> and gets stuck because the list's atomic lock is held by B.
>
>
> C now has a new debug message and blocks in gwlist_produce().
>
>
> In other words, every thread that wants to write to the debug log gets
> stuck.
>
> Now there are different solutions to this.
> Our approach would be to do this in gwlist_consume():
>
>
> unlock(list);
> pthread_cond_wait(&list->nonempty, &list->single_operation_lock->mutex);
> lock(list);
>
>
> Any other ideas?
> Maybe no atomic lock around gwlist_remove_producer()?
>
>
> Andreas Fink
>
> Fink Consulting GmbH
> Global Networks Schweiz AG
> BebbiCell AG
--
Thanks,
Alex