On Thu, 2009-10-15 at 08:15 -0700, Nathan Kinder wrote: > On 10/15/2009 06:40 AM, Simo Sorce wrote: > > On Thu, 2009-10-15 at 15:28 +0200, Pavel Zuna wrote: > > > >> Rob Crittenden wrote: > >> > >>> One of the last steps of an install is to run through any updates. This > >>> change adds a sleep() prior to calling tasks to ensure postop writes are > >>> done > >>> > >>> We were seeing a rare deadlock of DS when creating the memberOf task > >>> because one thread was adding memberOf in a postop while another was > >>> trying to create an index and this was causing a PRLock deadlock. > >>> > >>> rob > >>> > >>> > >> sleep might not be the best synchronization mechanism out there, but I > >> think > >> that in this case it is pretty much the only one available and it gets the > >> job > >> done, so ack. > >> > > So are we covering a DS bug here ? Or are we doing an asynchronous ldap > > request when we should do a synchronous one and wait for it to finish > > (I've fixed another place where we were doing that and racing against > > our own requests) ? > > > It has nothing to do with the way you are performing your LDAP operations. > > The issue really stems from what I consider to be a bug in NSPR's > implementation of reader-writer locks. It is documented that a single > thread can hold multiple reader locks safely, but I've found that to not > exactly be the case. The NSPR implementation favors writers, so a > thread waiting for the writer lock will block attempts by other threads > to get a reader lock. The problem is that we use reader locks in a > re-entrant fashion, so a thread that already has a reader lock can be > blocked when attempting to get a second reader lock due to a waiting > writer. This thread in turn blocks the writer thread since it already > holds a reader lock. > > I have proposed a solution to the NSPR developers that would allow an > attempt to get a reader lock to go through even if a writer is waiting > if the requesting thread already has another reader lock. I'm hoping > that this can be resolved in NSPR, otherwise we may have to change DS to > use the pthread_rwlock_* interfaces instead. > > The sleep is a temporary workaround. This issue should not arise in > normal operation since the lock in question is around the backend > struct, which is only modified when there is some sort of database > maintenance operation (such as the reindexing task that Rob triggered it > with).
Nathan, thanks for the explanation, very much appreciated. Simo. -- Simo Sorce * Red Hat, Inc * New York _______________________________________________ Freeipa-devel mailing list Freeipa-devel@redhat.com https://www.redhat.com/mailman/listinfo/freeipa-devel