On Wed, Jan 21, 2004 at 11:12:20AM -0600, Ben Collins-Sussman wrote:
> On Wed, 2004-01-21 at 04:29, Joe Orton wrote:
> > I have now managed to reproduce hangs a couple of times here,
>
> What exactly was your reproduction recipe? Same as mine? Start an
> import over SSL and then 'graceful' the server?

By doing graceful restarts every few seconds during a large import, I
could reproduce a hang using ra_dav over both SSL and plain HTTP, against
a 0.36.0 server running on localhost, when using DB 4.1.25. I've upgraded
to 4.2.52 and can no longer reproduce the problem. Were you using DB
4.2.52 too, and were you unable to reproduce the issue over plain HTTP?

> > only using
> > db-4.1.25 on the server: the child was sat eating CPU, either doing no
> > syscalls or doing:
> >
> > futex(0xbf58b1b0, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > futex(0xbf58b1b0, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > futex(0xbf58b1b0, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > futex(0xbf58b1b0, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > futex(0xbf58b1b0, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> >
> > ...which could be a db4 issue. Is that consistent with what you're
> > seeing with 4.2.52?
>
> I'm pretty ignorant about strace-y things. Why do you think this output
> is relevant to BDB?

POSIX mutexes are the only thing in use which will use futexes for
locking, and only DB4 is configured to use POSIX mutexes; so either DB4
or libpthread itself is looping like that.

> I wonder if the next step is to simply try reproducing the bug by
> running client and server both on my local box? (It's a RH9 box with the
> latest 2.4.20 kernel.)

Yes, sounds good.

joe
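
P.S. Below is a rough sketch of how a waiter can end up repeating FUTEX_WAIT/EAGAIN like the strace output above. It is a single-threaded Python model, not real locking code. It assumes the three-state futex lock scheme (0 = unlocked, 1 = locked, 2 = locked with waiters) that glibc's NPTL low-level locks are roughly built on; FutexWord, futex_wait, lock_attempt and the interfere hook are hypothetical names, and none of this is DB4's or libpthread's actual code. The kernel only puts a FUTEX_WAIT caller to sleep if the futex word still holds the expected value (2 here); otherwise the call fails at once with EAGAIN. If something keeps changing the word just before the wait, the waiter never sleeps and simply spins. That is one possible way to get a CPU-bound child with that trace, not a diagnosis of this hang.

```python
EAGAIN = "EAGAIN"            # futex_wait failed at once: word != expected
WOULD_SLEEP = "would sleep"  # futex_wait would have put the caller to sleep


class FutexWord:
    """Hypothetical stand-in for the shared 32-bit integer behind a mutex."""

    def __init__(self, value=0):
        self.value = value

    def cmpxchg(self, expected, new):
        # Models atomic compare-and-swap: store `new` only if word == expected.
        old = self.value
        if old == expected:
            self.value = new
        return old

    def xchg(self, new):
        # Models atomic exchange: store `new`, return the previous value.
        old, self.value = self.value, new
        return old


def futex_wait(word, expected):
    # Models the kernel's check: sleep only if the word still holds `expected`.
    return WOULD_SLEEP if word.value == expected else EAGAIN


def lock_attempt(word, max_rounds=5, interfere=None):
    """Run the lock's wait loop for up to max_rounds; return futex results.

    `interfere` (hypothetical) runs between the exchange and the wait,
    standing in for another process writing to the same word.
    """
    results = []
    if word.cmpxchg(0, 1) == 0:
        return results              # uncontended fast path: no syscall at all
    for _ in range(max_rounds):
        if word.xchg(2) == 0:
            return results          # lock was released meanwhile: acquired
        if interfere is not None:
            interfere(word)
        result = futex_wait(word, 2)
        results.append(result)
        if result == WOULD_SLEEP:
            return results          # a real waiter would now block in the kernel
    return results                  # still spinning after max_rounds


if __name__ == "__main__":
    # Lock held, nobody interfering: the waiter blocks normally.
    print(lock_attempt(FutexWord(1)))
    # Word keeps being reset to 1 before each wait: EAGAIN forever, no sleep.
    print(lock_attempt(FutexWord(1), 3, lambda w: setattr(w, "value", 1)))
```

The first call prints ['would sleep'] and the second prints ['EAGAIN', 'EAGAIN', 'EAGAIN']. With max_rounds removed, the second case is an endless busy loop.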
