OK, I have a more in-depth summary of exactly what's going on here, and why the fcntl() calls fix it. The good news: we've stumbled on a pretty stable "fix" for this problem.
As background, the Amanda client operates something like this:

1. amandad is invoked by (x)inetd or some other mechanism.
2. amandad uses the "Amanda protocol" to determine which service the server is requesting (sendbackup in this case).
3. amandad forks and executes that service, after setting up a bunch of pipes for it.

The unusual thing is that, aside from the usual stdin/stdout/stderr (fds 0-2), amandad sets up six pipes at hard-wired file descriptors 50-55. sendbackup uses those to send the data, index, and message streams back to the server.

As background on POSIX: multiple processes can hold the same file open at the same time. This is how, for example, a backgrounded process in the shell can "share" your terminal with the shell itself. Each of those processes may want to access the file either in blocking mode (waiting until data is available) or in nonblocking mode (returning immediately when no data is available, so the application can work on something else in the meantime). Unfortunately, POSIX attaches the O_NONBLOCK flag to the open file *itself*, not to an individual process's descriptor. More precisely, the flag lives on the "open file description", which is shared by every descriptor inherited through fork() or copied with dup(). So the flag is not specific to the application. In the case at hand, Amanda is accessing a particular pipe in nonblocking mode (for reasons explained below), while gzip expects the same pipe to be in blocking mode. This mismatch leads to the EAGAIN that is killing gzip.

As background on the OpenBSD pthreads (or, more accurately, uthreads, in lib/libpthread/uthread): this library shims its way between an application and the kernel. It implements blocking threaded operations on file descriptors using nonblocking kernel operations and a select() loop. To do so, it must set O_NONBLOCK on every file it accesses. This is easily accomplished by wrapping open(), pipe(), dup(), dup2(), socket(), and so on, the syscalls that create new file descriptors. However, "inherited" file descriptors (those opened by the parent before calling execve()) are a little harder, because the library has no way to know about them.
The library hides this O_NONBLOCK flag from the application by masking it out of the fcntl() return value. The solution that uthreads uses is to start tracking a file the first time it is referenced in a syscall (in _thread_fd_lock, to be precise). It sets O_NONBLOCK when it starts tracking the file. It then removes the flag at the appropriate time (at execve(), in particular).

In the failure mode, uthreads finds out about the index file only *after* sendbackup has forked the gzip child, when sendbackup tries to close the file descriptor. Because of the way the library is designed, it carefully sets O_NONBLOCK before closing the file descriptor. gzip shares that same open pipe, so gzip gets an EAGAIN error.

The mysterious fcntl() calls, however, give uthreads early warning that the index file exists. Uthreads sets the O_NONBLOCK flag when performing the fcntl(), but then clears it on execve(), so everything works as expected.

So, until OpenBSD's new threading library arrives, there are really two fixes available:

1. Don't link the Amanda client libraries with threading libraries.
2. "Inform" uthreads of the high-numbered fds in all of the service binaries, using fcntl().

Option 1 would only be temporary: eventually, I would like to be able to use threads on clients, to support compression and encryption, for example. Option 1 is also harder than it sounds. Futzing with the build process is like playing whack-a-mole, where any change causes problems on another platform.

From my analysis above, option 2 is fairly robust (as robust as OpenBSD's pthreads, anyway), and won't cause any trouble on systems with non-buggy threading libraries. So I'm leaning in that direction.

Dustin

--
Open Source Storage Engineer
http://www.zmanda.com