Murray S. Kucherawy --> dkim-milter-discuss (2008-07-08 16:03:28 -0700):
> On Fri, 23 May 2008, Jukka Salmi wrote:
> > Sure, thanks. With this patch applied, I see mi_wr_cmd() fail with
> > EBADF. Log is [1]available.
> >
> > Regards, Jukka
> >
> > [1] http://salmi.ch/~jukka/dkim-milter/maillog_20080522
>
> Yep, this confirms the previous findings. Everything is fine with the I/O
> between postfix and the filter until:
>
> - postfix sends SMFIC_BODYEOM (end-of-message) to the filter and waits for
>   a reply
>
> - postfix immediately decides the wait for the reply has failed (though
>   the "why" remains a mystery), shuts down its connection to the filter and
>   temp-fails the message
>
> - dkim-filter still thinks the connection is there, so it tries to send an
>   SMFIR_INSHEADER (insert header) request, which fails because the socket is
>   actually no longer open
>
> - since the insert header request fails, it replies with SMFIR_TEMPFAIL to
>   try to get the message to temp-fail, but this also fails since the socket
>   is no longer open
>
> We know the second write returns with EBADF, meaning the descriptor has
> been closed from the filter side. If it were the postfix side closing the
> connection, we'd be seeing EPIPE instead of EBADF.
>
> It looks a lot like fd 8 in the dkim-filter process has suddenly become
> invalid for no apparent reason. There's no path that I can see in
> libmilter's source code to having that descriptor closed and yet
> continuing to try to use it. dkim-filter doesn't have access to the
> milter context structure in order to get access to that descriptor number,
> so for it to be the problem it would have to call close() someplace on the
> wrong descriptor number. However, neither libdkim nor dkim-filter ever
> close() anything in normal operation because there's no need to do so.
> libdkim only creates (and later closes, via libcrypto) temporary files for
> certain special circumstances, and your configuration doesn't appear to be
> using any of those.
>
> So, for the moment, I'm stumped. My best guesses now are a bug in the
> underlying socket handling code (i.e. libc or the kernel) or something in
> libcrypto which is causing BIO_free() to close the wrong descriptor from
> time to time.
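
For reference, the EBADF vs. EPIPE distinction drawn in the quoted analysis can be reproduced outside the filter. The following is only a minimal sketch, assuming a POSIX system and a UNIX domain socketpair in place of the real postfix/libmilter connection. The function names are made up for illustration and are not part of libmilter or dkim-filter.

```python
import errno
import os
import socket


def write_errno(fd, data=b"x"):
    """Try to write to a raw descriptor; return the errno, or 0 on success."""
    try:
        os.write(fd, data)
    except OSError as exc:
        return exc.errno
    return 0


def local_close_errno():
    """Close our own end first, then write to the stale descriptor number."""
    ours, peer = socket.socketpair()
    fd = ours.detach()      # take over the raw descriptor from the socket object
    os.close(fd)            # the descriptor is now invalid in this process
    result = write_errno(fd)
    peer.close()
    return result


def peer_close_errno():
    """Let the other side close, then write on our still-valid descriptor."""
    ours, peer = socket.socketpair()
    peer.close()            # the remote end goes away
    try:
        # Python ignores SIGPIPE by default, so the failure shows up as an errno.
        return write_errno(ours.fileno())
    finally:
        ours.close()


if __name__ == "__main__":
    print("closed locally:", errno.errorcode[local_close_errno()])
    print("closed by peer:", errno.errorcode[peer_close_errno()])
```

This only illustrates which side's close() yields which error; it says nothing about what is actually invalidating the descriptor inside dkim-filter. With TCP sockets the peer-close case can behave differently on the first write, so the sketch sticks to a socketpair.
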
The systems in question will get an OS upgrade in the next weeks /
months. Let's see if the milter problem disappears with that upgrade...

> Someone said this doesn't happen if you change from UNIX domain sockets to
> TCP sockets. Has this also been tried?

Yes; I was originally using UNIX domain sockets when I first saw the
problem but switched to TCP sockets then to be able to capture the
packets. I haven't switched back since...

Regards, Jukka

-- 
bashian roulette:
$ ((RANDOM%6)) || rm -rf ~
