On Fri, 23 May 2008, Jukka Salmi wrote:
> Sure, thanks. With this patch applied, I see mi_wr_cmd() fail with 
> EBADF. Log is available [1].
>
> Regards, Jukka
>
> [1] http://salmi.ch/~jukka/dkim-milter/maillog_20080522

Yep, this confirms the previous findings.  Everything is fine with the I/O 
between postfix and the filter until:

- postfix sends SMFIC_BODYEOM (end-of-message) to the filter and waits for 
a reply

- postfix immediately concludes that waiting for the reply has failed 
(why it does so remains a mystery), closes its connection to the filter 
and temp-fails the message

- dkim-filter still thinks the connection is there, so it tries to send an 
SMFIR_INSHEADER (insert header) request, which fails because the socket is 
actually no longer open

- because the insert-header request failed, dkim-filter then replies with 
SMFIR_TEMPFAIL to ask that the message be temp-failed, but this write also 
fails because the socket is no longer open
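For anyone not familiar with the milter wire protocol: the reply dkim-filter is trying to write at this point is tiny, so there is very little that can go wrong with the write itself other than the descriptor. The sketch below assumes the framing libmilter uses (a 4-byte network-order length covering the command byte plus payload, followed by the command byte and payload) and the value 't' for SMFIR_TEMPFAIL from mfdef.h. encode_milter_frame() and send_tempfail() are hypothetical helpers written for illustration, not libmilter API.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed value, taken from libmilter's mfdef.h: SMFIR_TEMPFAIL is 't'. */
#define SKETCH_SMFIR_TEMPFAIL 't'

/* Hypothetical helper: encode one milter frame into buf.
 * Layout: 4-byte big-endian length (command byte + payload),
 * then the command byte, then the payload.
 * Returns the frame size, or 0 if buf is too small. */
size_t encode_milter_frame(char cmd, const char *payload, size_t plen,
                           unsigned char *buf, size_t bufsize)
{
    uint32_t nl = htonl((uint32_t)(plen + 1));
    size_t total = 4 + 1 + plen;

    if (total > bufsize)
        return 0;
    memcpy(buf, &nl, 4);
    buf[4] = (unsigned char)cmd;
    if (plen > 0)
        memcpy(buf + 5, payload, plen);
    return total;
}

/* Hypothetical helper: send a SMFIR_TEMPFAIL reply on fd.
 * Returns 0 on success, or the errno from write() on failure
 * (EBADF for a closed descriptor, EPIPE for a closed peer, ...). */
int send_tempfail(int fd)
{
    unsigned char frame[8];
    size_t n = encode_milter_frame(SKETCH_SMFIR_TEMPFAIL, NULL, 0,
                                   frame, sizeof frame);
    size_t off = 0;

    while (off < n) {
        ssize_t w = write(fd, frame + off, n - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return errno;       /* real failure: report why */
        }
        off += (size_t)w;       /* handle short writes */
    }
    return 0;
}
```

The whole TEMPFAIL frame is five bytes, so when it fails, the errno from write() is the only interesting piece of information.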

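We know the second write fails with EBADF, which means the descriptor is 
no longer valid in the filter process, i.e. it was closed on the filter 
side.  Had only postfix closed its end of the connection, the filter's 
descriptor would still be valid and the write would fail with EPIPE 
instead of EBADF.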
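The EBADF/EPIPE distinction can be reproduced with a UNIX domain socketpair. This is a minimal sketch under the assumption of standard POSIX socket semantics; the function names are hypothetical, and SIGPIPE is ignored so that the peer-closed case returns EPIPE instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write one byte to fd; return 0 on success or errno on failure. */
int try_write(int fd)
{
    char c = 'x';
    return write(fd, &c, 1) < 0 ? errno : 0;
}

/* Case 1: the peer (think: postfix) closes its end. Our descriptor
 * is still valid, so write() fails with EPIPE.
 * Returns the errno observed, or -1 if setup fails. */
int peer_closed_errno(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    signal(SIGPIPE, SIG_IGN);    /* get EPIPE instead of being killed */
    close(sv[1]);                /* remote side goes away */
    int err = try_write(sv[0]);  /* our fd is still open */
    close(sv[0]);
    return err;
}

/* Case 2: our own process closes the descriptor, then keeps using
 * the stale number. write() fails with EBADF. */
int self_closed_errno(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[0]);                /* filter side closes its own fd */
    int err = try_write(sv[0]);  /* stale descriptor number */
    close(sv[1]);
    return err;
}
```

On a typical Unix system peer_closed_errno() should yield EPIPE and self_closed_errno() EBADF, which is why the EBADF in the log points at the filter side.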

It looks a lot like fd 8 in the dkim-filter process has suddenly become 
invalid for no apparent reason.  I can't see any path in libmilter's 
source code that closes that descriptor and then keeps trying to use it. 
dkim-filter has no access to the milter context structure that holds the 
descriptor number, so for it to be the culprit it would have to call 
close() somewhere on the wrong descriptor number.  However, neither 
libdkim nor dkim-filter ever calls close() in normal operation because 
there's no need to do so. 
libdkim only creates (and later closes, via libcrypto) temporary files for 
certain special circumstances, and your configuration doesn't appear to be 
using any of those.
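To illustrate the kind of bug being ruled out here: new descriptors get the lowest free number, so code that closes a descriptor and later closes the same stale number again will silently close whatever now owns that number, and the owner then sees EBADF. stale_close_kills_other_fd() and fd_is_open() are hypothetical, written only for this illustration; nothing like this has been found in libdkim or dkim-filter.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if fd currently refers to an open descriptor. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Component A opens and closes a descriptor but keeps its number.
 * Component B then creates a connection, which normally reuses that
 * number. A's buggy second close() hits B's descriptor instead.
 * Returns 1 if one of B's descriptors was closed, 0 if not,
 * -1 on setup failure. */
int stale_close_kills_other_fd(void)
{
    int stale = open("/dev/null", O_RDONLY);  /* A's descriptor */
    if (stale < 0)
        return -1;
    close(stale);                             /* A's legitimate close */

    int sv[2];                                /* B's connection */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(stale);                             /* A's buggy second close */
    int hit = !fd_is_open(sv[0]) || !fd_is_open(sv[1]);

    /* clean up whichever of B's descriptors survived */
    for (int i = 0; i < 2; i++)
        if (fd_is_open(sv[i]))
            close(sv[i]);
    return hit;
}
```

This relies on lowest-available descriptor allocation, which is why a stray close() anywhere in the process, including inside a library, can take out the milter connection.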

So, for the moment, I'm stumped.  My best guesses now are a bug in the 
underlying socket handling code (i.e. libc or the kernel) or something in 
libcrypto which is causing BIO_free() to close the wrong descriptor from 
time to time.

Someone said this doesn't happen if you change from UNIX domain sockets to 
TCP sockets.  Has this also been tried?

_______________________________________________
dkim-milter-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dkim-milter-discuss
