Murray S. Kucherawy --> dkim-milter-discuss (2008-07-08 16:03:28 -0700):
> On Fri, 23 May 2008, Jukka Salmi wrote:
> > Sure, thanks. With this patch applied, I see mi_wr_cmd() fail with
> > EBADF. The log is available [1].
> >
> > Regards, Jukka
> >
> > [1] http://salmi.ch/~jukka/dkim-milter/maillog_20080522
> 
> Yep, this confirms the previous findings.  Everything is fine with the I/O 
> between postfix and the filter until:
> 
> - postfix sends SMFIC_BODYEOM (end-of-message) to the filter and waits for 
> a reply
> 
> - postfix immediately decides the wait for the reply has failed (though 
> the "why" remains a mystery), shuts down its connection to the filter and 
> temp-fails the message
> 
> - dkim-filter still thinks the connection is there, so it tries to send an 
> SMFIR_INSHEADER (insert header) request, which fails because the socket is 
> actually no longer open
> 
> - since the insert-header request fails, the filter replies with
> SMFIR_TEMPFAIL to get the message temp-failed, but this also fails since
> the socket is no longer open
> 
> We know the second write fails with EBADF, meaning the descriptor has
> been closed on the filter side.  If the postfix side had closed the
> connection, we'd be seeing EPIPE instead of EBADF.
> 
> It looks a lot like fd 8 in the dkim-filter process has suddenly become
> invalid for no apparent reason.  I can see no path in libmilter's source
> code that closes that descriptor and then keeps trying to use it.
> dkim-filter has no access to the milter context structure, and therefore
> no access to that descriptor number, so for it to be the culprit it would
> have to call close() somewhere on the wrong descriptor number.  However,
> neither libdkim nor dkim-filter ever calls close() on anything in normal
> operation, because there's no need to.  libdkim only creates (and later
> closes, via libcrypto) temporary files in certain special circumstances,
> and your configuration doesn't appear to use any of those.
> 
> So, for the moment, I'm stumped.  My best guesses now are a bug in the 
> underlying socket handling code (i.e. libc or the kernel) or something in 
> libcrypto which is causing BIO_free() to close the wrong descriptor from 
> time to time.
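
The EBADF/EPIPE distinction the diagnosis above relies on can be reproduced with a short C sketch. It is not taken from libmilter or dkim-filter: a UNIX-domain socketpair stands in for the postfix/filter connection, SIGPIPE is ignored so that a write to a departed peer returns EPIPE instead of killing the process, and all function names are hypothetical. The third function assumes the usual lowest-available descriptor allocation, so that a number freed by close() is handed out again by the next socketpair() call.

```c
/*
 * Sketch: which errno does write() report when a stream socket "goes away"?
 * A socketpair stands in for the postfix <-> dkim-filter connection.
 * All function names here are hypothetical, not libmilter API.
 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write one byte; return 0 on success or the errno value on failure. */
static int try_write(int fd)
{
    char c = 't';                        /* arbitrary payload byte */
    if (write(fd, &c, 1) == 1)
        return 0;
    return errno;
}

/* The peer (the "MTA" end) closes; our descriptor is still valid -> EPIPE. */
int errno_after_peer_close(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    signal(SIGPIPE, SIG_IGN);            /* get EPIPE instead of dying */
    close(sv[1]);                        /* the other side hangs up */
    int err = try_write(sv[0]);
    close(sv[0]);
    return err;
}

/* We close our own descriptor, then keep using the number -> EBADF. */
int errno_after_local_close(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    close(sv[0]);                        /* filter side closes its own end */
    int err = try_write(sv[0]);          /* number no longer refers to anything */
    close(sv[1]);
    return err;
}

/*
 * Hypothetical "wrong descriptor" scenario: some component closes a
 * temporary descriptor twice.  Between the two close() calls the freed
 * number is reused for the connection socket, so the stale second close()
 * silently tears down the socket and the next write gets EBADF.
 * Returns -1 if the number happened not to be reused.
 */
int errno_after_stale_close(void)
{
    int tmp = open("/dev/null", O_RDONLY);  /* stands in for a temp file */
    assert(tmp >= 0);
    close(tmp);                              /* first, legitimate close */

    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    int ours = (sv[0] == tmp) ? 0 : (sv[1] == tmp) ? 1 : -1;
    if (ours < 0) {                          /* number not reused this time */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(tmp);                              /* BUG: stale second close */
    int err = try_write(sv[ours]);           /* socket is gone -> EBADF */
    close(sv[1 - ours]);
    return err;
}
```

On Linux or a BSD, the three functions should report EPIPE, EBADF and EBADF respectively. The third case only illustrates the kind of bug speculated about above (a stale close() hitting a reused descriptor number); it is not a claim that BIO_free() or any other libcrypto function actually does this. Over TCP, the first write after the peer closes can still succeed, with EPIPE (or ECONNRESET) appearing only on a later write; either way the result is not EBADF.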

The systems in question will get an OS upgrade over the next few weeks or
months. Let's see if the milter problem disappears with that upgrade...


> Someone said this doesn't happen if you change from UNIX domain sockets to 
> TCP sockets.  Has this also been tried?

Yes. I was using UNIX domain sockets when I first saw the problem, but
switched to TCP sockets then so that I could capture the packets. I
haven't switched back since...


Regards, Jukka

-- 
bashian roulette:
$ ((RANDOM%6)) || rm -rf ~
