I've finally had a chance to take a look at this.
First, the patch I sent out was not quite complete. There's a
short-circuit in mi_rd_cmd() I didn't notice, which is why a couple of
the messages in the protocol were not recorded. A revised patch is
attached in case you want to try it again. However, there's enough
there to see the interaction and interpolate the missing bits.
Unfortunately, there's nothing unusual going on here. Everything on the
two machines is identical right down to the point where they both notice
that the Authentication-Results: header can't be parsed, and this
appears in the logs.
We know from the protocol that postfix in both cases sent down the
SMFIC_BODYEOB, or end-of-message, instruction to the filter (although my
patch failed to log this). The parsing of Authentication-Results: only
happens when that step has already happened, so we know things are
symmetric up to that point.
On "grouper", where things proceeded to completion, the filter made one
header insertion request (that's what "mi_wr_cmd(105) on fd 4" means),
which postfix accepted. I'm presuming the configurations are also
symmetric, so this was the insertion of an X-DKIM: header (since that
failed on the other machine). Since that happens after any possible
insertion of an Authentication-Results: header, we know the filter was
done with the message at that point and was prepared to accept the
message. We can see that too because the filter sent back an "accept"
instruction (hence the "mi_wr_cmd(97) on fd 4" entry), and the session
completed.
Over on "urchin", we saw the common "failed to parse
Authentication-Results:" (which can happen for a bunch of valid
reasons). Immediately after that though, postfix decided it had
received EOF or something else from the filter; unfortunately "Undefined
error: 0" isn't very descriptive. However the filter didn't die at this
point because we see later log entries from the same process ID, and in
fact it still tries to talk to postfix. The filter doesn't detect that
this I/O problem has occurred until it tries to send the insert header
instruction for X-DKIM:, which fails because postfix has already stopped
listening. It then tries to issue a temp-fail instruction to the filter
(the "mi_wr_cmd(116) on fd 8" line) but this also fails for the same reason.
I downloaded the postfix source to poke around a bit. The specific
error "can't read SMFIC_BODYEOB reply packet header: Undefined error: 0"
is caused by two things: (a) postfix tried to read four bytes from the
filter which are the length of a reply message (which might be an
"insert header" instruction or just an "accept" result), but didn't get
four bytes. The "Undefined error: 0" means nothing was stored in the
global variable "errno" to indicate a major I/O problem, which suggests
to me that the read operation returned something less (or more) than
four. A zero would mean end-of-file, as in the filter had disconnected,
but we know that's not the case since the filter tried later to send an
"insert header" and later a "temp-fail" instruction. The milter
function that does writes from the filter to postfix, mi_wr_cmd(),
always sends no fewer than five bytes (four length bytes and an
instruction byte, followed by optional parameters).
So for the moment, assuming the operating system's I/O routines are
stable and working, I'm inclined to think the place to look would be
postfix's vbuf functions, perhaps vbuf_read(). I realize the possible
futility of that conclusion -- if postfix's I/O routines had problems,
they'd probably be more obvious than just the occasional milter I/O
issue -- but right now that's as much time as I have to look into it.
Maybe someone with more postfix internals knowledge than I do can take
it from here.
Index: comm.c
===================================================================
RCS file: /cvs/libmilter/comm.c,v
retrieving revision 8.67
diff -u -r8.67 comm.c
--- comm.c 2 Nov 2006 17:54:44 -0000 8.67
+++ comm.c 20 May 2008 16:48:59 -0000
@@ -133,7 +133,10 @@
(void) memcpy((void *) &expl, (void *) &(data[0]), MILTER_LEN_BYTES);
expl = ntohl(expl) - 1;
if (expl <= 0)
+ {
+ smi_log(SMI_LOG_INFO, "mi_rd_cmd(%d) on fd %d", *cmd, sd);
return NULL;
+ }
if (expl > Maxdatasize)
{
*cmd = SMFIC_TOOBIG;
@@ -197,6 +200,10 @@
/* makes life simpler for common string routines */
buf[expl] = '\0';
#endif /* _FFR_ADD_NULL */
+
+ smi_log(SMI_LOG_INFO, "mi_rd_cmd(%d) on fd %d", *cmd,
+ sd);
+
return buf;
}
i += len;
@@ -353,6 +360,8 @@
iovcnt = 2;
}
+ smi_log(SMI_LOG_INFO, "mi_wr_cmd(%d) on fd %d", cmd, sd);
+
l = retry_writev(sd, iov, iovcnt, timeout);
if (l == MI_FAILURE)
return MI_FAILURE;
Index: engine.c
===================================================================
RCS file: /cvs/libmilter/engine.c,v
retrieving revision 8.162
diff -u -r8.162 engine.c
--- engine.c 27 Feb 2008 01:34:14 -0000 8.162
+++ engine.c 20 May 2008 16:48:59 -0000
@@ -718,7 +718,13 @@
0);
}
break;
+ case _SMFIS_KEEP:
+ case _SMFIS_FAIL:
+ break;
default: /* don't send a reply */
+ smi_log(SMI_LOG_ERR,
+ "%s: milter returned bogus status %d in state %d\n",
+ ctx->ctx_smfi->xxfi_name, r, ctx->ctx_state);
break;
}
return ret;
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
dkim-milter-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dkim-milter-discuss