https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6273

           Summary: Logging Enhancement Request
           Product: Spamassassin
           Version: 3.2.5
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: spamc/spamd
        AssignedTo: [email protected]
        ReportedBy: [email protected]


Hello all:

(Note I've brought this up on the mailing list, and per a request from John
Hardin am mostly reiterating it here, although I'm breaking it down to several
possible solutions which may be better filed as individual enhancement
requests).

The output of SA's logging is hard to parse from a syslog point of view. 
Running in the most full debug mode, for a single message I get:

Jan  4 21:51:48 quark spamd[19770]: spamd: connection from prime.gushi.org
[72.9.101.130] at port 55208

Jan  4 21:51:49 quark spamd[19770]: spamd: processing message
<[email protected]> for danial:58

Jan  4 21:51:50 quark spamd[19770]: FuzzyOcr: Processing Message with ID
"<[email protected]>" (VIAGRA (c) Best Supplier
<[email protected]> -> [email protected])

Jan  4 21:51:50 quark spamd[19770]: spamd: identified spam (25.8/5.0) for
danial:58 in 1.3 seconds, 4239 bytes.

Jan  4 21:51:50 quark spamd[19770]: spamd: result: Y 25 -
BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,HTML_IMAGE_ONLY_32,HTML_IMAGE_RATIO_02,HTML_MESSAGE,KAM_BADIPHTTP,MIME_HTML_ONLY,NORMAL_HTTP_TO_IP,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_SORBS_WEB,RCVD_IN_XBL,RDNS_NONE,SARE_FROM_DRUGS,SARE_UNA,URIBL_AB_SURBL,URIBL_BLACK,URIBL_SBL
scantime=1.3,size=4239,user=danial,uid=58,required_score=5.0,rhost=prime.gushi.org,raddr=72.9.101.130,rport=55208,mid=<[email protected]>,bayes=1.000000,autolearn=spam,shortcircuit=no

The problems here are:

1) Logging is not tunable, at all.  You turn on debug mode, and get more, or
less, but it's not possible to configure what goes into a given logfile line.

Think of apache's "LogFormat" command, and the flexibility it gives you:
several predefined standards, plus the ability to use syslog for (or create
your own logfiles) any other data, in any format.  As far as I know, spamd
can't log to anything other than syslog.  It's already got SQL connectivity
built in, but there's no logging to that either.

At the very least, moving the logging into a tokenized format I could specify
in local.cf would be a MAJOR improvement.

Other useful things to have as logging tokens that don't show up in any current
core logs:  Sendmail QueueID, From-Address, To-Address, Relaying IP
(SpamAssassin knows and uses all these pieces of data for various rules).

2) Correlating the above lines is hard.  Line 4 has the decimal score, but no
other lines do.  Line 4 also lacks ANY useful message id (it has a user and a
size, neither of these are unique, it has the PID, which doesn't really help at
all).

Only line 2 has the messageid, which is a chunk of data with no arbitary
format, where the sender can supply it and we'll log it (i.e. it's forgable). 
Furthermore, it is confusable with email addresses, and the <> and @ make it
even harder to parse or use with a tool like grep, PLUS it appears in different
places on every line.  It's probably the worst possible correlator to use.  

At no point does any instance of spamd log the sendmail "queue id" (which is
all one needs to grep for given any other daemon or milter.  This is easily
parsed out of any message.  In order to find that line in my mail logs, I would
need to find the following logfile line (on a different machine).

Jan  4 21:46:16 <mail.info> prime sm-mta[84403]: o052kCHR084403:
from=<[email protected]>, size=3144, class=0, nrcpts=1,
msgid=<[email protected]>, proto=ESMTP, daemon=MTA,
relay=[203.140.73.194]

And from there, the queueid (o052kCHR084403) lets me search for every other
action to occur to that message.  Spamd shouldn't be different.

At the very, very least (if the sendmail queueid isn't available) can we "make
up" a single unique messageid for all logfiles being logged for a given
message, and use that string in all loglines relating to that message (with
sendmail, this is always the bit following the pid)?  At least that way, we can
correlate matching spamd log lines, even if matching MTA lines is hard.

I guess what I'm saying here is that it's hard both to correlate spamd lines
against other daemons, and against other spamd loglines acting on the same
message.  Both of these should be a goal in logging anything.

So, to sum it up, here's the requests, as I mention them above.  Feel free to
let me know if these are too lofty, or if I should file separate enhancement
requests.

1) Configurability:

1a) Implement Flatfile Logging.

1b) Implement tokenized logging to flatfiles.

1b-a) Add tokens as above.

1c) If tokenizing isn't going to be feasible, can we please add sender,
recipient, messageid, decimal score to the Result Line?  Thirdparty plugins
(such as FuzzyOCR, above) manage this for their own logs, why doesn't core
spamd?

1d) Implement logging to SQL (with or without tokens).

2) Correlation:

2a) Log actual sendmail queuid to all loglines in the same place.  Given we may
not know this on initial connect for line 1 above.

2b) Failing 2a, make up a queueid early enough in the process based on (date,
sourceport, srcip, dstport, pid) and log that everywhere you'd log the above.

Finally,

3) Standardize on what format third-party plugins should write to the log with.

Let me know if there's anything I can clarify.

-- 
Configure bugmail: 
https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

Reply via email to