https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6716

Bill Cole <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
                   |                            |cconsult.com

--- Comment #1 from Bill Cole <[email protected]> ---
Based on logs of ~150k scans in recent months on systems handling mail for
multiple small to medium businesses, SPOOF_COM2COM and SPOOF_COM2OTH most
commonly are FPs in tandem on legitimate bounce notices (e.g. Google/Postini)
and a few other messages mentioning matching hostnames and in one case a (dumb
but real) reversed dotted domain identifier. Note that in neither of those
types of mail is the match even an actual URI; rather, SA detects what looks
like a hostname and constructs a putative canonical URI from it.
Many of these end up unnoticed because the 4.7 combined score is past our
threshold and bounce messages are rarely anticipated mail. Far less frequently,
by 2 orders of magnitude, SPOOF_COM2OTH (very rarely in tandem with
SPOOF_COM2COM) hit on spam that only reached SA at all as a result of
exemptions from frontline protections (DNSBLs, PTR existence mandate, slow
banner, etc.), and these rules were not critical to identifying it as spam;
typically those final scores exceed 15.

Obviously my corpus is not entirely representative, mainly in that it consists
of mail that gets past blocking that reliably rejects nearly all "bot" spam
ahead of the SA scans. But it raises a few issues:

1. Should these have such high scores? Particularly since SPOOF_COM2COM matches
are usually also SPOOF_COM2OTH matches, it seems that if SPOOF_COM2COM should
exist at all, it may need a negative score (i.e. one prominent .com is already
exempted and another clearly merits exemption).

2. SA should not be as ambitious as it is in converting bare hostname-like
strings into URIs. To paraphrase Freud: sometimes a dotted domain string is
*just* a dotted domain string. 

3. Are these tests useful against modern spam in an environment without an
outer layer of defenses catching most of the botspam? It would be helpful if
someone with a large & recent corpus that isn't pre-cleaned could examine it
with regard to these rules to see whether there's any value at all in repairing
them, or whether they are just as obsolete or redundant against the full
firehose as I've found them to be against my less phishy streams.
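
To make point 2 concrete, here is a minimal Python sketch of the failure mode.
It is NOT SpamAssassin's actual parsing code; the HOSTLIKE pattern and the
putative_uris() helper are hypothetical names made up for illustration, and
the sample text uses reserved example.com names. It only shows how treating
any dotted token as a schemeless URI turns a reversed dotted identifier into
something a URI rule can then match:

```python
import re

# Hypothetical pattern, not SpamAssassin's real logic: any run of
# dot-separated labels ending in an alphabetic "TLD" counts as a hostname.
HOSTLIKE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)

def putative_uris(text):
    """Return a canonical-looking URI for every hostname-like token."""
    return ["http://" + m.group(0).lower() for m in HOSTLIKE.finditer(text)]

sample = "Delivery to mx.example.com failed; app id com.example.mailer"
print(putative_uris(sample))
# ['http://mx.example.com', 'http://com.example.mailer']
```

Neither token in the sample was written as a URI, and the second is not even
a hostname, yet both come out looking like links.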

-- 
You are receiving this mail because:
You are the assignee for the bug.
