https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6716
Bill Cole <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
                   |                            |cconsult.com

--- Comment #1 from Bill Cole <[email protected]> ---
Based on logs of ~150k scans in recent months on systems handling mail for
multiple small to medium businesses, SPOOF_COM2COM and SPOOF_COM2OTH are most
commonly FPs, hitting in tandem on legitimate bounce notices (e.g.
Google/Postini), on a few other messages mentioning matching hostnames, and in
one case on a (dumb but real) reversed dotted domain identifier. Note that in
neither of those types of mail is the match even of an actual URI; rather, it
is SA detecting what looks like a hostname and constructing a putative
canonical URI from it. Many of these FPs go unnoticed because the 4.7 combined
score is past our threshold and bounce messages are rarely anticipated mail.

Far less frequently, by 2 orders of magnitude, SPOOF_COM2OTH (very rarely in
tandem with SPOOF_COM2COM) hit on spam that only reached SA at all as a result
of exemptions from frontline protections (DNSBLs, PTR existence mandate, slow
banner, etc.). These rules were not critical to identifying that mail as spam;
typically its final scores exceed 15.

Obviously my corpus is not entirely representative, mainly in that it consists
of mail that gets past blocking that reliably rejects nearly all "bot" spam
ahead of the SA scans. But it raises a few issues:

1. Should these have such high scores? Particularly since SPOOF_COM2COM matches
   are usually also SPOOF_COM2OTH matches, it seems that if SPOOF_COM2COM
   should exist at all, it may need a negative score (i.e. one prominent .com
   is already exempted and another clearly merits exemption).

2. SA should not be as ambitious as it is in converting bare hostname-like
   strings into URIs. To paraphrase Freud: sometimes a dotted domain string is
   *just* a dotted domain string.

3. Are these tests useful against modern spam in an environment without an
   outer layer of defenses catching most of the botspam? It would be helpful if
   someone with a large & recent corpus that isn't pre-cleaned could examine it
   with regard to these rules, to see whether there's any value at all in
   repairing them or whether they are just as obsolete or redundant against the
   full firehose as I've found them to be against my less phishy streams.
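
To make issue 2 concrete, here is a minimal Python sketch of the general
behavior described above: bare hostname-like strings in message text being
promoted to putative URIs. It is an illustration only and is NOT SpamAssassin's
actual parsing code; the pattern, the three-entry TLD list, and the function
name putative_uris are all assumptions made for this example.

```python
import re

# Hypothetical pattern, NOT SpamAssassin's real logic: one or more dotted
# labels ending in a tiny illustrative TLD list. The lookbehind skips strings
# that are already part of an e-mail address, path, or explicit URI.
HOSTLIKE = re.compile(
    r"(?<![\w@./-])"
    r"((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:com|net|org))"
    r"(?![\w.-])",
    re.IGNORECASE,
)


def putative_uris(text):
    """Return http:// URIs constructed from bare hostname-like strings."""
    return ["http://" + m.group(1).lower() for m in HOSTLIKE.finditer(text)]


# A line of the kind found in a bounce notice: no URI is present, yet one is
# constructed, and URI rules would then be evaluated against it.
print(putative_uris("Delivery to host mx.example.com failed"))
```

Under these assumptions, any rule that inspects URIs sees a link in text that
merely names a host, which is the false-positive pattern reported above for
bounce notices.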
