[Bug 7021] TxRep - suggested replacement of Mail::SpamAssassin::Plugin::AWL

bugzilla-daemon Mon, 10 Mar 2014 18:14:28 -0700

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021


--- Comment #9 from Ivo Truxa <[email protected]> ---
(In reply to RW from comment #8)
> IMO there ought to be a cap on the aggregate score contributed by DOMAIN,
> HELO and IP to prevent the site-based reputations becoming a poison-pill. As
> things stand a single first email scoring 27 would practically guarantee 
> that the next email received from that server would be classified as spam.

Yes, that's a good point, and it was also my concern when designing the
plugin. Caps could be added as well, but I took a different approach. The
algorithm for computing the final score differs slightly from AWL. In AWL, the
score correction is always the same for a given mean score, regardless of how
many messages have been recorded. In TxRep, higher counts have a bigger impact
than low counts. So with a single recorded message in the entry, the result in
TxRep with the factor 0.5 is equivalent to AWL with the factor 0.25 (compare
the factor algorithms at http://truxoft.com/resources/txrep.htm#txrep_factor
and at
http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_AWL.html#user_preferences).
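
To make the count dependence concrete, here is a minimal Python sketch. It is
not the plugin code. It assumes that AWL regresses the score toward the mean of
the previously recorded messages, while TxRep regresses it toward the mean that
also includes the current message; that assumption reproduces the 0.5 vs. 0.25
equivalence mentioned above. The function names are made up for illustration.

```python
def awl_adjust(score, total, count, factor=0.5):
    """AWL-style correction (assumed): move toward the mean of past messages."""
    if count == 0:
        return score  # nothing recorded yet, no correction
    mean = total / count
    return score + factor * (mean - score)


def txrep_adjust(score, total, count, factor=0.5):
    """TxRep-style correction (assumed): move toward the mean that
    also includes the current message."""
    new_mean = (total + score) / (count + 1)
    return score + factor * (new_mean - score)


# One recorded message that scored 27, new message scoring 2:
single_awl = awl_adjust(2.0, 27.0, 1, factor=0.25)      # 8.25
single_txrep = txrep_adjust(2.0, 27.0, 1, factor=0.5)   # 8.25

# Ten recorded messages with the same mean of 27:
many_txrep = txrep_adjust(2.0, 270.0, 10, factor=0.5)   # about 13.36
many_awl = awl_adjust(2.0, 270.0, 10, factor=0.5)       # 14.5
```

Under this assumption the TxRep correction equals the AWL correction scaled by
count/(count+1), so a single bad message has only half the pull it would have
in AWL, and the two converge as the count grows.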

It is also worth keeping in mind that the default influence of these three
identifiers (DOMAIN, IP, HELO) is rather limited, and I doubt that they would
easily lead to misclassifications, except in quite borderline cases.

The influence of the domain entry is ~10% (default weight 2.0 out of a total
of 19.5), the IP twice that (weight 4), and the HELO currently has a default
weight of only 0.5 (it could probably be a bit more, though). Then you also
have the global txrep_factor and the aging factor, so the total impact will
not be that big.
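
The shares behind these figures can be checked with a few lines of Python. This
is only arithmetic on the default weights quoted above; it assumes that each
identifier's influence is simply its weight divided by the total weight of
19.5, and it ignores txrep_factor and the aging factor, which reduce the
effect further. The names are illustrative, not plugin settings.

```python
# Default weights quoted in this comment; the total of 19.5 covers all
# identifiers, not only the three site-based ones listed here.
WEIGHTS = {"DOMAIN": 2.0, "IP": 4.0, "HELO": 0.5}
TOTAL_WEIGHT = 19.5


def share(name):
    """Fraction of the total weight carried by one identifier (assumed model)."""
    return WEIGHTS[name] / TOTAL_WEIGHT


# Combined share of the three site-based identifiers.
combined_share = sum(WEIGHTS.values()) / TOTAL_WEIGHT

for name in WEIGHTS:
    print(f"{name}: {share(name):.1%}")
print(f"combined: {combined_share:.1%}")
```

With the defaults, the three site-based identifiers together carry one third
of the total weight (6.5 of 19.5).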

Additionally, these three entries do not usually refer to the same machine.
While the domain can indeed often be associated with a single server, the HELO
comes from the end of the chain, and is ignored if it is equal to the IP or
domain name (or derived from them). It is there to catch the NetBIOS name of
the end user, if possible. This is not very reliable, but it can for example
help to pinpoint an infected node of a botnet, or some specific good senders
as well. The possibility of forging was intentionally ignored here. It cannot
bring much profit, and a spammer would have a hard time figuring out which
end-user HELO names have a good reputation on a given server. Some HELO names
are quite unique, others quite generic, and sometimes the end-user HELO is not
available at all.

The IP was originally (in AWL, and in the first release of TxRep) designed to
be the last public IP (hence the sending client machine rather than the MTA),
but because of the possibility of forging, it had to be changed to the first
untrusted IP address. So it can now indeed sometimes (or even often) point to
the same machine where the domain resides. The need to avoid forging, and
therefore not to use the last public IP, was unfortunate, and it changed the
original purpose of the IP identifier considerably. With this change, the IP
and the domain became more redundant than intended, hence perhaps we could
also reduce their default weights now to avoid their cumulative effect.

Ivo

-- 
You are receiving this mail because:
You are the assignee for the bug.
