https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7021
--- Comment #9 from Ivo Truxa <[email protected]> ---
(In reply to RW from comment #8)
> IMO there ought to be a cap on the aggregate score contributed by DOMAIN,
> HELO and IP to prevent the site-based reputations becoming a poison-pill. As
> things stand a single first email scoring 27 would practically guarantee
> that the next email received from that server would be classified as spam.

Yes, that's a good point, and it was also my concern when designing the plugin. Caps could be added as well, but I used a different approach. The algorithm for computing the final score differs slightly from AWL. In AWL, the score correction is always the same for a given mean score, regardless of how many messages have been recorded. In TxRep, higher counts have a bigger impact than low counts. So with a single recorded message in the entry, TxRep with a factor of 0.5 gives the same result as AWL with a factor of 0.25 (compare the factor algorithms at http://truxoft.com/resources/txrep.htm#txrep_factor and at http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_AWL.html#user_preferences).

It is also necessary to keep in mind that the default influence of these three identifiers (DOMAIN, IP, HELO) is rather limited, and I doubt that they would lead to easy misclassifications, except in quite borderline cases. The influence of the domain entry is ~10% (default weight 2.0 of a total of 19.5), the IP has twice that (weight 4), and the HELO currently has a default weight of only 0.5 (it could probably be a bit more, though). Then you have the global txrep_factor and the aging factor, so the total impact will not be that big.

Additionally, these three entries do not usually refer to the same machine. While the domain can indeed often be associated with a single server, the HELO comes from the end of the chain, and is ignored if it is equal to the IP or domain name (or derived from them). It is there to catch the NetBIOS name of the end user, if possible.
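
As a rough illustration of the two numeric points above (the count-dependent factor and the weight shares), here is a minimal Python sketch. The AWL correction follows the formula in the AWL documentation linked above. The TxRep correction is an assumption: it regresses toward the mean that includes the current message, which is one formula consistent with the statement that a single recorded message at factor 0.5 behaves like AWL at factor 0.25. The function names are hypothetical, the weights and the 19.5 total are taken from this comment, and none of this is the plugin's actual code.

```python
def awl_delta(mean, score, factor):
    """AWL-style correction: depends only on the stored mean, not on the count."""
    return factor * (mean - score)


def txrep_delta(total, count, score, factor):
    """Assumed TxRep-style correction (an illustration, not the plugin source).

    The current score is folded into the mean first, so an entry with few
    recorded messages pulls the result less than one with many messages.
    """
    new_mean = (total + score) / (count + 1)
    return factor * (new_mean - score)


def weight_share(weight, total_weight=19.5):
    """Fraction of the total default weight held by one identifier."""
    return weight / total_weight


if __name__ == "__main__":
    score = 5.0  # score of the new message before any reputation correction

    # One recorded message that scored 27 (the case from comment #8).
    print("AWL,   factor 0.25:", awl_delta(27.0, score, 0.25))
    print("TxRep, factor 0.5, count 1:", txrep_delta(27.0, 1, score, 0.5))

    # Nine recorded messages with the same mean of 27: the pull gets stronger.
    print("TxRep, factor 0.5, count 9:", txrep_delta(9 * 27.0, 9, score, 0.5))

    # Default weight shares quoted in this comment.
    for name, weight in (("DOMAIN", 2.0), ("IP", 4.0), ("HELO", 0.5)):
        print(f"{name}: {weight_share(weight):.1%}")
```

Under these assumptions, the single-message case gives the same correction (5.5 points) for TxRep at 0.5 and AWL at 0.25, and the three site-based identifiers together hold about a third of the total default weight, before txrep_factor and aging reduce their effect further.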
The HELO identifier is not very reliable, but it can, for example, help to pinpoint an infected node of a botnet, or some specific good senders as well. The possibility of forging was intentionally ignored here. Forging cannot bring much benefit, and a spammer would have a hard time figuring out which end-user HELO names have a good reputation on a given server. Some HELO names are quite unique, others quite generic, and sometimes the end-user HELO is not available at all.

The IP was originally (in AWL, and in the first release of TxRep) designed to be the last public IP (hence rather the sending client machine, not the MTA), but because of the possibility of forging, it had to be changed to the first untrusted IP address. So it can indeed sometimes (or even often) point to the same machine where the domain resides. The need to avoid forging by not using the last public IP was unfortunate, and considerably changed the original purpose of the IP identifier. With this change, the IP and Domain became more redundant than anticipated, so perhaps we could now also reduce their default weights to avoid their combined effect.

Ivo
