https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8075
--- Comment #3 from Bill Cole <billc...@apache.org> ---
(In reply to nehav from comment #0)
> Created attachment 5857 [details]
> Daily abuse feed for the concerned tlds from Spamhaus and Surbl
>
> Issue: .online, .site, and .fun are listed in SpamAssassin's suspicious TLD
> list. This leads to important business emails being marked as spam and being
> delivered to customers.

A claim which is not supported by any evidence that you have provided.

Please note that, to the best of the knowledge of the SpamAssassin Project Management Committee, none of the very large mailbox providers (MS, Oath (Yahoo/AOL), Google, GMX, etc.) use SpamAssassin to classify spam. If you have a problem with any of them, there is nothing we can do to help.

It is NOT a "false positive" for a non-spam message to match rules with derogatory (arithmetically positive) scores. It is also not a "false negative" for spam messages to match some complimentary (negative score) rules. By design, all messages should hit some of both. What matters is the final score. The default and recommended threshold for SA is a score of 5.0, and our rule QA and rescoring system is built on the premise of that threshold.

> Details: I request that .online, .site, and .fun be dropped from
> SpamAssassin's suspicious/untrustworthy TLD list. The abuse rates on these
> three TLDs have been significantly low in the past couple of months. (Please
> see the attached Spamhaus statistics. These are direct feeds we receive from
> Spamhaus every day. It is evident from the data that all three TLDs have
> shown considerable and continuous improvement in their rankings and abuse
> rate).

Irrelevant. TLDs are included in our list of suspicious TLDs based on how much of the mail using them is spam, according to the mass-check logs scored by our contributors.
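Two different measures are in play here: each message is classified by the sum of the scores of all rules it hits, compared against the threshold (default 5.0), while a TLD rule is kept or dropped based on its S/O ratio (spam hits divided by overall hits) in mass-check results. The following is a minimal, hypothetical Python sketch of both, not SpamAssassin's actual code. It assumes a message counts as spam when its total is at or above the threshold; every rule name, score, and hit count in it is made up for illustration.

```python
"""Hypothetical sketch of the two quantities discussed in this bug.

1. Per message: add up the scores of every rule the message hits and
   compare the total to the required score (default 5.0).
2. Per rule: S/O = spam hits / overall hits, measured on mass-check corpora.

All rule names, scores and hit counts below are made up for illustration.
"""

DEFAULT_THRESHOLD = 5.0  # SpamAssassin's default and recommended threshold


def total_score(hits: dict[str, float]) -> float:
    """Sum the scores of every rule the message matched, positive and negative."""
    return sum(hits.values())


def is_spam(hits: dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Only the final total matters (assumed: spam if it reaches the threshold)."""
    return total_score(hits) >= threshold


def s_over_o(spam_hits: int, ham_hits: int) -> float:
    """Fraction of all mass-check messages hitting a rule that were spam."""
    overall = spam_hits + ham_hits
    if overall == 0:
        raise ValueError("rule has no hits; S/O is undefined")
    return spam_hits / overall


if __name__ == "__main__":
    # A hypothetical suspicious-TLD rule scoring 1.5 cannot reach 5.0 alone.
    print(is_spam({"HYPOTHETICAL_SUSPICIOUS_TLD": 1.5}))  # False
    # Combined with another (hypothetical) positive rule, the total reaches 5.0.
    print(is_spam({"HYPOTHETICAL_SUSPICIOUS_TLD": 1.5,
                   "HYPOTHETICAL_SPAMMY_RULE": 3.5}))  # True
    # A negative-score rule pulls the same message back under the threshold.
    print(is_spam({"HYPOTHETICAL_SUSPICIOUS_TLD": 1.5,
                   "HYPOTHETICAL_SPAMMY_RULE": 3.5,
                   "HYPOTHETICAL_HAM_SIGN": -1.0}))  # False
    # Hypothetical mass-check counts: 90 spam, 10 ham.
    print(s_over_o(90, 10))  # 0.9
```

The S/O ratio asks "of the mail using this TLD, how much is spam?", which is what matters to a filter. It differs from a TLD's share of the overall spam flow: a TLD can carry a small share of all spam while nearly all of its own mail is still spam.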
The three metrics from Spamhaus in that spreadsheet are not defined in any way (and so are literally meaningless to me), but I know that in the past they have tracked TLDs based on how much of the overall spam flow uses them, which is not at all relevant to a filtering tool like SpamAssassin.

> However, for the last couple of years, several clients using these
> TLDs have complained that their emails are not getting delivered to their
> customers. One of the potential reasons behind this could be that many of
> these emails are being marked as spam by Apache SpamAssassin, leading to
> some profound business loss.

Hypothetical. It is literally impossible for a SpamAssassin instance with default scores and threshold to mark a message as spam solely on the use of a suspicious TLD. Obviously it is possible for a message to exceed the threshold score by less than the value of the TLD rules, but there are always other factors contributing to that score.

> Can SpamAssassin please set up a test rule for .site, .online and .fun
> individually to check their S/O (spam hits/overall hits)? Also, it would be
> great if the TLDs were dropped from the list should the S/O for each TLD
> turns out to be lower than the overall S/O of the aggregate rule.

Sidney has added the .site and .fun rules; the .online rule (T_SCC_TLD_ONLINE) has been there for a while and consistently has S/O scores around 95%. That one is a keeper, at least for now. So far (2 days, insufficient for a decision) it is looking like .fun is still all spam, while .site is mostly not. Unless the next few days show very different results, I anticipate that we will be removing .site but leaving .fun for now.

> A similar bug had been raised for .space TLD in Bug 7953, where after the
> test, it was found that S/O for .space was lower, and thus, the TLD was
> eventually dropped from the suspicious TLD list. Please see bug 7953,
> comment 8.
> I would be happy to provide other supporting data/information to
> help with the tests.

Real "ham" messages that score over 5 on a default-score SA instance would substantially increase my concern with the concept of 'suspicious TLDs.' I know that linking the URL or email domain TLD to a spam score should not work at all in an ideal world, but in fact it is a helpful tactic according to the data we have. Obviously we don't want TLDs listed which no longer correlate with mail spamminess. It is not the purpose of our rules to inflict punishment, only to make the best guess possible about whether a message is spam.

I will look again at the stats after a week which does not include a US holiday.

--
You are receiving this mail because:
You are the assignee for the bug.