[Bug 8075] Request for .site, .online, and .fun to be dropped from SpamAssassin's suspicious TLD list

bugzilla-daemon Sat, 26 Nov 2022 14:38:21 -0800

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8075


--- Comment #3 from Bill Cole <billc...@apache.org> ---
(In reply to nehav from comment #0)
> Created attachment 5857 [details]
> Daily abuse feed for the concerned tlds from Spamhaus and Surbl
> 
> Issue: .online, .site, and .fun are listed in SpamAssassin's suspicious TLD
> list. This leads to important business emails being marked as spam and being
> delivered to customers. 

A claim which is not supported by any evidence that you have provided. Please
note that to the best of the knowledge of the SpamAssassin Project Management
Committee, none of the very large mailbox providers (MS, Oath (Yahoo/AOL),
Google, GMX, etc.) use SpamAssassin to classify spam. If you have a problem
with any of them, there is nothing we can do to help. 

It is NOT a "false positive" for a non-spam message to match rules with
derogatory (arithmetically positive) scores. It is also not a "false negative"
for spam messages to match some complimentary (negative score) rules. By
design, all messages should hit some of both. What matters is the final score.
The default and recommended threshold for SA is a score of 5.0, and our rule QA
and rescoring system is built on the premise of that threshold. 
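
To illustrate the point about final scores, here is a minimal sketch in Python. The rule names and score values are invented for the example and are not actual SpamAssassin rules or scores; only the 5.0 threshold comes from the text above, and the comparison is assumed to be inclusive.

```python
# Hypothetical rule names and scores, for illustration only.
# Only the 5.0 threshold is taken from the discussion above.
THRESHOLD = 5.0


def classify(hits: dict[str, float], threshold: float = THRESHOLD) -> tuple[float, bool]:
    """Sum the scores of all rules a message hit and compare to the threshold."""
    total = round(sum(hits.values()), 3)
    # Assumption: a message counts as spam when its total reaches the threshold.
    return total, total >= threshold


# A legitimate message that hits a positive-score TLD rule and a negative-score rule.
ham_hits = {"EXAMPLE_SUSPICIOUS_TLD": 2.0, "EXAMPLE_TRUSTED_SENDER": -1.5}

# A message where several independent positive-score rules add up.
spam_hits = {
    "EXAMPLE_SUSPICIOUS_TLD": 2.0,
    "EXAMPLE_BAD_HEADERS": 1.8,
    "EXAMPLE_SPAMMY_BODY": 2.1,
}

print(classify(ham_hits))
print(classify(spam_hits))
```

In this sketch the first message hits a derogatory rule and is still classified as ham, because only the sum is compared with the threshold.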

> Details: I request that .online, .site, and .fun be dropped from
> SpamAssassin's suspicious/untrustworthy TLD list. The abuse rates on these
> three TLDs have been significantly low in the past couple of months. (Please
> see the attached Spamhaus statistics. These are direct feeds we receive from
> Spamhaus every day. It is evident from the data that all three TLDs have
> shown considerable and continuous improvement in their rankings and abuse
> rate). 

Irrelevant. TLDs are included in our list of suspicious TLDs based on how much
of the mail using them is spam, according to the mass-check logs scored by our
contributors. The three metrics from Spamhaus in that spreadsheet are not
defined in any way (and so are literally meaningless to me), but in the past I
know that they have tracked TLDs based on how much of the overall spam flow
uses them, which is not at all relevant to a filtering tool like SpamAssassin.  

> However, for the last couple of years, several clients using these
> TLDs have complained that their emails are not getting delivered to their
> customers. One of the potential reasons behind this could be that many of
> these emails are being marked as spam by Apache SpamAssassin, leading to
> some profound business loss.

Hypothetical. It is literally impossible for a SpamAssassin instance with
default scores and threshold to mark a message as spam solely on the use of a
suspicious TLD. Obviously it is possible for a message to exceed the threshold
score by less than the value of the TLD rules, but there are always other
factors contributing to that score.

> Can SpamAssassin please set up a test rule for .site, .online and .fun
> individually to check their S/O (spam hits/overall hits)? Also, it would be
> great if the TLDs were dropped from the list should the S/O for each TLD
> turns out to be lower than the overall S/O of the aggregate rule.

Sidney has added the .site and .fun rules. The .online rule (T_SCC_TLD_ONLINE)
has been there for a while and consistently has S/O scores around 95%. That one
is a keeper, at least for now. 

So far (2 days, insufficient for a decision) it is looking like .fun is still
all spam, while .site is mostly not. Unless the next few days show very
different results, I anticipate that we will be removing .site but leaving .fun
for now. 
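
For reference, S/O as defined in the request (spam hits / overall hits) can be sketched as below. This is only the plain ratio. The project's mass-check tooling may compute or weight it differently, and the hit counts are made up for the example.

```python
def s_over_o(spam_hits: int, ham_hits: int) -> float:
    """Return the fraction of a rule's hits that were on spam (S/O)."""
    overall = spam_hits + ham_hits
    if overall == 0:
        raise ValueError("rule has no hits, S/O is undefined")
    return spam_hits / overall


# Made-up counts: a rule that hit 95 spam and 5 ham messages.
ratio = s_over_o(spam_hits=95, ham_hits=5)
print(f"S/O = {ratio:.0%}")
```

A rule whose S/O sits near 100% hits almost only spam, while one near 50% or lower says little about whether a message is spam.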

> A similar bug had been raised for .space TLD in Bug 7953, where after the
> test, it was found that S/O for .space was lower, and thus, the TLD was
> eventually dropped from the suspicious TLD list. Please see bug 7953,
> comment 8. I would be happy to provide other supporting data/information to
> help with the tests.

Real "ham" messages that score over 5 on a default-score SA instance would
substantially enhance my concern with the concept of 'suspicious TLDs.' I know
that linking the URL or email domain TLD to a spam score should not work at all
in an ideal world, but in fact it is a helpful tactic according to the data we
have. 

Obviously we don't want TLDs listed which no longer correlate to mail
spamminess. It is not the purpose of our rules to incur punishment, only to
make the best guess possible about whether a message is spam. 

I will look again at the stats after a week which does not include a US
holiday.

-- 
You are receiving this mail because:
You are the assignee for the bug.
