Whitelists, not directly useful to spamassassin...

Warren Togami Wed, 16 Dec 2009 15:10:49 -0800

I made a discovery today that surprised even myself. Using the rescore masscheck and weekly masscheck logs while working on Bug #6247, I found some interesting details that throw a wrench into this lively debate.


https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247#c49
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247#c51

It turns out that the ReturnPath and DNSWL whitelists have a statistically insignificant impact on spamassassin's ability to determine ham vs. spam. Meanwhile, both whitelists have high levels of accuracy.

How can both of these statements be true? I suspect this is because the scores are balanced by the rescoring algorithm to be "safe" in the majority case where no whitelist rule has triggered. Thus whitelists are not needed or relied upon to prevent false positive classification.

While whitelists are not directly effective (statistically, when averaged across a large corpus), whitelists are powerful tools in indirect ways including:

* Pushing the score beyond the auto-learn threshold for things like Bayes to function without manual intervention.
* The admittedly controversial method where some automated spam trap blacklists use whitelists to help determine if they really should list an IP address.


https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6251
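
To illustrate the auto-learn point in the list above, here is a minimal Python sketch. It is not spamassassin code: the function name, the 0.1 ham threshold, and the sample scores are assumptions chosen for illustration, and it ignores any other conditions a real auto-learner applies.

```python
# Illustrative values only; these are assumptions, not taken from the post.
HAM_AUTOLEARN_THRESHOLD = 0.1   # assumed: learn as ham only below this score
WHITELIST_SCORE = -5.0          # largest whitelist score mentioned for 3.3.0


def autolearns_as_ham(score):
    """Return True if a message with this total score would be learned as ham."""
    return score < HAM_AUTOLEARN_THRESHOLD


# A legitimate message that picked up a few small rule hits.
score_without_whitelist = 1.2
score_with_whitelist = score_without_whitelist + WHITELIST_SCORE

print(autolearns_as_ham(score_without_whitelist))  # not learned without help
print(autolearns_as_ham(score_with_whitelist))     # whitelist pushes it below the threshold
```

In this sketch the whitelist does not change the ham vs. spam verdict (both scores are well under 5), but it does decide whether the message is used for training.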

spamassassin-3.3.0 has reduced the score impact of these whitelists to more modest levels, maxing out at -5 points. -5 is PLENTY for spamassassin, as 5 points is the level for which the scoreset is tuned. Mail from a whitelisted host would need greater than 10 points from other rules to be blocked, which is statistically very rare for ham. I believe that we are striking the right balance with these modest whitelist scores in this release.
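
The arithmetic behind the 10-point figure can be sketched in a few lines of Python. This is only an illustration of the numbers above, not spamassassin's implementation; the function name is hypothetical, and treating a score exactly at the threshold as spam is an assumption.

```python
SPAM_THRESHOLD = 5.0        # level the scoreset is tuned for, per the text
MAX_WHITELIST_SCORE = -5.0  # strongest whitelist score in 3.3.0, per the text


def is_blocked(other_rules_score, whitelisted):
    """Hypothetical helper: apply the whitelist score, then compare to the threshold."""
    total = other_rules_score
    if whitelisted:
        total += MAX_WHITELIST_SCORE
    # Assumption: a total exactly at the threshold counts as spam.
    return total >= SPAM_THRESHOLD


# The same 9-point message is blocked from an unknown host
# but passes when the sending host is whitelisted.
print(is_blocked(9.0, whitelisted=False))
print(is_blocked(9.0, whitelisted=True))
# Past 10 points from other rules, even a whitelisted host is blocked.
print(is_blocked(11.0, whitelisted=True))
```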

That being said, whitelists should be constantly policed to maintain their reputation and trust levels. For example, while I currently am impressed by DNSWL's performance, I am not pleased that they seem to lack automated trap-based enforcement. Relying only on manual reports and manual intervention requires too much effort in the long term for any organization, be it company- or volunteer-run.


Warren Togami
wtog...@redhat.com
