Re: RCVD_IN_XBL score

darxus Sun, 11 Mar 2012 12:45:19 -0700

On 03/11, Axb wrote:
> If your windows box was exploited and listed in CBL for a day, and
> you submit a delisting request after you fixed it, the listing will
> disappear within a couple of hours, the CBL/XBL worked as intended
> and that incident could be recorded in someone's corpus for a long
> time tho the incident has long been resolved and this would
> negatively influence the BL's score.

Those hits that remain in someone's corpus are representative of the
performance of the list.  New queries, without reuse, at the time of
running masscheck, are not representative of the accuracy of the list.

> Then you don't understand how CBL/XBL works and how this method and
> low score is breaking its strength in tagging exploited sender IPs.
> As we may use XBL to reject mail, the score should be accordingly
> high for those who chose NOT to reject yet want to get the full
> advantage of XBL's accuracy.

It doesn't matter how the lists are maintained, how false positives
get removed.  All that matters is performance at the time an email is
received, which is what reuse is for.
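
To make the difference concrete, here is a rough sketch in Python.  It is
not masscheck's actual code; the record layout, field names, and sample
addresses are all made up for illustration.  It assumes each ham message
carries both the list's answer recorded at receipt time (what reuse keeps)
and the answer a fresh query would give when masscheck runs.

```python
from dataclasses import dataclass


@dataclass
class Ham:
    # Hypothetical record for one hand-verified ham message.
    sender_ip: str
    listed_at_receipt: bool  # list's answer when the mail arrived (reused)
    listed_now: bool         # answer a fresh query would give at masscheck time


def ham_hit_rate(corpus, field):
    """Fraction of ham messages the list hit, according to the chosen field."""
    if not corpus:
        return 0.0
    hits = sum(1 for msg in corpus if getattr(msg, field))
    return hits / len(corpus)


# Toy corpus: one ham was listed on arrival and has since been delisted.
corpus = [
    Ham("192.0.2.10", listed_at_receipt=True, listed_now=False),
    Ham("192.0.2.11", listed_at_receipt=False, listed_now=False),
    Ham("192.0.2.12", listed_at_receipt=False, listed_now=False),
    Ham("192.0.2.13", listed_at_receipt=False, listed_now=False),
]

reused_rate = ham_hit_rate(corpus, "listed_at_receipt")
fresh_rate = ham_hit_rate(corpus, "listed_now")
print(f"reuse: {reused_rate:.2f}  fresh query: {fresh_rate:.2f}")
```

In this toy case the fresh query reports no ham hits at all, even though
one of the four messages was hit when it was actually delivered.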

Using data that's 6 years old, on the other hand, is unfortunate.  I showed
how it screws up the performance analysis for dnswl, during a period when
jm's corpora were missing.

> Anybody using HAM older than 3 years should voluntarily cleanup.
> Patterns change and as with spam, HAM also goes stale.

That would prevent score regeneration from happening at all, because the
150,000th newest email was older than that: 4.6 years old when I last
checked, in October:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386#c3
(And score regeneration doesn't run with fewer than 150,000 hams.)
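
The arithmetic behind that can be sketched as below.  This is only an
illustration in Python, not the real score generation code: the function
names are hypothetical, and the only number taken from above is the
150,000 ham minimum.  The toy data uses a much smaller minimum so it is
easy to follow.

```python
from datetime import datetime, timedelta

MIN_HAM = 150_000  # minimum ham count mentioned above


def age_of_nth_newest(dates, n, now):
    """Age in years of the n-th newest message, or None if fewer than n exist."""
    if len(dates) < n:
        return None
    nth = sorted(dates, reverse=True)[n - 1]
    return (now - nth).days / 365.25


def enough_ham(dates, max_age_years, now, min_ham=MIN_HAM):
    """True if at least min_ham messages are no older than max_age_years."""
    cutoff = now - timedelta(days=max_age_years * 365.25)
    recent = sum(1 for d in dates if d >= cutoff)
    return recent >= min_ham


# Toy data: six hams, one per year going back five years; pretend the
# minimum is 5 instead of 150,000.
now = datetime(2012, 3, 11)
dates = [now - timedelta(days=365 * k) for k in range(6)]

nth_age = age_of_nth_newest(dates, 5, now)
ok_with_3_years = enough_ham(dates, 3, now, min_ham=5)
ok_with_6_years = enough_ham(dates, 6, now, min_ham=5)
print(nth_age, ok_with_3_years, ok_with_6_years)
```

With a 3 year cutoff the toy corpus falls below its minimum, which is the
same situation a 3 year limit would create for the real corpora.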

At that time, "29.8% of the ham currently used in score generation [was]
from 2008 or older, from jm's corpus."  I expect the current number to be
very similar.

My ham corpus only goes back about 1 year.  So the false positives I got
are well within the limit you suggest.

On 03/11, Axb wrote:
> >Sorry, but your thinking is wrong.  What Darxus says is completely correct.
> 
> How can it be right to reuse BL hits which have probably expired
> a long time ago?
> 
> To me this is like saying your credit rating at age 40 is bad coz
> you had a $5k debt at age 20
> 
> Don't understand your logic.

Well, your credit rating *does* take into account what you've done over
the last few years.  Because otherwise there isn't enough data to reliably
determine your credit score.  It records your performance at the time
*of* your performance - not what you would do now given the opportunity
to try again with the benefit of hindsight.  Which is effectively what
happens without reuse.

(And a $5k debt at any age will never give you a bad credit score.  What
will give you a bad credit score is not making the payments on time.  Even
if it's because your bank's automatic online payment crap broke.  And that
stuff sticks.)

On 03/11, Henrik Krohns wrote:
> If we are talking about _Spamhaus_ which most people have rejecting at SMTP
> time anyway, the current XBL/SBL scores are ridiculously low.
> 
> A few lame livejournal/forum mails are allowed to make one of the most
> respected lists to be less effective?

I do not disagree with this.  I think increasing the score of the
Spamhaus rules would be fine.  The only reason I stopped automatically
rejecting everything in zen at my MTA was to collect better data for
things like masscheck.  Funny, huh?  I wonder how many more false
positives aren't showing up in masscheck / rule QA / score generation
because the contributor never sees them due to using zen at their MTA.

Rule QA output certainly suggests we're missing that data for that reason
in several of the corpora.

On 03/11, Axb wrote:
> .......and why did these forum IPs land in XBL in the first place?
> What exploit hit them?
> 
> May we have these IPs so we can research their history?

Reputation lists get things wrong sometimes.  Keeping them accurate
is hard.  My guess is Spamhaus actually just screwed up.  But most
importantly, the reason doesn't matter.  Spamhaus had false positives,
they're part of the accurate record of their performance, and that record
is the best way we have to predict future performance.

-- 
"Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats."
 - Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com
