https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6203

--- Comment #10 from Mark Martinec <[email protected]> 2009-09-29 12:40:11 PDT ---
> Not guessing. I'm exposing a fact about how AWL is supposed to work.
> Do you have numbers to say /24 is better than /16?

My main concern is overlap: unrelated hosts that share a /16 end up in the
same AWL entry, which every once in a while causes a huge false-positive
score and (at least for me) is the main reason AWL has been given a bad name.

I collected two such samples from the last week's worth of data in our
SQL-based AWL, where I now keep /24 prefixes in the database:

       email        |     ip      |   avg   | count
--------------------+-------------+---------+-------
 [email protected] | 77.126.81   | -4.9745 |     6
 [email protected] | 77.126.168  |  68.461 |     1

 [email protected] | 194.249.166 |    56.7 |     1
 [email protected] | 194.249.231 | -3.5125 |     4

In the first case, both 77.126.81.0/24 and 77.126.168.0/24 are allocated
to the same large ISP. A botnet-infected PC elsewhere in that /16 happened
to choose our user's address as its sender address, which wreaked havoc
on the entry's average score. Luckily we keep the AWL factor
(auto_whitelist_factor) low.
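To make the collision concrete, here is a minimal Python sketch of keying AWL entries by a truncated IPv4 address. The helper awl_key and its prefix_len parameter are hypothetical illustrations, not SpamAssassin's own (Perl) code; the sketch only assumes, as this thread does, that the stored ip column holds the first two octets (/16) or the first three (/24). The host parts of the addresses are invented, since the table stores only prefixes, and "our-user-1" is a placeholder for the obfuscated user address.

```python
import ipaddress


def awl_key(sender: str, ip: str, prefix_len: int) -> tuple[str, str]:
    """Hypothetical AWL key: the sender address plus the leading octets
    of the IPv4 network that the connecting host falls into."""
    if prefix_len not in (8, 16, 24, 32):
        raise ValueError("this sketch only handles octet-aligned prefixes")
    # strict=False lets us pass a host address and get its enclosing network.
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    octets = str(net.network_address).split(".")[: prefix_len // 8]
    return sender, ".".join(octets)


# Host parts are invented; only the prefixes come from the table above.
legit = "77.126.81.10"
botnet = "77.126.168.20"
user = "our-user-1"  # placeholder for the obfuscated address

print(awl_key(user, legit, 16) == awl_key(user, botnet, 16))  # True: one shared entry
print(awl_key(user, legit, 24) == awl_key(user, botnet, 24))  # False: separate entries
```

With a /16 key the legitimate mail and the botnet spam land in the same entry; with a /24 key they are tracked separately, at the cost of more rows.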

In the second case, 194.249.231.0/24 is allocated to one organization,
while 194.249.166.0/26 is assigned to a grammar school. The same thing
happened again: an infected PC sent blatant spam on behalf of our user
from 194.249.166.0/26, polluting the average for the our-user-2/194.249 entry.
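The damage can be estimated from the table itself. Under a /16 key the two rows for each user would have been one entry, so its mean is the count-weighted average of the rows. The sketch below recomputes that and then applies the regression documented for auto_whitelist_factor, final = score + (mean - score) * factor. The incoming score of -5.0 and the factors 0.5 (the documented default) and 0.1 are illustrative assumptions, not values from this report, and the helper names are hypothetical.

```python
# (avg, count) pairs copied from the SQL output above; avg is rounded there,
# so the totals reconstructed from it are approximate.
rows = {
    "our-user-1": [(-4.9745, 6), (68.461, 1)],
    "our-user-2": [(56.7, 1), (-3.5125, 4)],
}


def pooled_mean(pairs):
    """Mean score if all rows had shared one /16 entry (weighted by count)."""
    total = sum(avg * count for avg, count in pairs)
    n = sum(count for _, count in pairs)
    return total / n


def awl_adjust(score, mean, factor):
    """Regress a message's score toward the sender's historical mean."""
    return score + (mean - score) * factor


for user, pairs in rows.items():
    mean16 = pooled_mean(pairs)
    print(f"{user}: pooled /16 mean = {mean16:.2f}")
    for factor in (0.5, 0.1):
        # -5.0 is a hypothetical score for the user's next legitimate message.
        print(f"  factor {factor}: -5.0 -> {awl_adjust(-5.0, mean16, factor):.2f}")
```

A single spam run pushes the pooled means to roughly +5.5 and +8.5, so with a factor of 0.5 a clean -5 message would be pulled up by more than five points; a low factor shrinks that shift to about one point.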

I recognize the boundary choice is rather arbitrary. I'm just saying that
/16 is too wide too often. The extra storage is not a concern, as the AWL
already holds thousands of one-time addresses anyway.

