[Bug 4900] Persistent blacklist not really persistent

bugzilla-daemon Thu, 15 Apr 2010 15:35:06 -0700

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4900


Adam Katz <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #12 from Adam Katz <[email protected]> 2010-04-15 18:34:31 EDT ---
(In reply to comment #4 / Bug 6032, Ian Turner 2008-12-18)
> For example, consider a message sender [email protected], whose messages
> are flagged as spam with score 30. Assume the system is configured with a
> spam threshold of 10. Finally, assume an administrator runs spamassassin
> [email protected] and that several messages are
> then received from this source. We will see the following behaviour:
> 
> Message       pre-AWL     post-AWL    count   totscore   Message accepted?
>                score       score
>                                         1       -100
>       1          30          -35        2        -70      TRUE
>       2          30          -20        3        -40      TRUE
>       3          30           -5        4        -10      TRUE
>       4          30           10        5         20      FALSE
>       5          30           25        6         50      FALSE
> 
> As it turns out, the --add-addr-to-whitelist command was only good for
> three messages.

I ran this test on vanilla 3.3.0 and 3.3.1 installs to verify; my numbers
differed a bit (for the worse!).  I tried three auto_whitelist_factor
values: the default of 0.5 (implicit and explicit, with identical results),
0.75, and 1.0.

Tests were performed using a vanilla email with a custom rule assigning 30
points to that email's Message-ID string.  I trained SA via `spamassassin -W
<test.eml`, then repeatedly scanned the same email with `spamassassin -D
auto-whitelist <test.eml |grep score:`

The only value that changes with the auto_whitelist_factor is the post-AWL
score.  This is also the only place my results differ from Ian's in comment
4, so I'm only presenting the post-AWL scores at the three factors I tested.

My results:

          ------- AWL factor -------
Message   0.5       0.75       1.0

   1     -35       -67.5      -100
   2      -2.5     -18.75     -35
   3       8.333    -2.5      -13.333
   4      13.75      5.625    -2.5
   5      17        10.5       4
   6      19.167    13.75      8.333
   7      20.714    16.071    11.429
   8      21.875    17.812    13.75
   9      22.778    19.167    15.556

At a factor of 1.0, AWL brings the score to the previous average as
specified in the documentation, which is handy for checking the math.
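
The averaging behind these columns can be sketched in a few lines of
Python.  This is only a model inferred from the numbers above, not
SpamAssassin's actual code: it assumes whitelist training seeds the entry
with count 1 and total -100 (as in Ian's table), that each scan adds the
pre-AWL score to the stored total, and that the post-AWL score moves from
the message's own score toward the stored mean by the factor.  The function
name and its parameters are made up for illustration.

```python
def awl_post_scores(pre_score, factor, seed_total=-100.0, n_messages=9):
    """Model post-AWL scores for n_messages identical messages.

    Assumed model (inferred from the tables, not taken from the SA source):
    the entry starts as count=1, total=seed_total, and every scan adds the
    pre-AWL score to the total after the adjustment is computed.
    """
    total, count = seed_total, 1
    results = []
    for _ in range(n_messages):
        mean = total / count                         # average before this message
        post = pre_score + (mean - pre_score) * factor
        results.append(post)
        total += pre_score                           # history stores the pre-AWL score
        count += 1
    return results


for factor in (0.5, 0.75, 1.0):
    print(factor, [round(x, 3) for x in awl_post_scores(30, factor)])
```

With pre_score=30 this model gives -35, -2.5, 8.333 for the first three
messages at factor 0.5, matching the first column of the table.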

Like Ian's results, my test at factor=0.5 results in the sender getting
flagged as spam on the third email following a whitelist training (8.333
against a threshold of 5).  Even at factor=1.0, there are only five emails
in the clear.

Here's another view of the issue, fixed at AWL factor 0.5 but with
varying initial scores and learning as ham or spam:

             ---------------- initial score -----------------
Message   ham@30    ham@20    ham@10      spam@0   spam@-5  spam@-10

   1      -35      -40      -45          50       47.5     45
   2       -2.5    -10      -17.5        25       21.3     17.5
   3      **8.3**    0       -8.3        16.7     12.5      8.3
   4       13.8    **5**     -3.8        12.5      8.1    **3.8**
   5       17        8       -1          10        5.5      1
   6       19.2     10        0.8         8.3    **3.8**   -0.8
   7       20.7     11.4      2.1         7.1      2.5     -2.1
   8       21.9     12.5      3.1         6.3      1.6     -3.1
   9       22.8     13.3      3.9         5.6      0.8     -3.9
  10       23.5     14        4.5         5        0.3     -4.5
  11       24.1     14.5    **5**       **4.5**   -0.2     -5

The turnover counts are the notable thresholds here.  A ham scoring 30
bounces back to getting marked as spam on the third message.  Ham at 20
takes just one more.  Ham at 10 turns over on the 11th message.  I
didn't put a 5 point ham on the chart, but it's fine for quite a while
(it hits 4.0 on the 53rd message and 4.5 on the 105th).

On the spam side, a spam that somehow gets to -10 evades detection on
its fourth message.  A spam at -5 returns to the inbox on the sixth.  A
zero-scoring spam is snuffed for ten iterations, returning on the 11th.
Not on the chart, a spam scoring 2 comes back on the 17th message and a
spam scoring 4.5 dips under 6 on its 32nd, under 5.5 after 48, and gets out of
jail on its 94th.
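
The turnover points can be estimated without running SpamAssassin.  The
Python sketch below rests on assumptions: training seeds the entry with
count 1 and a total of -100 (whitelist) or +100 (blacklist), each scan adds
the pre-AWL score to the total, and a message counts as spam when its
post-AWL score is at or above a threshold of 5.0.  Under that model the
n-th message scores s + factor * (seed - s) / n, where s is the pre-AWL
score.  The function name and parameters are hypothetical.

```python
def first_flip(pre_score, seed_total, factor=0.5, threshold=5.0, limit=1000):
    """Return the 1-based message number where the trained verdict flips.

    Assumed model: the n-th identical message after training scores
    pre_score + factor * (seed_total - pre_score) / n.
    Returns None if the verdict does not flip within `limit` messages.
    """
    trained_as_spam = seed_total > 0           # +100 blacklist, -100 whitelist
    for n in range(1, limit + 1):
        post = pre_score + factor * (seed_total - pre_score) / n
        is_spam = post >= threshold
        if is_spam != trained_as_spam:
            return n
    return None


# Whitelisted (ham) senders at pre-AWL scores of 30, 20 and 10
print([first_flip(s, -100.0) for s in (30, 20, 10)])
# Blacklisted (spam) senders at pre-AWL scores of -10, -5, 0 and 2
print([first_flip(s, 100.0) for s in (-10, -5, 0, 2)])
```

Under these assumptions a whitelisted sender scoring exactly 5 never flips,
since its modelled score approaches 5 from below without reaching it.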

Method:  change the value of my local rule and then run:

spamassassin --add-to-blacklist <~/Mail/test.eml >/dev/null
for a in `seq 1 105`; do
  spamassassin -D auto-whitelist <~/Mail/test.eml 2>&1 |
    sed -re '/.*post.*score: /!d' -e "s// $a\t/"
done

(or swap --add-to-blacklist with -W)

