https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7965

            Bug ID: 7965
           Summary: SQL storage backend miscalculates mean score for AWL
                    (and others?)
           Product: Spamassassin
           Version: 3.4.6
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: Learner
          Assignee: dev@spamassassin.apache.org
          Reporter: mnalis-sa...@voyager.hr
  Target Milestone: Undefined

When using AWL with the default BerkeleyDB storage backend, feeding the same
mail over and over again does not move its score away from the mean, which is
the correct behaviour.

However, when I store the results in SQL instead, the behaviour changes
incorrectly: the score is reduced further and further on each new pass.

For example, after wiping the AWL storage and starting from scratch, the
following mail scores 16.641, and with the default BerkeleyDB storage the score
stays the same no matter how many times I feed it:

> Mar 27 19:17:57.039 [4227] dbg: auto-whitelist: db-based 
> adminoff...@unist.hr|ip=170.246 scores 0/0
> Mar 27 19:17:57.039 [4227] dbg: auto-whitelist: db-based 
> adminoff...@unist.hr|ip=none scores 0/0
> Mar 27 19:17:57.040 [4227] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: undef, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.c
> Mar 27 19:17:57.040 [4227] dbg: auto-whitelist: add_score: new count: 1, new 
> totscore: 16.641
> Mar 27 19:17:57.043 [4227] dbg: auto-whitelist: post auto-whitelist score: 
> 16.641
> 
> Mar 27 19:18:07.542 [4285] dbg: auto-whitelist: db-based 
> adminoff...@unist.hr|ip=170.246 scores 1/16.641
> Mar 27 19:18:07.542 [4285] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: 16.641, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.
> Mar 27 19:18:07.543 [4285] dbg: auto-whitelist: add_score: new count: 2, new 
> totscore: 33.282
> Mar 27 19:18:07.546 [4285] dbg: auto-whitelist: post auto-whitelist score: 
> 16.641
> 
> Mar 27 19:18:18.406 [4342] dbg: auto-whitelist: db-based 
> adminoff...@unist.hr|ip=170.246 scores 2/33.282
> Mar 27 19:18:18.406 [4342] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: 16.641, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.
> Mar 27 19:18:18.407 [4342] dbg: auto-whitelist: add_score: new count: 3, new 
> totscore: 49.923
> Mar 27 19:18:18.410 [4342] dbg: auto-whitelist: post auto-whitelist score: 
> 16.641

However, when I wipe the AWL storage and start from scratch with the SQL
backend (by using "auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList"),
the same mail, which initially scores 16.641, gets a lower score on each
subsequent invocation, e.g.:

> Mar 27 19:12:47.675 [2717] dbg: auto-whitelist: sql-based 
> adminoff...@unist.hr|hotelfrontera.cl scores 0, msgcount 0
> Mar 27 19:12:47.676 [2717] dbg: auto-whitelist: sql-based 
> adminoff...@unist.hr|none scores 0, msgcount 0
> Mar 27 19:12:47.676 [2717] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: undef, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.c
> Mar 27 19:12:47.678 [2717] dbg: auto-whitelist: sql-based add_score/insert 
> score 16.641: amavis|adminoff...@unist.hr|170.246|1|16.641|hotelfrontera.cl
> Mar 27 19:12:47.679 [2717] dbg: auto-whitelist: post auto-whitelist score: 
> 16.641
> 
> Mar 27 19:13:37.620 [2815] dbg: auto-whitelist: sql-based 
> adminoff...@unist.hr|hotelfrontera.cl scores 16.641, msgcount 1
> Mar 27 19:13:37.620 [2815] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: 16.641, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.
> Mar 27 19:13:37.623 [2815] dbg: auto-whitelist: sql-based add_score/insert 
> score 16.641: amavis|adminoff...@unist.hr|170.246|1|16.641|hotelfrontera.cl
> Mar 27 19:13:37.624 [2815] dbg: auto-whitelist: post auto-whitelist score: 
> 16.641
> 
> Mar 27 19:13:48.304 [2873] dbg: auto-whitelist: sql-based 
> adminoff...@unist.hr|hotelfrontera.cl scores 16.641, msgcount 2
> Mar 27 19:13:48.304 [2873] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: 8.320, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.c
> Mar 27 19:13:48.308 [2873] dbg: auto-whitelist: sql-based add_score/insert 
> score 16.641: amavis|adminoff...@unist.hr|170.246|1|16.641|hotelfrontera.cl
> Mar 27 19:13:48.309 [2873] dbg: auto-whitelist: post auto-whitelist score: 
> 12.481
> 
> Mar 27 19:13:58.940 [2933] dbg: auto-whitelist: sql-based 
> adminoff...@unist.hr|hotelfrontera.cl scores 16.641, msgcount 3
> Mar 27 19:13:58.940 [2933] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: 5.547, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.c
> Mar 27 19:13:58.942 [2933] dbg: auto-whitelist: sql-based add_score/insert 
> score 16.641: amavis|adminoff...@unist.hr|170.246|1|16.641|hotelfrontera.cl
> Mar 27 19:13:58.943 [2933] dbg: auto-whitelist: post auto-whitelist score: 
> 11.094
> 
> Mar 27 19:14:09.724 [3031] dbg: auto-whitelist: sql-based 
> adminoff...@unist.hr|hotelfrontera.cl scores 16.641, msgcount 4
> Mar 27 19:14:09.724 [3031] dbg: auto-whitelist: AWL active, pre-score: 
> 16.641, autolearn score: 16.641, mean: 4.160, IP: 170.246.172.8, address: 
> adminoff...@unist.hr signed by hotelfrontera.c
> Mar 27 19:14:09.727 [3031] dbg: auto-whitelist: sql-based add_score/insert 
> score 16.641: amavis|adminoff...@unist.hr|170.246|1|16.641|hotelfrontera.cl
> Mar 27 19:14:09.729 [3031] dbg: auto-whitelist: post auto-whitelist score: 
> 10.401

This should not be happening; the score should remain the same regardless of
whether one uses SQL or BDB storage.
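
The numbers in the SQL log look consistent with the message count being
incremented on every pass while the stored total score stays at 16.641 (the
log shows "scores 16.641, msgcount 2", then 3, then 4). The following is only
a sketch of that arithmetic in Python, not the actual SpamAssassin code. It
assumes the default auto_whitelist_factor of 0.5 and the usual AWL adjustment
of moving the score towards the mean by that factor; the function names and
the accumulate_total flag are made up for illustration.

```python
FACTOR = 0.5  # assumed default auto_whitelist_factor


def awl_adjust(pre_score, totscore, count, factor=FACTOR):
    """Move pre_score towards the stored mean by `factor` (no history: unchanged)."""
    if count == 0:
        return pre_score
    mean = totscore / count
    return pre_score + (mean - pre_score) * factor


def replay(pre_score, passes, accumulate_total):
    """Feed the same mail `passes` times and return the post-AWL scores.

    accumulate_total=True  models the BDB log: count and total both grow.
    accumulate_total=False models the suspected SQL behaviour: count grows,
    but the total is only written on the first insert (hypothesis).
    """
    totscore, count = 0.0, 0
    results = []
    for _ in range(passes):
        results.append(awl_adjust(pre_score, totscore, count))
        count += 1
        if accumulate_total or count == 1:
            totscore += pre_score
    return results


if __name__ == "__main__":
    print("bdb:", [round(s, 3) for s in replay(16.641, 5, True)])
    print("sql:", [round(s, 3) for s in replay(16.641, 5, False)])
```

With accumulate_total=False the sequence follows the SQL log above (16.641,
16.641, then roughly 12.481, 11.094 and 10.401), while accumulate_total=True
stays at 16.641 as in the BDB log. If this reading is right, the fault would be
in how the SQL backend updates the stored total, not in the mean calculation
itself.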

I'm using spamassassin package 3.4.6-1 from Debian Bullseye.

Note: I highly suspect this also affects other learners using SQL, for example
TxRep (https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7943), and maybe
bayes too?
