On 03/02/2016 09:31 AM, Jari Fredriksson wrote:

Hello.

When I look (hover over) my corpus in http://ruleqa.spamassassin.org it
shows that I have some 2000 spam in my corpus.

However, as my last run shows, there should be more, 34140:

+ nosleep ./mass-check --hamlog=ham-jarif.log --spamlog=spam-jarif.log
-j 7 --progress --reuse ham:dir:/home/jarif/Maildir/.Confirmed-HAM
spam:dir:/home/jarif/Maildir/.Confirmed-SPAM
status: starting scan stage                              now: 2016-03-02
01:03:59
status: completed scan stage, 96050 messages             now: 2016-03-02
01:16:55
status: starting run stage                               now: 2016-03-02
01:16:55
status:  10% ham: 5816   spam: 3790   date: 2013-05-30   now: 2016-03-02
01:25:03
status:  20% ham: 11620  spam: 7592   date: 2013-10-01   now: 2016-03-02
01:34:54
status:  30% ham: 17448  spam: 11370  date: 2014-01-17   now: 2016-03-02
01:42:40
status:  40% ham: 23271  spam: 15153  date: 2014-06-02   now: 2016-03-02
01:52:23
status:  50% ham: 29070  spam: 18960  date: 2015-08-14   now: 2016-03-02
02:03:19
status:  60% ham: 34873  spam: 22763  date: 2015-02-02   now: 2016-03-02
02:14:21
status:  70% ham: 40689  spam: 26553  date: 2015-05-19   now: 2016-03-02
02:26:57
status:  80% ham: 46508  spam: 30340  date: 2015-11-13   now: 2016-03-02
02:38:09
status:  90% ham: 52314  spam: 34140  date: 2015-12-12   now: 2016-03-02
02:48:36
status: completed run stage                              now: 2016-03-02
02:57:49
+ LOGLIST=' ham-jarif.log spam-jarif.log'
+ set +x
Syncing nightly_mass_check
rsync -Pcqz  ham-jarif.log spam-jarif.log *munged*/

So, how does that work? The current masscheck is badly short of spam...

My suggestion:

- run separate masscheck jobs for ham / spam - makes it easier to debug.

- remove old stuff from corpus. Anything older than 1 year is next to pointless and can even lead to bad decisions, because things like X-headers change over time.

- be sure to run masscheck jobs within the defined time window.
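
For the second point, here is a rough sketch of how the old messages could be found. It assumes the corpus folders are Maildir-style (with cur/ and new/ subdirectories) and that the Date header is a good enough proxy for message age. The names message_date and stale_messages are made up for this example; they are not part of mass-check.

```python
import email
import email.utils
from datetime import datetime, timedelta, timezone
from pathlib import Path


def message_date(path):
    """Return the Date header of a message file as an aware datetime, or None."""
    with open(path, "rb") as fh:
        msg = email.message_from_binary_file(fh)
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        parsed = email.utils.parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        # Unparsable Date header: treat the age as unknown.
        return None
    if parsed is None:
        return None
    if parsed.tzinfo is None:
        # Assumption: a Date without a zone is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def stale_messages(folder, now=None, max_age_days=365):
    """List message files under folder/cur and folder/new older than the cutoff.

    Messages with a missing or unparsable Date header are skipped, not listed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for sub in ("cur", "new"):
        subdir = Path(folder) / sub
        if not subdir.is_dir():
            continue
        for path in sorted(subdir.iterdir()):
            if not path.is_file():
                continue
            sent = message_date(path)
            if sent is not None and sent < cutoff:
                stale.append(path)
    return stale


if __name__ == "__main__":
    import sys

    # Usage: python stale.py /path/to/maildir/folder
    for old in stale_messages(sys.argv[1]):
        print(old)
```

This only prints candidates; nothing is moved or deleted. Date headers in spam are often forged or missing, so the list is worth a look before acting on it.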

Axb


