On 03/02/2016 09:31 AM, Jari Fredriksson wrote:
Hello.
When I look at (hover over) my corpus on http://ruleqa.spamassassin.org,
it shows that I have some 2000 spam in my corpus.
However, as my last run shows, there should be more, 34140:
+ nosleep ./mass-check --hamlog=ham-jarif.log --spamlog=spam-jarif.log
-j 7 --progress --reuse ham:dir:/home/jarif/Maildir/.Confirmed-HAM
spam:dir:/home/jarif/Maildir/.Confirmed-SPAM
status: starting scan stage now: 2016-03-02 01:03:59
status: completed scan stage, 96050 messages now: 2016-03-02 01:16:55
status: starting run stage now: 2016-03-02 01:16:55
status: 10% ham: 5816 spam: 3790 date: 2013-05-30 now: 2016-03-02 01:25:03
status: 20% ham: 11620 spam: 7592 date: 2013-10-01 now: 2016-03-02 01:34:54
status: 30% ham: 17448 spam: 11370 date: 2014-01-17 now: 2016-03-02 01:42:40
status: 40% ham: 23271 spam: 15153 date: 2014-06-02 now: 2016-03-02 01:52:23
status: 50% ham: 29070 spam: 18960 date: 2015-08-14 now: 2016-03-02 02:03:19
status: 60% ham: 34873 spam: 22763 date: 2015-02-02 now: 2016-03-02 02:14:21
status: 70% ham: 40689 spam: 26553 date: 2015-05-19 now: 2016-03-02 02:26:57
status: 80% ham: 46508 spam: 30340 date: 2015-11-13 now: 2016-03-02 02:38:09
status: 90% ham: 52314 spam: 34140 date: 2015-12-12 now: 2016-03-02 02:48:36
status: completed run stage now: 2016-03-02 02:57:49
+ LOGLIST=' ham-jarif.log spam-jarif.log'
+ set +x
Syncing nightly_mass_check
rsync -Pcqz ham-jarif.log spam-jarif.log *munged*/
So, how does that work? The current masscheck is badly short of spam...
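One way to cross-check the numbers in the run above is to pull the last reported ham/spam counts out of the mass-check progress output. This is only a minimal sketch in Python, assuming the status lines have the shape shown in the quoted run; `last_counts` and `STATUS_RE` are hypothetical names, not part of mass-check.

```python
import re

# Matches progress lines of the shape seen in the run above, e.g.
#   status: 90% ham: 52314 spam: 34140 date: 2015-12-12 now: ...
# (assumed format; adjust if your mass-check output differs)
STATUS_RE = re.compile(r"status:\s+(\d+)%\s+ham:\s+(\d+)\s+spam:\s+(\d+)")

def last_counts(progress_text):
    """Return (percent, ham, spam) from the last matching status line, or None."""
    last = None
    for line in progress_text.splitlines():
        m = STATUS_RE.search(line)
        if m:
            last = tuple(int(g) for g in m.groups())
    return last

sample = """\
status: 80% ham: 46508 spam: 30340 date: 2015-11-13 now: 2016-03-02
status: 90% ham: 52314 spam: 34140 date: 2015-12-12 now: 2016-03-02
status: completed run stage now: 2016-03-02
"""
print(last_counts(sample))  # (90, 52314, 34140)
```

Note that the last progress report is at 90%, so the totals actually written to the logs would be somewhat higher than these figures.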
My suggestions:
- run separate masscheck jobs for ham and spam; that makes it easier to
debug.
- remove old stuff from the corpus. Anything older than 1 year is next to
pointless and can even lead to bad decisions because, for example,
X-headers change over time.
- be sure to run masscheck jobs within the defined time window.
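
To get a feel for how much of a corpus falls outside the one-year window, something like the following could be used. It is a rough sketch, assuming that file modification time is a good enough stand-in for message age (it may not be if the mail was copied or restored); `find_old_messages` is a hypothetical helper, and it only lists files, it does not delete anything.

```python
import os
import time

def find_old_messages(maildir, max_age_days=365, now=None):
    """Return sorted paths of files under `maildir` with mtime older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(maildir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: mtime approximates when the message was received.
            if os.path.getmtime(path) < cutoff:
                old.append(path)
    return sorted(old)
```

For example, `len(find_old_messages("/home/jarif/Maildir/.Confirmed-SPAM"))` would give the number of candidate files to review before removing anything.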
Axb