https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7127
--- Comment #19 from Reindl Harald <[email protected]> ---
for sure bayes_learn_to_journal is not the reason; it's more the key to
getting the expected final result, for whatever reason

in the end i am happy with the result, and i also prefer that way of training:
it gives me the opportunity to say "that is my complete corpus, rebuild bayes
from scratch in a temp folder, move it into place after that and reload spamd"
if i later decide, for whatever reason, that messages were wrongly classified,
or if a major upgrade with different tokenizing arrives in the future
________________________________________________________________

* bayes_learn_to_journal 0
* removed --no-sync from the script
* commented out the separate sync call
* 255 spam samples ignored

Learned tokens from 9802 message(s) (10057 message(s) examined)

after that i interrupted the rebuild from scratch, restored the old script and
re-enabled bayes_learn_to_journal - voila - no sample ignored, all fine, see
below

it also seems to be some minutes faster: surely the learn without sync is way
faster, and then the sync calls themselves take a few minutes without progress
display, but the total time is around 2 minutes lower for the whole corpus
________________________________________________________________

that is the complete run with bayes_learn_to_journal, --no-sync and the
separate sync call after each of the two folders:

[root@mail-gw:~]$ sa-learn.sh rebuild
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)
10-02-2015 19:21:40: Verarbeite SPAM Samples
100% [==============================================================================================================================================================================] 55.72 msgs/sec 03m00s DONE
Learned tokens from 10057 message(s) (10057 message(s) examined)
10-02-2015 19:24:42: Synchronisiere Journal
10-02-2015 19:27:43: Verarbeite HAM Samples
100% [==============================================================================================================================================================================] 34.97 msgs/sec 04m53s DONE
Learned tokens from 10256 message(s) (10256 message(s) examined)
10-02-2015 19:32:38: Synchronisiere Journal
10-02-2015 19:36:32: Done

0.000          0          3          0  non-token data: bayes db version
0.000          0      10057          0  non-token data: nspam
0.000          0      10256          0  non-token data: nham
0.000          0    1332016          0  non-token data: ntokens
0.000          0  993467899          0  non-token data: oldest atime
0.000          0 1423592669          0  non-token data: newest atime
0.000          0 1423593159          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

insgesamt 34M
-rw------- 1 sa-milt sa-milt 2,5M 2015-02-10 19:36 bayes_seen
-rw------- 1 sa-milt sa-milt  41M 2015-02-10 19:36 bayes_toks
-rw------- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
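
The check for "ignored" samples in this comment comes down to comparing the two numbers in sa-learn's summary line and cross-checking them against the nspam/nham counters in the dump. A rough Python sketch of that comparison might look like the following. It is not part of the sa-learn.sh script discussed here; the function names are made up for illustration, and the parsing assumes the output keeps the layout quoted in this comment.

```python
import re

# Summary line printed by sa-learn, in the form quoted in this comment.
LEARN_RE = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)

LABEL_PREFIX = "non-token data: "


def skipped_messages(sa_learn_output: str) -> int:
    """Return examined minus learned for the first summary line found.

    Hypothetical helper: a non-zero result means some samples were examined
    but not learned.
    """
    match = LEARN_RE.search(sa_learn_output)
    if match is None:
        raise ValueError("no 'Learned tokens from ...' line found")
    learned, examined = (int(group) for group in match.groups())
    return examined - learned


def parse_dump_magic(dump_output: str) -> dict:
    """Map each 'non-token data' label to its integer value.

    Assumes the value sits in the third whitespace-separated column,
    as in the dump quoted in this comment.
    """
    values = {}
    for line in dump_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(LABEL_PREFIX):
            label = fields[4][len(LABEL_PREFIX):].strip()
            values[label] = int(fields[2])
    return values


if __name__ == "__main__":
    # Numbers taken from the failed run described in this comment.
    line = "Learned tokens from 9802 message(s) (10057 message(s) examined)"
    print(skipped_messages(line))  # 255
```

Fed with the summary lines of the journal-based run, `skipped_messages` would return 0 for both folders, and `parse_dump_magic` on the dump should then report nspam and nham equal to the examined counts.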
