https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7127

--- Comment #19 from Reindl Harald <[email protected]> ---
bayes_learn_to_journal is surely not the cause of the problem; it is rather the
key to getting the expected final result, for whatever reason

in the end i am happy with the result, as i prefer that way of training anyway
and have the opportunity to say "that is my complete corpus, rebuild bayes from
scratch in a temp folder, move it into place afterwards and reload spamd" if i
decide later, for whatever reason, that messages were wrongly classified, or if
a major upgrade with different tokenizing arrives in the future
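
The rebuild-from-scratch workflow quoted above could be sketched roughly as follows. This is not the sa-learn.sh script from this report (which is not shown here); it only builds the sa-learn command lines without running them. The directory names are hypothetical, and pointing --dbpath at a temp folder in bayes_path form (directory plus the "bayes" file prefix) is an assumption about how such a rebuild would be set up.

```python
from pathlib import PurePosixPath


def rebuild_commands(spam_dir, ham_dir, tmp_dir):
    """Build the sa-learn calls for a from-scratch rebuild in tmp_dir.

    Nothing is executed here; the caller decides how to run the commands.
    All three paths are hypothetical placeholders.
    """
    # Assumption: --dbpath takes the bayes_path form, i.e. directory + "bayes",
    # so that bayes_toks and bayes_seen are created inside tmp_dir.
    dbpath = str(PurePosixPath(tmp_dir) / "bayes")
    base = ["sa-learn", "--dbpath", dbpath]
    return [
        base + ["--no-sync", "--progress", "--spam", spam_dir],
        base + ["--sync"],  # separate journal sync after the spam folder
        base + ["--no-sync", "--progress", "--ham", ham_dir],
        base + ["--sync"],  # separate journal sync after the ham folder
        base + ["--dump", "magic"],  # print the counters for a final check
    ]


for command in rebuild_commands("/corpus/spam", "/corpus/ham", "/tmp/bayes-rebuild"):
    print(" ".join(command))
```

Moving the finished bayes_* files into place and reloading spamd are left out on purpose, since how that is done depends on the local setup.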
________________________________________________________________

* bayes_learn_to_journal 0
* removed --no-sync from the script
* commented out the separate sync call
* 255 spam samples ignored

Learned tokens from 9802 message(s) (10057 message(s) examined)
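
The "255 spam samples ignored" figure is simply the difference between the two numbers in that summary line. A minimal sketch for checking it, assuming the summary line always has the format shown above (the function name is made up for illustration):

```python
import re

# Summary line format as printed by sa-learn in the run above.
SUMMARY_RE = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def skipped_messages(summary_line):
    """Return examined minus learned for one sa-learn summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError(f"not an sa-learn summary line: {summary_line!r}")
    learned, examined = (int(group) for group in match.groups())
    return examined - learned


line = "Learned tokens from 9802 message(s) (10057 message(s) examined)"
print(skipped_messages(line))  # 255
```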

after that i interrupted the rebuild from scratch, restored the old script and
re-enabled bayes_learn_to_journal - voila - no sample ignored, all fine, see
below

it also seems to be a few minutes faster: learning without sync is of course
much faster, and the sync calls themselves then take a few minutes without any
progress display, but the total time for the whole corpus is around 2 minutes
lower
________________________________________________________________

that is the complete run with bayes_learn_to_journal enabled, --no-sync and the
separate sync call after each of the two folders:

[root@mail-gw:~]$ sa-learn.sh rebuild
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)
Replacing "Subject: [SPAM] " with "Subject: " (case sensitive) (partial words matched)

10-02-2015 19:21:40: Verarbeite SPAM Samples
100%
[==============================================================================================================================================================================]
 55.72 msgs/sec 03m00s DONE
Learned tokens from 10057 message(s) (10057 message(s) examined)
10-02-2015 19:24:42: Synchronisiere Journal

10-02-2015 19:27:43: Verarbeite HAM Samples
100%
[==============================================================================================================================================================================]
 34.97 msgs/sec 04m53s DONE
Learned tokens from 10256 message(s) (10256 message(s) examined)
10-02-2015 19:32:38: Synchronisiere Journal

10-02-2015 19:36:32: Done
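
The time spent per phase can be read off the timestamps in the log above ("Verarbeite ... Samples" is the learning step, "Synchronisiere Journal" the journal sync). A small sketch, assuming each timestamp marks the start of the phase named on the same line; the English phase names are labels chosen here, not script output:

```python
from datetime import datetime

# (timestamp, phase that starts at that time), copied from the log above.
LOG = [
    ("10-02-2015 19:21:40", "learn spam"),
    ("10-02-2015 19:24:42", "sync journal (spam)"),
    ("10-02-2015 19:27:43", "learn ham"),
    ("10-02-2015 19:32:38", "sync journal (ham)"),
    ("10-02-2015 19:36:32", "done"),
]


def phase_durations(log):
    """Map each phase to its duration in seconds (next start minus own start)."""
    times = [datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S") for stamp, _ in log]
    return {
        name: int((end - start).total_seconds())
        for (_, name), start, end in zip(log, times, times[1:])
    }


durations = phase_durations(LOG)
for name, seconds in durations.items():
    print(f"{name}: {seconds // 60}m{seconds % 60:02d}s")
print(f"total: {sum(durations.values())}s")
```

For this run the two learning phases add up to 477 seconds and the two sync calls to 415 seconds, 892 seconds in total.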

0.000          0          3          0  non-token data: bayes db version
0.000          0      10057          0  non-token data: nspam
0.000          0      10256          0  non-token data: nham
0.000          0    1332016          0  non-token data: ntokens
0.000          0  993467899          0  non-token data: oldest atime
0.000          0 1423592669          0  non-token data: newest atime
0.000          0 1423593159          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
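
The counters above can be checked against the learn summaries with a few lines of code. A minimal sketch, assuming the third column of each "non-token data" line carries the value and that the atime entries are Unix epoch seconds; only a subset of the lines is copied here:

```python
from datetime import datetime, timezone

# Subset of the dump output above.
DUMP = """\
0.000          0      10057          0  non-token data: nspam
0.000          0      10256          0  non-token data: nham
0.000          0    1332016          0  non-token data: ntokens
0.000          0 1423592669          0  non-token data: newest atime
0.000          0 1423593159          0  non-token data: last journal sync atime
"""


def parse_magic(dump):
    """Return {label: value} for every 'non-token data' line in the dump."""
    values = {}
    for line in dump.splitlines():
        head, sep, label = line.partition("non-token data: ")
        if not sep:
            continue  # not a magic line
        values[label.strip()] = int(head.split()[2])  # third column is the value
    return values


def as_utc(epoch_seconds):
    """Convert an atime value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


magic = parse_magic(DUMP)
print(magic["nspam"], magic["nham"])
print(as_utc(magic["last journal sync atime"]).isoformat())
```

The last journal sync atime converts to 18:32:39 UTC, which lines up with the second "Synchronisiere Journal" line at 19:32:38 if the server clock is one hour ahead of UTC (an assumption, the timezone is not stated in the report).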

insgesamt 34M
-rw------- 1 sa-milt sa-milt 2,5M 2015-02-10 19:36 bayes_seen
-rw------- 1 sa-milt sa-milt  41M 2015-02-10 19:36 bayes_toks
-rw------- 1 sa-milt sa-milt   98 2014-08-21 17:47 user_prefs
