You need many more spam mails. The target should be a corpusnorm between 0.9 and 1.1 at the rebuild log line "info: corpusnorm after processing messages/errors-spam and messages/errors-notspam ...".
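As a quick cross-check, here is a minimal Python sketch (not ASSP code) that recomputes the corpusnorm from the weights in the rebuild log quoted below. It assumes corpusnorm = spam weight / not-spam weight, because that ratio reproduces both logged norms (0.184 and 0.7703). The files-needed estimate is a hypothetical linear approximation: it assumes every extra spam file adds the average weight of the 933 files already imported from messages/errors-spam. ASSP's own calculation may differ.

```python
import math

# Weights copied from the rebuild log in this thread.
# Assumption: corpusnorm = spam weight / not-spam weight; this reproduces the
# logged values 0.184 (657272 / 3563832) and 0.7703 (2745357 / 3563832).
ERRORS_SPAM_FILES = 933          # "Imported Files for Bayes/HMM: 933"
ERRORS_SPAM_WEIGHT = 657_272     # after messages/errors-spam
NOT_SPAM_WEIGHT = 3_563_832      # after messages/errors-notspam (unchanged later)
FINAL_SPAM_WEIGHT = 2_745_357    # after messages/spam was added

TARGET_LOW, TARGET_HIGH = 0.9, 1.1


def corpusnorm(spam_weight: float, not_spam_weight: float) -> float:
    """Spam/not-spam weight ratio; 1.0 means a perfectly balanced corpus."""
    return spam_weight / not_spam_weight


def is_balanced(norm: float) -> bool:
    """True if the norm lies inside the 0.9 ... 1.1 target band."""
    return TARGET_LOW <= norm <= TARGET_HIGH


def spam_files_needed(target_norm: float = 1.0) -> int:
    """Rough number of errors-spam files needed to reach target_norm.

    Hypothetical linear estimate: every spam file is assumed to add the
    average weight of the files already in messages/errors-spam.
    """
    avg_weight_per_file = ERRORS_SPAM_WEIGHT / ERRORS_SPAM_FILES
    wanted_spam_weight = target_norm * NOT_SPAM_WEIGHT
    return math.ceil(wanted_spam_weight / avg_weight_per_file)


if __name__ == "__main__":
    for label, spam in (("after errors folders", ERRORS_SPAM_WEIGHT),
                        ("after messages/spam", FINAL_SPAM_WEIGHT)):
        norm = corpusnorm(spam, NOT_SPAM_WEIGHT)
        print(f"{label}: norm {norm:.4f}, balanced: {is_balanced(norm)}")
    needed = spam_files_needed(1.0)
    print(f"errors-spam files for norm 1.0: ~{needed} "
          f"({needed - ERRORS_SPAM_FILES} more than now)")
```

With these figures the estimate is about 5,059 errors-spam files for a norm of 1.0, roughly 4,126 more than the 933 imported, which is in line with the ~5,000 figure. The errors-spam files average about 704 weight each versus roughly 1,614 for the errors-notspam files, which fits the observation that the collected spam is very short. Because the norm after messages/spam is still below 1.0, adding ham would only move it further from the target. That appears to be why the rebuild asked for just 1 file from messages/notspam and imported 0 of them into Bayes/HMM.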
This will require about 5,000 spam mails in messages/errors-spam ([1] moving well-known (e.g. older) spam from messages/spam to messages/errors-spam will help). It looks like most of your collected spam mails are very short: 16,000 spam and 2,200 ham mails result in a corpusnorm of only 0.77, so collect at least 4,000 more spam mails (plus [1]).

Set MaxCorrectedDays very high (e.g. 10,000) and leave it there permanently.

Procedure:
- Increase MaxFiles (e.g. to 30,000).
- Set the first value of MaxBayesFileAge much higher until the corpusnorm is balanced (this will take some days).
- Then calculate the age of the oldest spam and set the first value of MaxBayesFileAge accordingly.
- Count the files in messages/spam and set MaxFiles accordingly.

If the corpusnorm is fine, leave the settings alone for some days (be patient!!!). Then increase MaxBytes to 8,000. This will lead to a corpusnorm that is too low, so start the above procedure again. Then increase MaxBytes to 20,000. This will again lead to a corpusnorm that is too low; start the above procedure again.

Check the rebuild log every few days. Small corrections to MaxBayesFileAge will help to keep everything fine; most of the time no correction will be required. If "info: corpusnorm after processing messages/errors-spam and messages/errors-notspam ..." becomes too unbalanced, correct the long-term corpus manually (move files)!

Keep in mind: after any change to the above values, the rebuild task requires two runs to reach its automatic self-healing state!

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 17.12.2018 16:05
Subject: [Assp-test] Rebuild only needs 1 file from notspam?

I just reviewed a rebuild log and was shocked to see:

Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from folder messages/notspam to get the wanted corpusnorm (1.000)

That's after the messages/spam folder (15k messages) is processed. I have maxfiles set to 15,000 and maxbytes set to 4,000. Suggestions?
I certainly want our users' good mail to be considered! Can't say I've ever seen this before, but I don't review the rebuild log terribly often.

Copy of the rebuild log (file rebuildrun.txt) follows:

Dec-17-18 02:15:00 RebuildSpamDB-thread rebuildspamdb-version 7.50 started in ASSP version 2.6.2(18339)
Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB for temporary hashes
Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
Dec-17-18 02:15:00 RebuildSpamDB will create a Hidden Markov Model
Dec-17-18 02:15:00 RebuildSpamDB will include attachment-database-entries in to spamdb
Dec-17-18 02:15:00 RebuildSpamDB will create unicode enabled databases
Dec-17-18 02:15:00 RebuildSpamDB will process all words as Sequence of UAX #29 Grapheme Clusters
Dec-17-18 02:15:00 RebuildSpamDB will normalize unicode characters
Dec-17-18 02:15:00 RebuildSpamDB will use the ASSP_WordStem engine
Dec-17-18 02:15:00 ---ASSP Settings---
Dec-17-18 02:15:00 Do Not Collect Messages with RedListed address: Enabled **Messages with RedListed addresses will be removed from the corpus!**
Dec-17-18 02:15:00 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
Dec-17-18 02:15:00 Use Subject as Maillog Names: True
Dec-17-18 02:15:00 Maxbytes: 4,000
Dec-17-18 02:15:00 Maxfiles: 15,000
Dec-17-18 02:15:00 RebuildFileTimeLimit: 1 5
Dec-17-18 02:15:00 RebuildFileTimeLimit: files will be moved away from the corpus if their processing takes longer than 5 second(s)
Dec-17-18 02:15:00 Trashlist cleaning finished, 2 of 56 files deleted
Dec-17-18 02:15:00 c:/ASSP/messages/errors-spam
Dec-17-18 02:15:00 File Count: 934
Dec-17-18 02:15:00 Processing... messages/errors-spam with 934 files
Dec-17-18 02:15:52 0 attachment/image entries processed
Dec-17-18 02:15:52 Imported Files for HeloBlackList: 933
Dec-17-18 02:15:52 Imported Files for Bayes/HMM: 933
Dec-17-18 02:15:52 Finished in 52 seconds (17.94 files/s - 9.88 MByte)
Dec-17-18 02:15:52 c:/ASSP/messages/errors-notspam
Dec-17-18 02:15:52 File Count: 2,209
Dec-17-18 02:15:52 Processing... messages/errors-notspam with 2,209 files
Dec-17-18 02:18:36 0 attachment/image entries processed
Dec-17-18 02:18:36 Imported Files for HeloBlackList: 2,208
Dec-17-18 02:18:36 Imported Files for Bayes/HMM: 2,208
Dec-17-18 02:18:36 Finished in 164 seconds (13.46 files/s - 34.86 MByte)
Dec-17-18 02:18:36 info: corpusnorm after processing messages/errors-spam and messages/errors-notspam is Spam Weight: 657272 / Not-Spam Weight: 3563832 => norm: 0.184
Dec-17-18 02:18:36 info: require approximately all files (2,061,306 words) from folder messages/spam to get the wanted corpusnorm (1.000)
Dec-17-18 02:18:36 c:/ASSP/messages/spam
Dec-17-18 02:18:36 File Count: 14,937
Dec-17-18 02:18:36 Processing... messages/spam with 14,937 files
Dec-17-18 02:25:25 0 attachment/image entries processed
Dec-17-18 02:25:25 Imported Files for HeloBlackList: 14,937
Dec-17-18 02:25:25 Imported Files for Bayes/HMM: 14,937
Dec-17-18 02:25:25 Finished in 409 seconds (36.52 files/s - 69.05 MByte)
Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from folder messages/notspam to get the wanted corpusnorm (1.000)
Dec-17-18 02:25:25 c:/ASSP/messages/notspam
Dec-17-18 02:25:25 File Count: 9,382
Dec-17-18 02:25:25 Processing... messages/notspam with 9,382 files
Dec-17-18 02:26:42 0 attachment/image entries processed
Dec-17-18 02:26:42 Imported Files for HeloBlackList: 9,382
Dec-17-18 02:26:42 Imported Files for Bayes/HMM: 0
Dec-17-18 02:26:42 Finished in 77 seconds (121.84 files/s - 81.79 MByte)
Dec-17-18 02:26:42 Generating weighted Bayesian tuplets
Dec-17-18 02:27:04 start populating Spamdb with 465,296 records - Bayesian check is now disabled!
Dec-17-18 02:28:19 Finished populating Spamdb with 465,296 records - Bayesian check is now enabled!
Dec-17-18 02:28:19 done - Generating weighted Bayesian tuplets
Dec-17-18 02:28:19 Bayesian Pairs: 465,296 now in list
Dec-17-18 02:28:19 Generating consolidated Hidden-Markov-Model database from 2,155,159 record model
Dec-17-18 02:30:25 HMM sequences: 1,059,525 now in list
Dec-17-18 02:30:26 generating Spamdb.helo records from 13,393 collected HELO's
Dec-17-18 02:30:28 cleaning old Spamdb.helo records
Dec-17-18 02:30:28 done - cleaning old Spamdb.helo records
Dec-17-18 02:30:28 HELO Blacklist: 25 new, 1,159 now in list
Dec-17-18 02:30:28 Spam Weight : 2,745,357
Dec-17-18 02:30:28 Not-Spam Weight: 3,563,832
Dec-17-18 02:30:28 Corpus norm: 0.7703 - (ok - slighly ham heavy)
Dec-17-18 02:30:28 Corpus confidence: 0.66134618
Dec-17-18 02:30:33 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Dec-17-18 02:30:33 start populating Hidden Markov Model with 1,059,525 records!
Dec-17-18 02:33:08 Finished populating Hidden Markov Model with 1,059,525 records!
Dec-17-18 02:33:08 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Dec-17-18 02:33:08 Total processing time: 1,088 second(s)
Dec-17-18 02:33:08 Total processing data: 195.58 MByte
Dec-17-18 02:33:08 Rebuild processed 39.12 files per second.
Dec-17-18 02:33:08 After finishing the Rebuild process, the c:/ASSP/tmpDB folder contains 363.74 MByte.
Dec-17-18 02:33:08 After finishing the Rebuild process, the drive that contains the c:/ASSP/tmpDB folder has 12.89 GByte free space from total 25.20 GByte.
Dec-17-18 02:33:08 building new GripList records and bounce report
Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/maillog.txt
Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/18-12-16.maillog.txt
Dec-17-18 02:33:15 processing Logfile c:/ASSP/logs/18-12-15.maillog.txt
Dec-17-18 02:33:20 processing Logfile c:/ASSP/logs/18-12-14.maillog.txt
Dec-17-18 02:33:28 processing Logfile c:/ASSP/logs/18-12-13.maillog.txt
Dec-17-18 02:33:29 processing Logfile c:/ASSP/logs/18-12-12.maillog.txt
Dec-17-18 02:33:30 bounce report for the last two days: 11 bounces received (possibly delayed) - 1 bounces blocked
Dec-17-18 02:33:30 list of the top ten local addresses with blocked bounces in the last two days: b...@ourcharity.org : 1
Dec-17-18 02:33:30 end of bounce report
Dec-17-18 02:33:31 Uploading Griplist via Direct Connection
Dec-17-18 02:33:32 Submitted 6,144 bytes: 0 IPv6 addresses, 2,654 IPv4 addresses, good IP's 811 , bad IP's 1,137
Dec-17-18 02:33:32 Trashlist was saved to c:/ASSP/trashlist.db

THANKS!!

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test

DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally privileged and protected in law and are intended solely for the use of the individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no known virus in this email!
*******************************************************
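For the routine maintenance Thomas describes (checking the rebuild log every few days, and reading off the age of the oldest spam and the number of files in messages/spam before adjusting MaxBayesFileAge and MaxFiles), a small Python sketch could look like the following. It is not part of ASSP. The rebuildrun.txt location, the use of file modification times as the collection date, and MaxBayesFileAge being measured in days are all assumptions.

```python
import re
import time
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical maintenance helper, not part of ASSP. The c:/ASSP paths come
# from the rebuild log in this thread; the rebuildrun.txt location is assumed.
NORM_LINE = re.compile(
    r"info: corpusnorm after processing .*?"
    r"Spam Weight: ([\d,]+) / Not-Spam Weight: ([\d,]+) => norm: (\d+(?:\.\d+)?)"
)


def errors_folder_norm(log_text: str) -> Optional[Tuple[int, int, float]]:
    """Return (spam weight, not-spam weight, norm) from the last corpusnorm line."""
    matches = NORM_LINE.findall(log_text)
    if not matches:
        return None
    spam, not_spam, norm = matches[-1]
    return int(spam.replace(",", "")), int(not_spam.replace(",", "")), float(norm)


def folder_stats(folder: Path, now: Optional[float] = None) -> Tuple[int, float]:
    """Return (file count, age in days of the oldest file) for a corpus folder.

    Assumes each file's modification time is when the mail was collected.
    """
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in folder.iterdir() if p.is_file()]
    if not mtimes:
        return 0, 0.0
    return len(mtimes), (now - min(mtimes)) / 86400


if __name__ == "__main__":
    base = Path("c:/ASSP")
    log_file = base / "rebuildrun.txt"  # assumed location of the rebuild log
    if log_file.is_file():
        result = errors_folder_norm(log_file.read_text(errors="replace"))
        if result is not None:
            norm = result[2]
            verdict = "ok" if 0.9 <= norm <= 1.1 else "unbalanced - move files by hand"
            print(f"errors-folder corpusnorm: {norm} ({verdict})")
    spam_dir = base / "messages" / "spam"
    if spam_dir.is_dir():
        count, age_days = folder_stats(spam_dir)
        # Candidates for MaxFiles and the first MaxBayesFileAge value
        # (assuming MaxBayesFileAge is expressed in days).
        print(f"messages/spam: {count} files, oldest {age_days:.0f} days old")
```

The regular expression only matches the wording of the corpusnorm line in the log shown above; if another ASSP version phrases that line differently, the pattern needs adjusting.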