Things are looking MUCH better now. Thank you. Somehow mail older than 31 days was removed from spam/notspam. I don't know how that setting got that way; it was always 0 before. I've restored from backups and manually tweaked the settings as suggested. I appreciate the guidance, as always.
On Tue, Dec 18, 2018 at 12:39 PM K Post <nntp.p...@gmail.com> wrote:

> I'll get more messages into errors\spam right away and play with the
> maxbytes settings as suggested.
>
> MaxCorrectedDays was 0 (so never delete, right?). It's always been that
> way, intentionally. I manually edit as needed, but it trims to my 15k
> max.
>
> Somehow, MaxBayesFileAge was only 31. I am almost certain I've always had
> this as 0, so that files are deleted randomly during the rebuild process
> to get a better sampling. Is that a bad strategy now?
>
> *Is there any chance that one of the new versions erroneously overwrites
> this MaxBayesFileAge 0 value??* I certainly could be mistaken or maybe I
> somehow reset that to default. That at least explains why my notspam
> folder was sub 10k.
>
> Yes, most of our spam messages are very short. Is that unusual? It's
> always been true here at least.
>
> Thanks for the help!!
>
> On Tue, Dec 18, 2018 at 8:38 AM Thomas Eckardt <thomas.ecka...@thockar.com>
> wrote:
>
>> you need many more spam mails
>>
>> target should be to get a corpusnorm of 0.9 .... 1.1 after "info:
>> corpusnorm after processing messages/errors-spam and
>> messages/errors-notspam"
>>
>> this will require an amount of ~ 5.000 spam mails in
>> messages/errors-spam ([1] moving *(e.g. older) well known spam* from
>> messages/spam to messages/errors-spam will help)
>>
>> It looks like most of your collected spam mails are very short. 16.000
>> spam and 2.200 ham resulting in a corpusnorm of 0.77 -> collect at least
>> 4.000 + [1] more spam mails.
>>
>> set MaxCorrectedDays very high (e.g. 10.000) - leave this for ever
>>
>> procedure:
>>
>> increase MaxFiles (e.g. to 30.000)
>> set the first value of MaxBayesFileAge much higher until the corpusnorm
>> is balanced (this will take some days) - then calculate the age of the
>> oldest spam and set the first value of MaxBayesFileAge accordingly - count
>> the files in messages/spam and set MaxFiles accordingly
>> If the corpusnorm is fine, leave the setting for some days (be patient
>> !!!!).
>>
>> Then increase MaxBytes to 8.000. This will lead to a too low
>> corpusnorm. Start the above procedure again.
>> Then increase MaxBytes to 20.000. This will again lead to a too low
>> corpusnorm. Start the above procedure again.
>>
>> Every few days check the rebuild log. Small corrections for
>> MaxBayesFileAge will help to keep everything fine. Most times no
>> correction will be required.
>> If "info: corpusnorm after processing messages/errors-spam and
>> messages/errors-notspam..." becomes too unbalanced, correct the long time
>> corpus manually (move files)!
>>
>> Keep in mind: the rebuild task requires two runs after any of the above
>> value changes, to reach the auto-self-healthy-state!
>>
>> Thomas
>>
>> From: "K Post" <nntp.p...@gmail.com>
>> To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
>> Date: 17.12.2018 16:05
>> Subject: [Assp-test] Rebuild only needs 1 file from notspam?
>> ------------------------------
>>
>> I just reviewed a rebuild log and was shocked to see:
>> Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from
>> folder messages/notspam to get the wanted corpusnorm (1.000)
>>
>> That's after the messages/spam folder (15k messages) is processed.
>> I have maxfiles set to 15,000
>> maxbytes set to 4,000
>>
>> Suggestions? I certainly want our users' good mail to be considered!
>> Can't say I've seen this ever before, but I don't review the rebuild log
>> terribly often.
>>
>> Copy of rebuild log:
>>
>> File rebuildrun.txt follows:
>>
>> Dec-17-18 02:15:00 RebuildSpamDB-thread rebuildspamdb-version 7.50
>> started in ASSP version 2.6.2(18339)
>>
>> Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB for temporary hashes
>>
>> Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will create a Hidden Markov Model
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will include attachment-database-entries
>> in to spamdb
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will create unicode enabled databases
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will process all words as Sequence of
>> UAX #29 Grapheme Clusters
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will normalize unicode characters
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will use the ASSP_WordStem engine
>>
>> Dec-17-18 02:15:00 ---ASSP Settings---
>> Dec-17-18 02:15:00 Do Not Collect Messages with RedListed address:
>> Enabled **Messages with RedListed addresses will be removed from the
>> corpus!**
>>
>> Dec-17-18 02:15:00 Do Not Collect RedRe Messages: Enabled **Messages
>> matching the RedRe will be removed from the corpus!**
>>
>> Dec-17-18 02:15:00 Use Subject as Maillog Names: True
>> Dec-17-18 02:15:00 Maxbytes: 4,000
>> Dec-17-18 02:15:00 Maxfiles: 15,000
>> Dec-17-18 02:15:00 RebuildFileTimeLimit: 1 5
>> Dec-17-18 02:15:00 RebuildFileTimeLimit: files will be moved away from
>> the corpus if their processing takes longer than 5 second(s)
>>
>> Dec-17-18 02:15:00 Trashlist cleaning finished, 2 of 56 files deleted
>>
>> Dec-17-18 02:15:00 c:/ASSP/messages/errors-spam
>> Dec-17-18 02:15:00 File Count: 934
>> Dec-17-18 02:15:00 Processing... messages/errors-spam with 934 files
>> Dec-17-18 02:15:52 0 attachment/image entries processed
>> Dec-17-18 02:15:52 Imported Files for HeloBlackList: 933
>> Dec-17-18 02:15:52 Imported Files for Bayes/HMM: 933
>> Dec-17-18 02:15:52 Finished in 52 seconds (17.94 files/s - 9.88 MByte)
>>
>> Dec-17-18 02:15:52 c:/ASSP/messages/errors-notspam
>> Dec-17-18 02:15:52 File Count: 2,209
>> Dec-17-18 02:15:52 Processing... messages/errors-notspam with 2,209 files
>> Dec-17-18 02:18:36 0 attachment/image entries processed
>> Dec-17-18 02:18:36 Imported Files for HeloBlackList: 2,208
>> Dec-17-18 02:18:36 Imported Files for Bayes/HMM: 2,208
>> Dec-17-18 02:18:36 Finished in 164 seconds (13.46 files/s - 34.86 MByte)
>> Dec-17-18 02:18:36 info: corpusnorm after processing messages/errors-spam
>> and messages/errors-notspam is Spam Weight: 657272 / Not-Spam Weight:
>> 3563832 => norm: 0.184
>> Dec-17-18 02:18:36 info: require approximately all files (2,061,306
>> words) from folder messages/spam to get the wanted corpusnorm (1.000)
>>
>> Dec-17-18 02:18:36 c:/ASSP/messages/spam
>> Dec-17-18 02:18:36 File Count: 14,937
>> Dec-17-18 02:18:36 Processing... messages/spam with 14,937 files
>> Dec-17-18 02:25:25 0 attachment/image entries processed
>> Dec-17-18 02:25:25 Imported Files for HeloBlackList: 14,937
>> Dec-17-18 02:25:25 Imported Files for Bayes/HMM: 14,937
>> Dec-17-18 02:25:25 Finished in 409 seconds (36.52 files/s - 69.05 MByte)
>> Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from
>> folder messages/notspam to get the wanted corpusnorm (1.000)
>>
>> Dec-17-18 02:25:25 c:/ASSP/messages/notspam
>> Dec-17-18 02:25:25 File Count: 9,382
>> Dec-17-18 02:25:25 Processing... messages/notspam with 9,382 files
>> Dec-17-18 02:26:42 0 attachment/image entries processed
>> Dec-17-18 02:26:42 Imported Files for HeloBlackList: 9,382
>> Dec-17-18 02:26:42 Imported Files for Bayes/HMM: 0
>> Dec-17-18 02:26:42 Finished in 77 seconds (121.84 files/s - 81.79 MByte)
>>
>> Dec-17-18 02:26:42 Generating weighted Bayesian tuplets
>> Dec-17-18 02:27:04 start populating Spamdb with 465,296 records -
>> Bayesian check is now disabled!
>> Dec-17-18 02:28:19 Finished populating Spamdb with 465,296 records -
>> Bayesian check is now enabled!
>> Dec-17-18 02:28:19 done - Generating weighted Bayesian tuplets
>>
>> Dec-17-18 02:28:19 Bayesian Pairs: 465,296 now in list
>>
>> Dec-17-18 02:28:19 Generating consolidated Hidden-Markov-Model database
>> from 2,155,159 record model
>> Dec-17-18 02:30:25 HMM sequences: 1,059,525 now in list
>>
>> Dec-17-18 02:30:26 generating Spamdb.helo records from 13,393 collected
>> HELO's
>> Dec-17-18 02:30:28 cleaning old Spamdb.helo records
>> Dec-17-18 02:30:28 done - cleaning old Spamdb.helo records
>>
>> Dec-17-18 02:30:28 HELO Blacklist: 25 new, 1,159 now in list
>>
>> Dec-17-18 02:30:28 Spam Weight : 2,745,357
>> Dec-17-18 02:30:28 Not-Spam Weight: 3,563,832
>>
>> Dec-17-18 02:30:28 Corpus norm: 0.7703 - (ok - slighly ham heavy)
>> Dec-17-18 02:30:28 Corpus confidence: 0.66134618
>>
>> Dec-17-18 02:30:33 Start populating Hidden Markov Model. HMM-check is
>> disabled for this time!
>> Dec-17-18 02:30:33 start populating Hidden Markov Model with 1,059,525
>> records!
>> Dec-17-18 02:33:08 Finished populating Hidden Markov Model with 1,059,525
>> records!
>> Dec-17-18 02:33:08 Finished populating Hidden Markov Model. HMM-check is
>> now enabled again!
>>
>> Dec-17-18 02:33:08 Total processing time: 1,088 second(s)
>>
>> Dec-17-18 02:33:08 Total processing data: 195.58 MByte
>>
>> Dec-17-18 02:33:08 Rebuild processed 39.12 files per second.
>>
>> Dec-17-18 02:33:08 After finishing the Rebuild process, the c:/ASSP/tmpDB
>> folder contains 363.74 MByte.
>>
>> Dec-17-18 02:33:08 After finishing the Rebuild process, the drive that
>> contains the c:/ASSP/tmpDB folder has 12.89 GByte free space from total
>> 25.20 GByte.
>>
>> Dec-17-18 02:33:08 building new GripList records and bounce report
>> Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/maillog.txt
>> Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/18-12-16.maillog.txt
>> Dec-17-18 02:33:15 processing Logfile c:/ASSP/logs/18-12-15.maillog.txt
>> Dec-17-18 02:33:20 processing Logfile c:/ASSP/logs/18-12-14.maillog.txt
>> Dec-17-18 02:33:28 processing Logfile c:/ASSP/logs/18-12-13.maillog.txt
>> Dec-17-18 02:33:29 processing Logfile c:/ASSP/logs/18-12-12.maillog.txt
>>
>> Dec-17-18 02:33:30 bounce report for the last two days: 11 bounces
>> received (possibly delayed) - 1 bounces blocked
>>
>> Dec-17-18 02:33:30 list of the top ten local addresses with blocked
>> bounces in the last two days:
>>
>> b...@ourcharity.org : 1
>>
>> Dec-17-18 02:33:30 end of bounce report
>>
>> Dec-17-18 02:33:31 Uploading Griplist via Direct Connection
>> Dec-17-18 02:33:32 Submitted 6,144 bytes: 0 IPv6 addresses, 2,654 IPv4
>> addresses, good IP's 811 , bad IP's 1,137
>>
>> Dec-17-18 02:33:32 Trashlist was saved to c:/ASSP/trashlist.db
>>
>> THANKS!!
>> _______________________________________________
>> Assp-test mailing list
>> Assp-test@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/assp-test
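
For reference, the norm values in the quoted rebuild log appear to be simply Spam Weight divided by Not-Spam Weight (657272 / 3563832 is about 0.184, and 2,745,357 / 3,563,832 is about 0.7703). Below is a small Python sketch, assuming that ratio is how the figure is derived (inferred from the log numbers only, not from the ASSP source). It checks a rebuild result against the suggested 0.9 to 1.1 target and estimates how much more spam weight would be needed for a norm of 1.0. The function names are made up for illustration and are not part of ASSP.

```python
import math


def corpus_norm(spam_weight: int, ham_weight: int) -> float:
    """Spam weight divided by not-spam weight (assumed definition of the norm)."""
    if ham_weight <= 0:
        raise ValueError("not-spam weight must be positive")
    return spam_weight / ham_weight


def is_balanced(norm: float, low: float = 0.9, high: float = 1.1) -> bool:
    """True if the norm is inside the 0.9 .. 1.1 range suggested in the thread."""
    return low <= norm <= high


def missing_spam_weight(spam_weight: int, ham_weight: int, target: float = 1.0) -> int:
    """Extra spam weight needed to reach the target norm (0 if already reached)."""
    return max(0, math.ceil(target * ham_weight - spam_weight))


if __name__ == "__main__":
    # Weights copied from the rebuild log quoted above.
    after_errors = corpus_norm(657_272, 3_563_832)
    final = corpus_norm(2_745_357, 3_563_832)
    print(f"after errors-spam/errors-notspam: {after_errors:.3f}")
    print(f"end of rebuild:                   {final:.4f}")
    print("balanced:", is_balanced(final))
    print("missing spam weight:", missing_spam_weight(2_745_357, 3_563_832))
```

With the numbers from this log the final norm is outside the target range, which is consistent with the advice to collect more spam rather than to trim the notspam folder.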