Things are looking MUCH better now. Thank you.  Somehow mail older than 31
days was removed from spam/notspam.  I don't know how that setting got
changed; it was always 0 before.  I've restored from backups and manually
tweaked the settings as suggested.  I appreciate the guidance, as always.
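
For anyone who wants to see what an age limit such as 31 days would touch
before the next rebuild, here is a minimal Python sketch. It counts the files
in a corpus folder, reports the age of the oldest one, and counts how many are
older than a given limit. It assumes file age is judged by modification time,
which may not be exactly how ASSP decides. The function name corpus_stats and
the demo file names are made up for illustration and are not part of ASSP.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def corpus_stats(folder, max_age_days, now=None):
    """Count files in `folder` and how many are older than `max_age_days`.

    Age is taken from the file modification time (an assumption; ASSP may
    use a different criterion).
    """
    now = time.time() if now is None else now
    ages = [
        (now - p.stat().st_mtime) / SECONDS_PER_DAY
        for p in Path(folder).iterdir()
        if p.is_file()
    ]
    return {
        "files": len(ages),
        "oldest_days": max(ages, default=0.0),
        "older_than_limit": sum(1 for age in ages if age > max_age_days),
    }


# Demo on a throwaway folder with three files aged 1, 40 and 90 days.
with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    for name, age_days in [("a.eml", 1), ("b.eml", 40), ("c.eml", 90)]:
        path = Path(tmp) / name
        path.write_text("sample")
        stamp = now - age_days * SECONDS_PER_DAY
        os.utime(path, (stamp, stamp))  # backdate the modification time
    demo = corpus_stats(tmp, max_age_days=31, now=now)

print(demo)
```

Pointing corpus_stats at a real folder such as c:/ASSP/messages/spam (the path
shown in the rebuild log below) with max_age_days=31 would give a rough idea of
how much mail a 31-day limit could drop.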

On Tue, Dec 18, 2018 at 12:39 PM K Post <nntp.p...@gmail.com> wrote:

> I'll get more messages into errors-spam right away and play with the
> MaxBytes settings as suggested.
>
> MaxCorrectedDays was 0 (so never delete, right?).  It's always been that
> way, intentionally.  I manually edit as needed, but it trims to my 15k
> max.
>
> Somehow, MaxBayesFileAge was only 31.  I am almost certain I've always had
> this as 0, so that files are deleted randomly during the rebuild process to
> get a better sampling.  Is that a bad strategy now?
>
> *Is there any chance that one of the new versions erroneously overwrites
> this MaxBayesFileAge value of 0??*  I certainly could be mistaken, or maybe I
> somehow reset that to the default.  That at least explains why my notspam
> folder was under 10k.
>
> Yes, most of our spam messages are very short. Is that unusual? It's
> always been true here at least.
>
> Thanks for the help!!
>
>
> On Tue, Dec 18, 2018 at 8:38 AM Thomas Eckardt <thomas.ecka...@thockar.com>
> wrote:
>
>> You need many more spam mails.
>>
>> The target should be a corpusnorm of 0.9 ... 1.1 after "info:
>> corpusnorm after processing messages/errors-spam and
>> messages/errors-notspam ...".
>>
>> This will require about 5,000 spam mails in
>> messages/errors-spam ([1] moving *(e.g. older) well-known spam* from
>> messages/spam to messages/errors-spam will help).
>>
>> It looks like most of your collected spam mails are very short. 16,000
>> spam and 2,200 ham result in a corpusnorm of 0.77 -> collect at least
>> 4,000 + [1] more spam mails.
>>
>> Set MaxCorrectedDays very high (e.g. 10,000) and leave it that way forever.
>>
>> Procedure:
>>
>> Increase MaxFiles (e.g. to 30,000).
>> Set the first value of MaxBayesFileAge much higher until the corpusnorm
>> is balanced (this will take some days). Then calculate the age of the
>> oldest spam and set the first value of MaxBayesFileAge accordingly; count
>> the files in messages/spam and set MaxFiles accordingly.
>> If the corpusnorm is fine, leave the settings alone for some days (be
>> patient !!!!).
>>
>> Then increase MaxBytes to 8,000. This will lead to a corpusnorm that is
>> too low. Start the above procedure again.
>> Then increase MaxBytes to 20,000. This will again lead to a corpusnorm
>> that is too low. Start the above procedure again.
>>
>> Every few days, check the rebuild log. Small corrections to
>> MaxBayesFileAge will help to keep everything fine. Most of the time, no
>> correction will be required.
>> If "info: corpusnorm after processing messages/errors-spam and
>> messages/errors-notspam..." becomes too unbalanced, correct the long-term
>> corpus manually (move files)!
>>
>> Keep in mind: the rebuild task requires two runs after any of the above
>> value changes to reach the auto-self-healthy-state!
>>
>> Thomas
>>
>>
>>
>> From:        "K Post" <nntp.p...@gmail.com>
>> To:        "ASSP development mailing list" <
>> assp-test@lists.sourceforge.net>
>> Date:        17.12.2018 16:05
>> Subject:        [Assp-test] Rebuild only needs 1 file from notspam?
>> ------------------------------
>>
>>
>>
>> I just reviewed a rebuild log and was shocked to see:
>> Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from
>> folder messages/notspam to get the wanted corpusnorm (1.000)
>>
>> That's after the messages/spam folder (15k messages) is processed.
>> I have maxfiles set to 15,000
>> maxbytes set to 4,000
>>
>> Suggestions?  I certainly want our users' good mail to be considered!
>> Can't say I've seen this ever before, but I don't review the rebuild log
>> terribly often.
>>
>> Copy of rebuild log:
>>
>>
>> File rebuildrun.txt follows:
>>
>>
>> Dec-17-18 02:15:00 RebuildSpamDB-thread rebuildspamdb-version 7.50
>> started in ASSP version 2.6.2(18339)
>>
>> Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB for temporary hashes
>>
>> Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will create a Hidden Markov Model
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will include attachment-database-entries
>> in to spamdb
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will create unicode enabled databases
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will process all words as Sequence of
>> UAX #29 Grapheme Clusters
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will normalize unicode characters
>>
>> Dec-17-18 02:15:00 RebuildSpamDB will use the ASSP_WordStem engine
>>
>> Dec-17-18 02:15:00 ---ASSP Settings---
>> Dec-17-18 02:15:00 Do Not Collect Messages with RedListed address:
>> Enabled **Messages with RedListed addresses will be removed from the
>> corpus!**
>>
>> Dec-17-18 02:15:00 Do Not Collect RedRe Messages: Enabled **Messages
>> matching the RedRe will be removed from the corpus!**
>>
>> Dec-17-18 02:15:00 Use Subject as Maillog Names: True
>> Dec-17-18 02:15:00 Maxbytes: 4,000
>> Dec-17-18 02:15:00 Maxfiles: 15,000
>> Dec-17-18 02:15:00 RebuildFileTimeLimit: 1 5
>> Dec-17-18 02:15:00 RebuildFileTimeLimit: files will be moved away from
>> the corpus if their processing takes longer than 5 second(s)
>>
>> Dec-17-18 02:15:00 Trashlist cleaning finished, 2 of 56 files deleted
>>
>> Dec-17-18 02:15:00 c:/ASSP/messages/errors-spam
>> Dec-17-18 02:15:00 File Count: 934
>> Dec-17-18 02:15:00 Processing... messages/errors-spam with 934 files
>> Dec-17-18 02:15:52 0 attachment/image entries processed
>> Dec-17-18 02:15:52 Imported Files for HeloBlackList: 933
>> Dec-17-18 02:15:52 Imported Files for Bayes/HMM: 933
>> Dec-17-18 02:15:52 Finished in 52 seconds (17.94 files/s - 9.88 MByte)
>>
>> Dec-17-18 02:15:52 c:/ASSP/messages/errors-notspam
>> Dec-17-18 02:15:52 File Count: 2,209
>> Dec-17-18 02:15:52 Processing... messages/errors-notspam with 2,209 files
>> Dec-17-18 02:18:36 0 attachment/image entries processed
>> Dec-17-18 02:18:36 Imported Files for HeloBlackList: 2,208
>> Dec-17-18 02:18:36 Imported Files for Bayes/HMM: 2,208
>> Dec-17-18 02:18:36 Finished in 164 seconds (13.46 files/s - 34.86 MByte)
>> Dec-17-18 02:18:36 info: corpusnorm after processing messages/errors-spam
>> and messages/errors-notspam is Spam Weight: 657272 / Not-Spam Weight:
>> 3563832 => norm: 0.184
>> Dec-17-18 02:18:36 info: require approximately all files (2,061,306
>> words) from folder messages/spam to get the wanted corpusnorm (1.000)
>>
>> Dec-17-18 02:18:36 c:/ASSP/messages/spam
>> Dec-17-18 02:18:36 File Count: 14,937
>> Dec-17-18 02:18:36 Processing... messages/spam with 14,937 files
>> Dec-17-18 02:25:25 0 attachment/image entries processed
>> Dec-17-18 02:25:25 Imported Files for HeloBlackList: 14,937
>> Dec-17-18 02:25:25 Imported Files for Bayes/HMM: 14,937
>> Dec-17-18 02:25:25 Finished in 409 seconds (36.52 files/s - 69.05 MByte)
>> Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from
>> folder messages/notspam to get the wanted corpusnorm (1.000)
>>
>> Dec-17-18 02:25:25 c:/ASSP/messages/notspam
>> Dec-17-18 02:25:25 File Count: 9,382
>> Dec-17-18 02:25:25 Processing... messages/notspam with 9,382 files
>> Dec-17-18 02:26:42 0 attachment/image entries processed
>> Dec-17-18 02:26:42 Imported Files for HeloBlackList: 9,382
>> Dec-17-18 02:26:42 Imported Files for Bayes/HMM: 0
>> Dec-17-18 02:26:42 Finished in 77 seconds (121.84 files/s - 81.79 MByte)
>>
>> Dec-17-18 02:26:42 Generating weighted Bayesian tuplets
>> Dec-17-18 02:27:04 start populating Spamdb with 465,296 records -
>> Bayesian check is now disabled!
>> Dec-17-18 02:28:19 Finished populating Spamdb with 465,296 records -
>> Bayesian check is now enabled!
>> Dec-17-18 02:28:19 done - Generating weighted Bayesian tuplets
>>
>> Dec-17-18 02:28:19 Bayesian Pairs: 465,296 now in list
>>
>> Dec-17-18 02:28:19 Generating consolidated Hidden-Markov-Model database
>> from 2,155,159 record model
>> Dec-17-18 02:30:25 HMM sequences: 1,059,525 now in list
>>
>> Dec-17-18 02:30:26 generating Spamdb.helo records from 13,393 collected
>> HELO's
>> Dec-17-18 02:30:28 cleaning old Spamdb.helo records
>> Dec-17-18 02:30:28 done - cleaning old Spamdb.helo records
>>
>> Dec-17-18 02:30:28 HELO Blacklist: 25 new, 1,159 now in list
>>
>> Dec-17-18 02:30:28 Spam Weight    :   2,745,357
>> Dec-17-18 02:30:28 Not-Spam Weight:   3,563,832
>>
>> Dec-17-18 02:30:28 Corpus norm: 0.7703 - (ok - slighly ham heavy)
>> Dec-17-18 02:30:28 Corpus confidence: 0.66134618
>>
>> Dec-17-18 02:30:33 Start populating Hidden Markov Model. HMM-check is
>> disabled for this time!
>> Dec-17-18 02:30:33 start populating Hidden Markov Model with 1,059,525
>> records!
>> Dec-17-18 02:33:08 Finished populating Hidden Markov Model with 1,059,525
>> records!
>> Dec-17-18 02:33:08 Finished populating Hidden Markov Model. HMM-check is
>> now enabled again!
>>
>> Dec-17-18 02:33:08 Total processing time: 1,088 second(s)
>>
>> Dec-17-18 02:33:08 Total processing data: 195.58 MByte
>>
>>
>> Dec-17-18 02:33:08 Rebuild processed 39.12 files per second.
>>
>> Dec-17-18 02:33:08 After finishing the Rebuild process, the c:/ASSP/tmpDB
>> folder contains 363.74 MByte.
>>
>> Dec-17-18 02:33:08 After finishing the Rebuild process, the drive that
>> contains the c:/ASSP/tmpDB folder has 12.89 GByte free space from total
>> 25.20 GByte.
>>
>> Dec-17-18 02:33:08 building new GripList records and bounce report
>> Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/maillog.txt
>> Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/18-12-16.maillog.txt
>> Dec-17-18 02:33:15 processing Logfile c:/ASSP/logs/18-12-15.maillog.txt
>> Dec-17-18 02:33:20 processing Logfile c:/ASSP/logs/18-12-14.maillog.txt
>> Dec-17-18 02:33:28 processing Logfile c:/ASSP/logs/18-12-13.maillog.txt
>> Dec-17-18 02:33:29 processing Logfile c:/ASSP/logs/18-12-12.maillog.txt
>>
>> Dec-17-18 02:33:30 bounce report for the last two days: 11 bounces
>> received (possibly delayed) - 1 bounces blocked
>>
>> Dec-17-18 02:33:30 list of the top ten local addresses with blocked
>> bounces in the last two days:
>>
>>  b...@ourcharity.org : 1
>>
>> Dec-17-18 02:33:30 end of bounce report
>>
>> Dec-17-18 02:33:31 Uploading Griplist via Direct Connection
>> Dec-17-18 02:33:32 Submitted 6,144 bytes: 0 IPv6 addresses, 2,654 IPv4
>> addresses, good IP's 811 , bad IP's 1,137
>>
>> Dec-17-18 02:33:32 Trashlist was saved to c:/ASSP/trashlist.db
>>
>>
>> THANKS!!
>> _______________________________________________
>> Assp-test mailing list
>> Assp-test@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/assp-test
>>
>
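
As a quick check of the numbers in the rebuild log quoted above: the log's own
line "Spam Weight: 657272 / Not-Spam Weight: 3563832 => norm: 0.184" suggests
that the corpus norm is simply the spam weight divided by the not-spam weight.
The Python sketch below parses the two summary lines and compares the result
with the 0.9 to 1.1 target Thomas mentions. The function names and regular
expressions are illustrative assumptions based only on the log lines shown in
this thread; they are not ASSP code.

```python
import re

# Summary lines copied from the rebuild log quoted in this thread.
LOG = """\
Dec-17-18 02:30:28 Spam Weight    :   2,745,357
Dec-17-18 02:30:28 Not-Spam Weight:   3,563,832
"""


def last_weight(label_pattern, text):
    """Return the last integer that follows the given label in the log text."""
    matches = re.findall(label_pattern + r"\s*:\s*([\d,]+)", text)
    if not matches:
        raise ValueError(f"no match for {label_pattern!r}")
    return int(matches[-1].replace(",", ""))


def corpus_norm(text):
    """Spam weight divided by not-spam weight, as the log lines suggest."""
    # The lookbehind keeps "Spam Weight" from matching inside "Not-Spam Weight".
    spam = last_weight(r"(?<!Not-)Spam Weight", text)
    ham = last_weight(r"Not-Spam Weight", text)
    return spam / ham


def in_target(norm, low=0.9, high=1.1):
    """True if the norm lies in the 0.9 to 1.1 range suggested in the thread."""
    return low <= norm <= high


norm = corpus_norm(LOG)
print(f"norm: {norm:.4f}, balanced: {in_target(norm)}")
```

With the weights from the log this gives a norm of about 0.77, which agrees
with the "Corpus norm: 0.7703" line and is below the suggested target range.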