Apologies, I had the right bits copy/pasted into a notepad window but for some reason ended up with only half of it in the message. Full log below minus any lines containing filenames with subjects in them. Both my spam and notspam folders currently have just short of 15,000 messages in them.
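
As a sanity check on the figures in the log, here is a minimal Python sketch. It assumes the corpus norm is simply the spam weight divided by the not-spam weight, and that the files-per-second figure is the total of imported files over the total of the per-folder "Finished in" times. Both assumptions reproduce the logged values, but neither is taken from the ASSP source, and the function names are made up for illustration.

```python
from typing import Sequence

# All numbers below are copied from the rebuild log in this message.
# The formulas are assumptions that happen to match the logged output.


def corpus_norm(spam_weight: int, ham_weight: int) -> float:
    """Assumed definition: spam weight divided by not-spam weight."""
    return spam_weight / ham_weight


def in_recommended_range(norm: float, low: float = 0.6, high: float = 1.4) -> bool:
    """The log says the corpus norm should be between 0.6 and 1.4."""
    return low <= norm <= high


def files_per_second(imported: Sequence[int], seconds: Sequence[int]) -> float:
    """Assumed definition: all imported files over all folder processing time."""
    return sum(imported) / sum(seconds)


# After store/errors/spam and store/errors/notspam (spamwords / hamwords).
interim = corpus_norm(996_168, 1_886_016)

# Final "Spam Weight" / "Not-Spam Weight" from the end of the rebuild.
final = corpus_norm(7_281_890, 4_460_847)

# Imported files and "Finished in" seconds for the four folders, in log order.
rate = files_per_second([1_420, 1_390, 14_001, 6_507], [142, 141, 1_845, 797])

print(f"interim corpusnorm: {interim:.6f}")
print(f"final corpus norm:  {final:.4f} (in range: {in_recommended_range(final)})")
print(f"files per second:   {rate:.2f}")
```

If the assumed ratio is right, pulling only 6,506 files from store/notspam against 14,000 from store/spam would be enough to push the norm out of the recommended range.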
2012-09-11 22:00:00 RebuildSpamDB-thread rebuildspamdb-version 6.01 started in ASSP version 2.2.2(12255)
2012-09-11 22:00:00 RebuildSpamDB will create a Hidden Markov Model!
2012-09-11 22:00:00 RebuildSpamDB will create unicode enabled databases.
2012-09-11 22:00:00 RebuildSpamDB process all words as Sequence of UAX #29 Grapheme Clusters.
2012-09-11 22:00:00 RebuildSpamDB will use the ASSP_WordStem engine.
2012-09-11 22:00:00 ---ASSP Settings---
2012-09-11 22:00:00 Do Not Collect Messages with RedListed address: Enabled **Messages with RedListed addresses will be removed from the corpus!**
2012-09-11 22:00:00 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
2012-09-11 22:00:00 Use Subject as Maillog Names: True
2012-09-11 22:00:00 Maxbytes: 4000
2012-09-11 22:00:00 RebuildFileTimeLimit: 1 5
2012-09-11 22:00:00 RebuildFileTimeLimit: files will be moved away from the corpus, if there processing takes longer than 5 second(s)
2012-09-11 22:00:00 Trashlist cleaning finished, 30 of 106 files deleted
2012-09-11 22:00:00 /usr/local/assp/store/errors/spam
2012-09-11 22:00:00 File Count: 1,422
2012-09-11 22:00:00 Processing... store/errors/spam with 1,422 files
2012-09-11 22:00:00 ignore and remove files older than 2009-12-16 21:00:00 in folder store/errors/spam
2012-09-11 22:02:22 Imported Files: 1,420
2012-09-11 22:02:22 Finished in 142 second(s)
2012-09-11 22:02:22 /usr/local/assp/store/errors/notspam
2012-09-11 22:02:22 File Count: 1,392
2012-09-11 22:02:22 Processing... store/errors/notspam with 1,392 files
2012-09-11 22:02:22 ignore and remove files older than 2009-12-16 21:02:22 in folder store/errors/notspam
2012-09-11 22:04:43 Imported Files: 1,390
2012-09-11 22:04:43 Finished in 141 second(s)
2012-09-11 22:04:43 info: corpusnorm after processing store/errors/spam and store/errors/notspam is spamwords 996168/ hamwords 1886016 => 0.528186399267026
2012-09-11 22:04:43 info: require 15163 files from folder store/spam to get a fine corpusnorm (1)
2012-09-11 22:04:43 /usr/local/assp/store/spam
2012-09-11 22:04:43 File Count: 15,163
2012-09-11 22:04:43 Processing... store/spam with 14,000 files
2012-09-11 22:35:28 Removed White: 2
2012-09-11 22:35:28 Imported Files: 14,001
2012-09-11 22:35:28 Folder contents exceeded 'MaxFiles'(14000).
2012-09-11 22:35:28 Finished in 1,845 second(s)
2012-09-11 22:35:28 info: require 6506 files from folder store/notspam to get a fine corpusnorm (1)
2012-09-11 22:35:28 /usr/local/assp/store/notspam
2012-09-11 22:35:28 File Count: 17,475
2012-09-11 22:35:28 Processing... store/notspam with 6,506 files
2012-09-11 22:48:45 Imported Files: 6,507
2012-09-11 22:48:45 Folder contents exceeded 'MaxFiles'(14000).
2012-09-11 22:48:45 Finished in 797 second(s)
2012-09-11 22:48:45 Rebuild processed 7.97 files per second. Good values are 12 files per second and higher. You can speed up the rebuild process, using a cached (>=128MB) IO-controller or a RAM-disk with at least 1.04 GBbyte for the folder '/usr/local/assp/tmpDB'.
2012-09-11 22:48:45 Generating weighted Bayesian tuplets
2012-09-11 22:49:22 start populating Spamdb with 360,515 records - Bayesian check is now disabled!
2012-09-11 22:51:16 Finished populating Spamdb with 360,515 records - Bayesian check is now enabled!
2012-09-11 22:51:16 done - Generating weighted Bayesian tuplets
2012-09-11 22:51:16 Bayesian Pairs: 360,515 now in list
2012-09-11 22:51:18 Generating consolidated Hidden-Markov-Model database from 5,252,212 record model
2012-09-11 22:56:39 HMM sequences: 2,553,143 now in list
2012-09-11 22:56:39 generating Spamdb.helo records from 7,212 collected HELO's
2012-09-11 22:56:59 cleaning old Spamdb.helo records
2012-09-11 22:57:01 done - cleaning old Spamdb.helo records
2012-09-11 22:57:01 HELO Blacklist: 52 new, 547 now in list
2012-09-11 22:57:01 Spam Weight: 7,281,890
2012-09-11 22:57:01 Not-Spam Weight: 4,460,847
2012-09-11 22:57:01 Corpus norm: 1.6324 - (warning: extremely spam heavy)
2012-09-11 22:57:01 Corpus confidence: 0.14082925
2012-09-11 22:57:01 Recommendation: RebuildSpamDB will limit the number of used messages in your corpus. Excess files will be ingored.
2012-09-11 22:57:01 Corpus norm should be between 0.6 and 1.4
2012-09-11 22:57:01 Recommendation: You need more not-spam messages in the corpus.
2012-09-11 22:57:01 starting auto correction for corpus - delete old spam files from store/spam
2012-09-11 22:57:02 info: starting cleanup for to much (old) files in folder /usr/local/assp/store/spam - will try to remove 40% of the files - will keep at least 4000 files - will keep files younger than 14 days
2012-09-11 22:57:02 Recommendation: You should increase now MaxBytes to 6000!
2012-09-11 22:57:07 Start populating Hidden Markov Model. HMM-check is disabled for this time!
2012-09-11 22:57:07 start populating Hidden Markov Model with 2,553,143 records!
2012-09-11 23:56:24 Finished populating Hidden Markov Model with 2,553,143 records!
2012-09-11 23:56:24 Finished populating Hidden Markov Model. HMM-check is now enabled again!
2012-09-11 23:56:24 Total processing time: 6,984 second(s)
2012-09-11 23:56:24 Total processing data: 164.11 MByte
2012-09-11 23:56:24 building new GripList records and bounce report
2012-09-11 23:56:24 processing Logfile /usr/local/assp/maillog.txt
2012-09-11 23:56:33 processing Logfile /usr/local/assp/12-09-10.maillog.txt
2012-09-11 23:56:45 processing Logfile /usr/local/assp/12-09-09.maillog.txt
2012-09-11 23:56:47 processing Logfile /usr/local/assp/12-09-08.maillog.txt
2012-09-11 23:56:52 processing Logfile /usr/local/assp/12-09-07.maillog.txt
2012-09-11 23:56:56 skipping bounce report because 'DoNotCollectBounces' is switched ON
2012-09-11 23:56:56 Uploading Griplist via Direct Connection
2012-09-11 23:56:57 Submitted 3,369 bytes: 0 IPv6 addresses, 373 IPv4 addresses
2012-09-11 23:56:57 Trashlist was saved to /usr/local/assp/trashlist.db

On 12/09/2012 09:39, Thomas Eckardt wrote:
> Same to you Colin,
>
> to verify how the new code is working I need at least all the output about
> all the folders and the resulting corpusnorm.
>
> Thomas
>
> From: Colin <a...@lanternhosting.co.uk>
> To: assp-test@lists.sourceforge.net
> Date: 12.09.2012 10:28
> Subject: Re: [Assp-test] New version
>
> I am seeing the same although not quite as bad. It seems to be related
> to the notspam folder and less than half of the files in it being
> processed.
>
> Yesterday I had:
>
> 2012-09-10 23:15:39 Corpus norm: 0.9749 - (very good - balanced)
> 2012-09-10 23:15:39 Corpus confidence: 1.00000000
>
> Now I have:
>
> 2012-09-11 22:04:43 /usr/local/assp/store/spam
> 2012-09-11 22:04:43 File Count: 15,163
> 2012-09-11 22:04:43 Processing... store/spam with 14,000 files
> 2012-09-11 22:35:28 Imported Files: 14,001
> 2012-09-11 22:35:28 Folder contents exceeded 'MaxFiles'(14000).
> 2012-09-11 22:35:28 Finished in 1,845 second(s)
> 2012-09-11 22:35:28 info: require 6506 files from folder store/notspam to get a fine corpusnorm (1)
>
> 2012-09-11 22:35:28 /usr/local/assp/store/notspam
> 2012-09-11 22:35:28 File Count: 17,475
> 2012-09-11 22:35:28 Processing... store/notspam with 6,506 files
> 2012-09-11 22:47:45 remove/usr/local/assp/store/notspam/--522681.eml corrected spam
> 2012-09-11 22:48:45 Imported Files: 6,507
> 2012-09-11 22:48:45 Folder contents exceeded 'MaxFiles'(14000).
> 2012-09-11 22:48:45 Finished in 797 second(s)
>
> All the best,
> Colin Waring.
>
> On 11/09/2012 20:49, Steve Moffat wrote:
>> Hi
>> I updated to the new release today and rebuildspamdb has ruined my corpus confidence. Not too happy with that....
>> Sep-11-12 16:26:55 Spam Weight: 3,904,196
>> Sep-11-12 16:26:55 Not-Spam Weight: 1,950,092
>>
>> Sep-11-12 16:26:55 Corpus norm: 2.0021 - (warning: extremely spam heavy)
>> Sep-11-12 16:26:55 Corpus confidence: 0.06224349
>> Sep-11-12 16:26:55 Recommendation: RebuildSpamDB will limit the number of used messages in your corpus. Excess files will be ingored.
>> Sep-11-12 16:26:55 Corpus norm should be between 0.6 and 1.4
>>
>> Thanks
>> Steve
>> Steve Moffat
>> Operations Director
>> Optimum IT Solutions
>> Desk: 441 292 8849
>> Mobile: 441 292 8849
>> MSN IM: st...@optimum.bm
>> Web: http://www.optimum.bm

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test