Hi,

I just ran rebuildspamdb with the new release, and the results are even worse. Before this I had a perfect corpus.
Steve

-----Original Message-----
From: assp@assp.local [mailto:assp@assp.local]
Sent: Wednesday, September 12, 2012 1:50 PM
To: Steve Moffat
Subject: RebuildSpamDB - report from assp.isp.bm

File rebuildrun.txt follows:

Sep-12-12 13:18:25 RebuildSpamDB-thread rebuildspamdb-version 6.02 started in ASSP version 2.2.2(12256)
Sep-12-12 13:18:25 RebuildSpamDB will create a Hidden Markov Model!
Sep-12-12 13:18:25 RebuildSpamDB will include attachment-database-entries in to spamdb!
Sep-12-12 13:18:25 RebuildSpamDB will create unicode enabled databases.
Sep-12-12 13:18:25 RebuildSpamDB process all words as Sequence of UAX #29 Grapheme Clusters.
Sep-12-12 13:18:25 RebuildSpamDB will use the ASSP_WordStem engine.
Sep-12-12 13:18:25 ---ASSP Settings---
Sep-12-12 13:18:25 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
Sep-12-12 13:18:25 Use Subject as Maillog Names: True
Sep-12-12 13:18:25 Maxbytes: 4000
Sep-12-12 13:18:25 RebuildFileTimeLimit: 1 5
Sep-12-12 13:18:25 RebuildFileTimeLimit: files will be moved away from the corpus, if there processing takes longer than 5 second(s)
Sep-12-12 13:18:25 C:/assp/errors/spam
Sep-12-12 13:18:25 File Count: 319
Sep-12-12 13:18:25 Processing... errors/spam with 319 files
Sep-12-12 13:18:25 ignore and remove files older than Dec-17-09 12:18:25 in folder errors/spam
Sep-12-12 13:18:33 1 attachment/image entries processed
Sep-12-12 13:18:33 Imported Files: 317
Sep-12-12 13:18:33 Finished in 8 second(s)
Sep-12-12 13:18:33 C:/assp/errors/notspam
Sep-12-12 13:18:33 File Count: 113
Sep-12-12 13:18:33 Processing... errors/notspam with 113 files
Sep-12-12 13:18:33 ignore and remove files older than Dec-17-09 12:18:33 in folder errors/notspam
Sep-12-12 13:18:40 26 attachment/image entries processed
Sep-12-12 13:18:40 Imported Files: 111
Sep-12-12 13:18:40 Finished in 7 second(s)
Sep-12-12 13:18:40 warning: missing information for automatic corpus correction in file C:/assp/normfile - rerun the rebuild, if you see this warning the first time!
Sep-12-12 13:18:40 C:/assp/spam
Sep-12-12 13:18:40 File Count: 4,363
Sep-12-12 13:18:40 Processing... spam with 4,363 files
Sep-12-12 13:19:27 remove C:/assp/spam/Confirmation_of_changes_to_Boo--140013.eml WhiteList: 'ba.custs...@contact.britishairways.com'
Sep-12-12 13:19:27 remove C:/assp/spam/Confirmation_of_changes_to_Boo--144011.eml WhiteList: 'ba.custs...@contact.britishairways.com'
Sep-12-12 13:19:27 remove C:/assp/spam/Confirmation_of_changes_to_Boo--145936.eml WhiteList: 'ba.custs...@contact.britishairways.com'
Sep-12-12 13:19:27 remove C:/assp/spam/Confirmation_of_changes_to_Boo--172792.eml WhiteList: 'ba.custs...@contact.britishairways.com'
Sep-12-12 13:20:07 remove C:/assp/spam/FW_Time_Clarification_Walk_the--81794.eml WhiteList: 'busbysu...@hotmail.com'
Sep-12-12 13:22:50 Removed White: 5
Sep-12-12 13:22:50 481 attachment/image entries processed
Sep-12-12 13:22:50 Imported Files: 4,356
Sep-12-12 13:22:50 Finished in 250 second(s)
Sep-12-12 13:22:50 C:/assp/notspam
Sep-12-12 13:22:50 File Count: 12,640
Sep-12-12 13:22:50 Processing... notspam with 12,000 files
Sep-12-12 13:42:28 2,022 attachment/image entries processed
Sep-12-12 13:42:28 Imported Files: 12,001
Sep-12-12 13:42:28 Folder contents exceeded 'MaxFiles'(12000).
Sep-12-12 13:42:28 Finished in 1,178 second(s)
Sep-12-12 13:42:28 Rebuild processed 11.63 files per second.
Sep-12-12 13:42:28 Generating weighted Bayesian tuplets
Sep-12-12 13:42:38 start populating Spamdb with 175,796 records - Bayesian check is now disabled!
Sep-12-12 13:43:45 Finished populating Spamdb with 175,796 records - Bayesian check is now enabled!
Sep-12-12 13:43:45 done - Generating weighted Bayesian tuplets
Sep-12-12 13:43:45 Bayesian Pairs: 175,796 now in list
Sep-12-12 13:43:45 Generating consolidated Hidden-Markov-Model database from 1,634,405 record model
Sep-12-12 13:45:16 HMM sequences: 800,876 now in list
Sep-12-12 13:45:16 generating Spamdb.helo records from 3,664 collected HELO's
Sep-12-12 13:45:16 cleaning old Spamdb.helo records
Sep-12-12 13:45:17 done - cleaning old Spamdb.helo records
Sep-12-12 13:45:17 HELO Blacklist: 3 new, 94 now in list
Sep-12-12 13:45:17 Spam Weight: 1,598,969
Sep-12-12 13:45:17 Not-Spam Weight: 4,554,517
Sep-12-12 13:45:17 Corpus norm: 0.3511 - (warning: extremely ham heavy)
Sep-12-12 13:45:17 Corpus confidence: 0.13526783
Sep-12-12 13:45:17 Recommendation: RebuildSpamDB will limit the number of used messages in your corpus. Excess files will be ingored.
Sep-12-12 13:45:17 Corpus norm should be between 0.6 and 1.4
Sep-12-12 13:45:17 Recommendation: You need more spam messages in the corpus.
Sep-12-12 13:45:17 starting auto correction for corpus - delete old ham files from notspam
Sep-12-12 13:45:22 info: starting cleanup for to much (old) files in folder C:/assp/notspam - will try to remove 40% of the files - will keep at least 4000 files - will keep files younger than 14 days
info: deleted 1646 old files from folder C:/assp/notspam
Sep-12-12 13:45:22 Recommendation: You should reduce now MaxBytes to 2500!
Sep-12-12 13:45:27 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Sep-12-12 13:45:28 start populating Hidden Markov Model with 800,876 records!
Sep-12-12 13:49:06 Finished populating Hidden Markov Model with 800,876 records!
Sep-12-12 13:49:06 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Sep-12-12 13:49:06 Total processing time: 1,841 second(s)
Sep-12-12 13:49:06 Total processing data: 567.41 MByte
Sep-12-12 13:49:06 building new GripList records and bounce report
Sep-12-12 13:49:06 processing Logfile C:/assp/logs/maillog.txt
Sep-12-12 13:49:11 skipping bounce report because 'DoNotCollectBounces' is switched ON
Sep-12-12 13:49:12 Uploading Griplist via Direct Connection
Sep-12-12 13:49:13 Submitted 2,910 bytes: 0 IPv6 addresses, 322 IPv4 addresses
Sep-12-12 13:49:13 Trashlist was saved to C:/assp/trashlist.db

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
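
For anyone reading the report above: the "Corpus norm" of 0.3511 appears to be simply Spam Weight divided by Not-Spam Weight. That is an inference from the numbers in this log, not something the log states, so treat the following as a rough Python sketch for checking the arithmetic rather than a description of how ASSP computes it. The function names `corpus_norm` and `is_balanced` are made up for illustration; the 0.6 to 1.4 range is taken from the "Corpus norm should be between 0.6 and 1.4" line.

```python
def corpus_norm(spam_weight: int, ham_weight: int) -> float:
    """Ratio of spam weight to not-spam weight (assumed definition,
    inferred from the figures in the log, not from ASSP's source)."""
    if ham_weight <= 0:
        raise ValueError("ham_weight must be positive")
    return spam_weight / ham_weight


def is_balanced(norm: float, low: float = 0.6, high: float = 1.4) -> bool:
    """True if the norm falls inside the range the log recommends."""
    return low <= norm <= high


# Figures copied from the 13:45:17 entries of the report.
norm = corpus_norm(1_598_969, 4_554_517)
print(f"Corpus norm: {norm:.4f}")        # Corpus norm: 0.3511
print(f"Balanced: {is_balanced(norm)}")  # Balanced: False
```

If that reading is right, a norm below 0.6 means the not-spam weight dominates, which matches the "extremely ham heavy" warning and the recommendation to add more spam messages to the corpus.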