Thankfully it's only a test installation. Won't do that again!!! It's always my goal to stay off of your stupidest list, but I appreciate your candor when you give me the "honor."
The subsequent rebuild gave a perfect 1.00.

On Thu, Apr 30, 2015 at 3:49 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

> first:
>
> removing/moving files from the 'errors' corpus like you did, is one of the
> most stupid things I ever saw! You gave assp an apoplexy!
>
> >How does it know what will be in spam and notspam?
>
> ASSP knows what was done in the rebuild task in the past and how it was
> done.
> If you make major changes manually in the corpus - remove the file
> 'assp/normfile' and run the rebuild twice. The first to get a clue about
> what has been changed, the second to get the best detection rate.
>
> Thomas
>
> From: K Post <nntp.p...@gmail.com>
> To: ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date: 30.04.2015 04:10
> Subject: [Assp-test] Rebuild not parsing everything
>
> I was seeing a very fast parsing of notspam, I believe because there were a
> lot of errors-notspam, but far fewer errors-spam.
>
> As a test, I moved all errors-spam to spam and errors-notspam to notspam,
> leaving 0 error reports.
>
> Here's the result of the rebuild:
>
> Apr-29-15 21:43:55 RebuildSpamDB-thread rebuildspamdb-version 7.10 started in ASSP version 2.4.4(15117)
> Apr-29-15 21:43:55 rebuild debug output is enabled to c:/assp/rebuilddebug.txt
> Apr-29-15 21:43:55 RebuildSpamDB uses BerkeleyDB for temporary hashes
> Apr-29-15 21:43:55 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
> Apr-29-15 21:43:55 RebuildSpamDB will create a Hidden Markov Model!
> Apr-29-15 21:43:55 RebuildSpamDB will create unicode enabled databases.
> Apr-29-15 21:43:55 RebuildSpamDB will process all words as Sequence of UAX #29 Grapheme Clusters.
> Apr-29-15 21:43:55 RebuildSpamDB will normalize unicode characters.
> Apr-29-15 21:43:55 RebuildSpamDB will use the ASSP_WordStem engine.
>
> Apr-29-15 21:43:55 ---ASSP Settings---
> Apr-29-15 21:43:55 Do Not Collect Messages with RedListed address: Enabled **Messages with RedListed addresses will be removed from the corpus!**
> Apr-29-15 21:43:55 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
> Apr-29-15 21:43:55 Use Subject as Maillog Names: True
> Apr-29-15 21:43:55 Maxbytes: 2,500
> Apr-29-15 21:43:55 RebuildFileTimeLimit: 1 5
> Apr-29-15 21:43:55 RebuildFileTimeLimit: files will be moved away from the corpus, if their processing takes longer than 5 second(s)
>
> Apr-29-15 21:44:02 Trashlist cleaning finished, 0 of 23606 files deleted
>
> Apr-29-15 21:44:02 c:/assp/messages/errors-spam
> Apr-29-15 21:44:02 File Count: 0
> Apr-29-15 21:44:02 Processing... messages/errors-spam with 0 files
> Apr-29-15 21:44:02 Imported Files for HeloBlackList: 0
> Apr-29-15 21:44:02 Imported Files for Bayes/HMM: 0
> Apr-29-15 21:44:02 Finished in 1 second(s)
>
> Apr-29-15 21:44:02 c:/assp/messages/errors-notspam
> Apr-29-15 21:44:02 File Count: 0
> Apr-29-15 21:44:02 Processing... messages/errors-notspam with 0 files
> Apr-29-15 21:44:02 Imported Files for HeloBlackList: 0
> Apr-29-15 21:44:02 Imported Files for Bayes/HMM: 0
> Apr-29-15 21:44:02 Finished in 1 second(s)
> Apr-29-15 21:44:02 info: corpusnorm after processing messages/errors-spam and messages/errors-notspam is Spam Weight: 0 / Not-Spam Weight: 0 => norm: 1.000
> Apr-29-15 21:44:02 info: require apx. 2,812 files (360,000 words) from folder messages/spam to get the wanted corpusnorm (1.000)
>
> Apr-29-15 21:44:02 c:/assp/messages/spam
> Apr-29-15 21:44:02 File Count: 18,219
> Apr-29-15 21:44:02 Processing... messages/spam with 15,000 files
> Apr-29-15 21:47:49 Imported Files for HeloBlackList: 15,000
> Apr-29-15 21:47:49 Imported Files for Bayes/HMM: 1,888
> Apr-29-15 21:47:49 Finished in 227 second(s)
> Apr-29-15 21:47:49 info: require apx. all files (360,036 words) from folder messages/notspam to get the wanted corpusnorm (1.000)
>
> Apr-29-15 21:47:49 c:/assp/messages/notspam
> Apr-29-15 21:47:49 File Count: 21,197
> Apr-29-15 21:47:49 Processing... messages/notspam with 15,000 files
> Apr-29-15 21:52:06 Imported Files for HeloBlackList: 15,000
> Apr-29-15 21:52:06 Imported Files for Bayes/HMM: 1,040
> Apr-29-15 21:52:06 Finished in 257 second(s)
>
> Apr-29-15 21:52:06 Generating weighted Bayesian tuplets
> Apr-29-15 21:52:10 start populating Spamdb with 27,082 records - Bayesian check is now disabled!
> Apr-29-15 21:52:24 Finished populating Spamdb with 27,082 records - Bayesian check is now enabled!
> Apr-29-15 21:52:24 done - Generating weighted Bayesian tuplets
>
> Apr-29-15 21:52:24 Bayesian Pairs: 27,082 now in list
>
> Apr-29-15 21:52:24 Generating consolidated Hidden-Markov-Model database from 527,319 record model
> Apr-29-15 21:52:46 HMM sequences: 259,284 now in list
>
> Apr-29-15 21:52:46 generating Spamdb.helo records from 5,112 collected HELO's
> Apr-29-15 21:52:47 cleaning old Spamdb.helo records
> Apr-29-15 21:52:52 done - cleaning old Spamdb.helo records
>
> Apr-29-15 21:52:52 HELO Blacklist: 12 new, 427 now in list
>
> Apr-29-15 21:52:52 Spam Weight: 360,036
> Apr-29-15 21:52:52 Not-Spam Weight: 360,070
>
> Apr-29-15 21:52:52 Corpus norm: 0.9999 - (very good - balanced)
> Apr-29-15 21:52:52 Corpus confidence: 1.00000000
>
> Apr-29-15 21:52:57 Start populating Hidden Markov Model. HMM-check is disabled for this time!
> Apr-29-15 21:53:01 start populating Hidden Markov Model with 259,284 records!
> Apr-29-15 21:53:06 Finished populating Hidden Markov Model with 259,284 records!
> Apr-29-15 21:53:06 Finished populating Hidden Markov Model. HMM-check is now enabled again!
>
> Apr-29-15 21:53:06 Total processing time: 551 second(s)
>
> Apr-29-15 21:53:06 Total processing data: 95.49 MByte
>
> Apr-29-15 21:53:06 Rebuild processed 61.73 files per second.
>
> Apr-29-15 21:53:06 After finishing the Rebuild process, the c:/assp/tmpDB folder contains 101.74 MByte.
>
> Apr-29-15 21:53:06 After finishing the Rebuild process, the drive that contains the c:/assp/tmpDB folder has 20.17 GByte free space from total 25.20 GByte.
>
> Why, after processing errors-spam and errors-notspam, does it say:
> Apr-29-15 21:44:02 info: require apx. 2,812 files (360,000 words) from folder messages/spam to get the wanted corpusnorm (1.000)
>
> How does it know what will be in spam and notspam? Shouldn't it parse all
> and then decide??? Based on the fast 4 minute scan time of each of spam and
> notspam, I'm guessing it's not looking at all files. Is that normal?
> That seems like a really small spamdb and HMM given 30k files (even with only
> the first 2.5 KB being looked at).
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
> DISCLAIMER:
> *******************************************************
> This email and any files transmitted with it may be confidential, legally
> privileged and protected in law and are intended solely for the use of the
> individual to whom it is addressed.
> This email was multiple times scanned for viruses. There should be no
> known virus in this email!
> *******************************************************
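
For anyone reading the log numbers later: the final "Corpus norm: 0.9999" is consistent with simply dividing the reported Spam Weight by the Not-Spam Weight. The sketch below checks that arithmetic. It is only an assumption inferred from the figures in this log, not taken from the ASSP source; the function name corpus_norm is hypothetical, and treating the empty 0 / 0 case as 1.0 just mirrors the "Spam Weight: 0 / Not-Spam Weight: 0 => norm: 1.000" line.

```python
def corpus_norm(spam_weight: int, notspam_weight: int) -> float:
    """Hypothetical helper: ratio of spam weight to not-spam weight.

    Assumption (inferred from the log, not from ASSP's code): an empty
    corpus on both sides is reported as a norm of 1.0.
    """
    if spam_weight == 0 and notspam_weight == 0:
        return 1.0
    if notspam_weight == 0:
        raise ValueError("not-spam weight is zero, ratio is undefined")
    return spam_weight / notspam_weight


# Weights taken from the rebuild log above.
final_norm = corpus_norm(360_036, 360_070)
print(f"Corpus norm: {final_norm:.4f}")  # prints "Corpus norm: 0.9999"

# State right after the two empty errors-* folders were processed.
print(f"Norm with empty error folders: {corpus_norm(0, 0):.3f}")  # prints 1.000
```

If that reading is right, it would also explain why only 1,888 spam files and 1,040 notspam files were imported for Bayes/HMM: the rebuild seems to stop taking files from a folder once the two word weights are balanced, rather than importing everything it scans.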