Thankfully it's only a test installation. I won't do that again! It's
always my goal to stay off your stupidest list, but I appreciate your
candor when you give me the "honor."

The subsequent rebuild gave a perfect 1.00.
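
For anyone reading the quoted log later: the reported corpus norm looks
like the ratio of Spam Weight to Not-Spam Weight. That is only inferred
from the logged numbers (0 / 0 is reported as 1.000, and 360,036 / 360,070
as 0.9999), not taken from the ASSP source, and the function name below is
made up for illustration. A minimal sketch under that assumption:

```python
def corpus_norm(spam_weight: int, notspam_weight: int) -> float:
    """Assumed formula: spam weight divided by not-spam weight.

    Inferred from the rebuild log, not from the ASSP code itself.
    """
    if spam_weight == 0 and notspam_weight == 0:
        # The log reports "norm: 1.000" when both weights are 0.
        return 1.0
    # A zero not-spam weight with a non-zero spam weight is not covered
    # by the log, so it is left to raise ZeroDivisionError here.
    return spam_weight / notspam_weight


# Weights copied from the quoted rebuild log.
print(f"{corpus_norm(360_036, 360_070):.4f}")  # prints 0.9999
```

A value close to 1 means both folders contributed about the same weight,
which is what the log calls "very good - balanced".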

On Thu, Apr 30, 2015 at 3:49 AM, Thomas Eckardt <thomas.ecka...@thockar.com>
wrote:

> first:
>
> Removing/moving files from the 'errors' corpus like you did is one of the
> most stupid things I ever saw! You gave ASSP an apoplexy!
>
> > How does it know what will be in spam and notspam?
>
> ASSP knows what was done in the rebuild task in the past and how it was
> done.
> If you make major changes manually in the corpus, remove the file
> 'assp/normfile' and run the rebuild twice: the first run to get a clue
> about what has been changed, the second to get the best detection rate.
>
> Thomas
>
> From:    K Post <nntp.p...@gmail.com>
> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date:    30.04.2015 04:10
> Subject: [Assp-test] Rebuild not parsing everything
>
>
>
> I was seeing very fast parsing of notspam, I believe because there were a
> lot of errors-notspam files but far fewer errors-spam files.
>
> As a test, I moved all errors-spam to spam and errors-notspam to notspam,
> leaving 0 error reports.
>
> Here's the result of the rebuild:
>
>
> Apr-29-15 21:43:55 RebuildSpamDB-thread rebuildspamdb-version 7.10 started
> in ASSP version 2.4.4(15117)
>
> Apr-29-15 21:43:55 rebuild debug output is enabled to
> c:/assp/rebuilddebug.txt
>
> Apr-29-15 21:43:55 RebuildSpamDB uses BerkeleyDB for temporary hashes
>
> Apr-29-15 21:43:55 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
>
> Apr-29-15 21:43:55 RebuildSpamDB will create a Hidden Markov Model!
>
> Apr-29-15 21:43:55 RebuildSpamDB will create unicode enabled databases.
>
> Apr-29-15 21:43:55 RebuildSpamDB will process all words as Sequence of UAX
> #29 Grapheme Clusters.
>
> Apr-29-15 21:43:55 RebuildSpamDB will normalize unicode characters.
>
> Apr-29-15 21:43:55 RebuildSpamDB will use the ASSP_WordStem engine.
>
> Apr-29-15 21:43:55 ---ASSP Settings---
> Apr-29-15 21:43:55 Do Not Collect Messages with RedListed address: Enabled
> **Messages with RedListed addresses will be removed from the corpus!**
>
> Apr-29-15 21:43:55 Do Not Collect RedRe Messages: Enabled **Messages
> matching the RedRe will be removed from the corpus!**
>
> Apr-29-15 21:43:55 Use Subject as Maillog Names: True
> Apr-29-15 21:43:55 Maxbytes: 2,500
> Apr-29-15 21:43:55 RebuildFileTimeLimit: 1 5
> Apr-29-15 21:43:55 RebuildFileTimeLimit: files will be moved away from the
> corpus, if their processing takes longer than 5 second(s)
>
> Apr-29-15 21:44:02 Trashlist cleaning finished, 0 of 23606 files deleted
>
> Apr-29-15 21:44:02 c:/assp/messages/errors-spam
> Apr-29-15 21:44:02 File Count: 0
> Apr-29-15 21:44:02 Processing... messages/errors-spam with 0 files
> Apr-29-15 21:44:02 Imported Files for HeloBlackList: 0
> Apr-29-15 21:44:02 Imported Files for Bayes/HMM: 0
> Apr-29-15 21:44:02 Finished in 1 second(s)
>
> Apr-29-15 21:44:02 c:/assp/messages/errors-notspam
> Apr-29-15 21:44:02 File Count: 0
> Apr-29-15 21:44:02 Processing... messages/errors-notspam with 0 files
> Apr-29-15 21:44:02 Imported Files for HeloBlackList: 0
> Apr-29-15 21:44:02 Imported Files for Bayes/HMM: 0
> Apr-29-15 21:44:02 Finished in 1 second(s)
> Apr-29-15 21:44:02 info: corpusnorm after processing messages/errors-spam
> and messages/errors-notspam is Spam Weight: 0 / Not-Spam Weight: 0 =>
> norm: 1.000
> Apr-29-15 21:44:02 info: require apx. 2,812 files (360,000 words) from
> folder messages/spam to get the wanted corpusnorm (1.000)
>
> Apr-29-15 21:44:02 c:/assp/messages/spam
> Apr-29-15 21:44:02 File Count: 18,219
> Apr-29-15 21:44:02 Processing... messages/spam with 15,000 files
> Apr-29-15 21:47:49 Imported Files for HeloBlackList: 15,000
> Apr-29-15 21:47:49 Imported Files for Bayes/HMM: 1,888
> Apr-29-15 21:47:49 Finished in 227 second(s)
> Apr-29-15 21:47:49 info: require apx. all files (360,036 words) from
> folder messages/notspam to get the wanted corpusnorm (1.000)
>
> Apr-29-15 21:47:49 c:/assp/messages/notspam
> Apr-29-15 21:47:49 File Count: 21,197
> Apr-29-15 21:47:49 Processing... messages/notspam with 15,000 files
> Apr-29-15 21:52:06 Imported Files for HeloBlackList: 15,000
> Apr-29-15 21:52:06 Imported Files for Bayes/HMM: 1,040
> Apr-29-15 21:52:06 Finished in 257 second(s)
>
> Apr-29-15 21:52:06 Generating weighted Bayesian tuplets
> Apr-29-15 21:52:10 start populating Spamdb with 27,082 records - Bayesian
> check is now disabled!
> Apr-29-15 21:52:24 Finished populating Spamdb with 27,082 records -
> Bayesian check is now enabled!
> Apr-29-15 21:52:24 done - Generating weighted Bayesian tuplets
>
> Apr-29-15 21:52:24 Bayesian Pairs: 27,082 now in list
>
> Apr-29-15 21:52:24 Generating consolidated Hidden-Markov-Model database
> from 527,319 record model
> Apr-29-15 21:52:46 HMM sequences: 259,284 now in list
>
> Apr-29-15 21:52:46 generating Spamdb.helo records from 5,112 collected
> HELO's
> Apr-29-15 21:52:47 cleaning old Spamdb.helo records
> Apr-29-15 21:52:52 done - cleaning old Spamdb.helo records
>
> Apr-29-15 21:52:52 HELO Blacklist: 12 new, 427 now in list
>
> Apr-29-15 21:52:52 Spam Weight:   360,036
> Apr-29-15 21:52:52 Not-Spam Weight:   360,070
>
> Apr-29-15 21:52:52 Corpus norm: 0.9999 - (very good - balanced)
> Apr-29-15 21:52:52 Corpus confidence: 1.00000000
>
> Apr-29-15 21:52:57 Start populating Hidden Markov Model. HMM-check is
> disabled for this time!
> Apr-29-15 21:53:01 start populating Hidden Markov Model with 259,284
> records!
> Apr-29-15 21:53:06 Finished populating Hidden Markov Model with 259,284
> records!
> Apr-29-15 21:53:06 Finished populating Hidden Markov Model. HMM-check is
> now enabled again!
>
> Apr-29-15 21:53:06 Total processing time: 551 second(s)
>
> Apr-29-15 21:53:06 Total processing data: 95.49 MByte
>
>
> Apr-29-15 21:53:06 Rebuild processed 61.73 files per second.
>
> Apr-29-15 21:53:06 After finishing the Rebuild process, the c:/assp/tmpDB
> folder contains 101.74 MByte.
>
> Apr-29-15 21:53:06 After finishing the Rebuild process, the drive that
> contains the c:/assp/tmpDB folder has 20.17 GByte free space from total
> 25.20 GByte.
>
>
>
> Why, after processing errors-spam and errors-notspam, does it say:
> Apr-29-15 21:44:02 info: require apx. 2,812 files (360,000 words) from
> folder messages/spam to get the wanted corpusnorm (1.000)
>
> How does it know what will be in spam and notspam? Shouldn't it parse
> everything and then decide? Based on the fast 4-minute scan time for each
> of spam and notspam, I'm guessing it's not looking at all files. Is that
> normal? It seems like a really small spamdb and HMM given 30k files (even
> with only the first 2.5 KB of each being looked at).
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
