Thanks for chiming in, Thomas! I know how busy you must be, and I (always) appreciate the personal attention.

ASSP and MySQL (I ditched MSSQL) are running on the same Windows Hyper-V VM. The VM has 16 GB of RAM assigned and sits on SAS storage (7200 rpm, RAID 0+1, 4 drives). The machine seems good and speedy, and there's not an awful lot more running on the Hyper-V host. The host has dual 6-core Xeon processors and 72 GB of RAM. 6 processors are assigned to this Hyper-V guest.

I'm now using MySQL for the SpamDB and Berkeley DB for HMM, as per your recommendation. I've made c:\assp\tmpDB a 5 GB RAM disk.

After last night's rebuild:

Apr-24-15 01:55:18 Rebuild processed 7.70 files per second. Good values are 10 files per second and higher. You can speed up the rebuild process, using a cached (>=128MB) IO-controller or a RAM-disk with at least 2.30 GByte for the folder 'C:/assp/tmpDB'.

Better, but nowhere near the 9200 you're seeing!

I connect to ASSP using 127.0.0.1. It is also set up with a named pipe, but I have no idea how to use that, nor do I know if it would be better/faster. It's a default MySQL x64 install. I haven't tweaked any settings. All ears there if things should be changed.

It takes about an hour and a half to go through my Messages db: 15k not-spam, 14k spam, 5k error-spam, 5k error-notspam (about 40,000 messages total). I don't see a way to speed that up. MaxBytes is 4000, but I store the complete mail for resending purposes. Could that be causing problems?

SpamDB generation is taking about 45 minutes. I have a lot more records than you. Any idea why I have more than a million records, but you have under 400,000?

Apr-24-15 00:59:21 Generating weighted Bayesian tuplets
Apr-24-15 01:00:16 start populating Spamdb with 1,178,146 records - Bayesian check is now disabled!
Apr-24-15 01:46:32 Finished populating Spamdb with 1,178,146 records - Bayesian check is now enabled!
Apr-24-15 01:46:32 done - Generating weighted Bayesian tuplets

Interestingly, HMM (using BerkeleyDB) is much faster

Apr-24-15 01:46:32 Generating consolidated Hidden-Markov-Model database from 8,998,094 record model
Apr-24-15 01:53:29 HMM sequences: 4,382,780 now in list
Apr-24-15 01:53:29 generating Spamdb.helo records from 12,239 collected HELO's
Apr-24-15 01:53:36 cleaning old Spamdb.helo records
Apr-24-15 01:53:36 done - cleaning old Spamdb.helo records
Apr-24-15 01:53:36 HELO Blacklist: 832 new, 3,505 now in list
Apr-24-15 01:53:36 Spam Weight: 10,516,190
Apr-24-15 01:53:36 Not-Spam Weight: 15,528,732
Apr-24-15 01:53:36 Corpus norm: 0.6772 - (ok - slighly ham heavy)
Apr-24-15 01:53:36 Corpus confidence: 0.43204189
Apr-24-15 01:53:41 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Apr-24-15 01:53:42 start populating Hidden Markov Model with 4,382,780 records!
Apr-24-15 01:55:18 Finished populating Hidden Markov Model with 4,382,780 records!
Apr-24-15 01:55:18 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Apr-24-15 01:55:18 Total processing time: 8,673 second(s)
Apr-24-15 01:55:18 Total processing data: 328.72 MByte

So - any thoughts based on this info? Is there more info that I can provide that would be helpful?

THANK YOU
Ken

On Fri, Apr 24, 2015 at 1:01 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

> >My assumption would be that is the estimate of the number of seconds it
> will take the process to complete.
>
> That's right.
>
> On a very slow system - Apr-24-15 04:34:03 Rebuild processed 5.65 files
> per second.
>
> ASSP and the MySQL DB running on the same system.
>
> Spamdb takes 42 seconds (9220 records per second)
>
> Apr-24-15 04:28:29 start populating Spamdb with 387,637 records - Bayesian
> check is now disabled!
> Apr-24-15 04:29:11 Finished populating Spamdb with 387,637 records -
> Bayesian check is now enabled!
> Apr-24-15 04:29:11 done - Generating weighted Bayesian tuplets
>
> HMMdb takes 140 seconds (10318 records per second)
>
> Apr-24-15 04:31:42 Start populating Hidden Markov Model. HMM-check is
> disabled for this time!
> Apr-24-15 04:31:43 start populating Hidden Markov Model with 1,444,532
> records!
> Apr-24-15 04:34:03 Finished populating Hidden Markov Model with 1,444,532
> records!
> Apr-24-15 04:34:03 Finished populating Hidden Markov Model. HMM-check is
> now enabled again!
>
> Thomas
>
>
> From: Colin Waring <co...@dolphinict.co.uk>
> To: ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date: 23.04.2015 17:50
> Subject: Re: [Assp-test] speed of adding records to spamdb table
>
>
> My assumption would be that is the estimate of the number of seconds it
> will take the process to complete.
>
> Our rebuild takes about 10 seconds to populate the database, you do need
> to do some network tuning and make sure your database is optimised for
> purpose, I can't help you with MS SQL though.
>
> All the best,
> Colin Waring.
>
> -----Original Message-----
> From: K Post [mailto:nntp.p...@gmail.com]
> Sent: 23 April 2015 15:25
> To: ASSP development mailing list
> Subject: [Assp-test] speed of adding records to spamdb table
>
> Working to get the rebuild process to complete. Win32. MS SQL DB.
>
> Are these speeds normal?? I'm a little confused to the "sec" numbers,
> It's not 5,000+ seconds, I don't think. and I don't know why secs would be
> decreasing. confused.
>
> Apr-23-15 10:21:37 Added 176152 of 998035 records for table spamdb -
> finished in 5081 sec
> Apr-23-15 10:21:38 Added 176346 of 998035 records for table spamdb -
> finished in 5078 sec
> Apr-23-15 10:21:40 Added 176540 of 998035 records for table spamdb -
> finished in 5081 sec
> Apr-23-15 10:21:42 Added 176928 of 998035 records for table spamdb -
> finished in 5077 sec
> Apr-23-15 10:21:47 Added 177704 of 998035 records for table spamdb -
> finished in 5073 sec
> Apr-23-15 10:21:48 Added 177785 of 998035 records for table spamdb -
> finished in 5075 sec
> Apr-23-15 10:21:49 Added 177940 of 998035 records for table spamdb -
> finished in 5074 sec
> Apr-23-15 10:21:53 Added 178560 of 998035 records for table spamdb -
> finished in 5071 sec
> Apr-23-15 10:21:55 Added 178870 of 998035 records for table spamdb -
> finished in 5069 sec
> Apr-23-15 10:21:57 Added 179180 of 998035 records for table spamdb -
> finished in 5068 sec
> Apr-23-15 10:21:59 Added 179490 of 998035 records for table spamdb -
> finished in 5066 sec
> Apr-23-15 10:22:01 Added 179800 of 998035 records for table spamdb -
> finished in 5065 sec
>
> ------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT Develop your
> own process in accordance with the BPMN 2 standard Learn Process modeling
> best practices with Bonita BPM through live exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
> event?utm_
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
> DISCLAIMER:
> *******************************************************
> This email and any files transmitted with it may be confidential, legally
> privileged and protected in law and are intended solely for the use of the
> individual to whom it is addressed.
> This email was multiple times scanned for viruses. There should be no
> known virus in this email!
> *******************************************************
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
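
For anyone comparing the numbers in this thread, the populate rates can be recomputed from the "start populating" / "Finished populating" timestamps in the logs. Below is a minimal Python sketch, not anything ASSP itself provides. It assumes the `Mon-DD-YY HH:MM:SS` timestamp layout seen in the quoted log lines and an English locale for the month abbreviation. The helper name `records_per_second` is made up for illustration.

```python
from datetime import datetime

# Assumed timestamp layout of the log lines in this thread, e.g. "Apr-24-15 01:00:16".
# %b expects an English month abbreviation under the default (C) locale.
LOG_TIME_FORMAT = "%b-%d-%y %H:%M:%S"


def records_per_second(start: str, end: str, records: int) -> float:
    """Average insert rate between two log timestamps (hypothetical helper)."""
    elapsed = (datetime.strptime(end, LOG_TIME_FORMAT)
               - datetime.strptime(start, LOG_TIME_FORMAT))
    return records / elapsed.total_seconds()


# Timestamps and record counts copied from the log excerpts in this thread.
rates = {
    "Ken Spamdb (MySQL)": records_per_second(
        "Apr-24-15 01:00:16", "Apr-24-15 01:46:32", 1_178_146),
    "Ken HMM (BerkeleyDB)": records_per_second(
        "Apr-24-15 01:53:42", "Apr-24-15 01:55:18", 4_382_780),
    "Thomas Spamdb": records_per_second(
        "Apr-24-15 04:28:29", "Apr-24-15 04:29:11", 387_637),
    "Thomas HMM": records_per_second(
        "Apr-24-15 04:31:43", "Apr-24-15 04:34:03", 1_444_532),
}

for label, rate in rates.items():
    print(f"{label}: {rate:,.0f} records/s")
```

With these inputs the MySQL Spamdb populate on the VM works out to roughly 424 records per second over 2,776 seconds, while the last entry reproduces the 10318 records per second Thomas quoted for his HMMdb.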