Thanks for chiming in, Thomas!  I know how busy you must be, and I always
appreciate the personal attention.

ASSP and MySQL (I ditched MSSQL) are running on the same Windows Hyper-V
VM.  The VM has 16 GB of RAM assigned and sits on SAS storage (7200 rpm),
RAID 0+1 across 4 drives.  The machine seems good and speedy, and there's
not an awful lot more running on the Hyper-V host.  The host has two 6-core
Xeon processors and 72 GB of RAM.  6 processors are assigned to this
Hyper-V guest.

I'm now using MySQL for the SpamDB and Berkeley DB for HMM as per your
recommendation.
I've made c:\assp\tmpDB a 5 GB RAM disk.

After last night's rebuild:
Apr-24-15 01:55:18 Rebuild processed 7.70 files per second. Good values are
10 files per second and higher. You can speed up the rebuild process, using
a cached (>=128MB) IO-controller or a RAM-disk with at least 2.30 GByte for
the folder 'C:/assp/tmpDB'.
That's better, but nowhere near the 9,200 records per second you're seeing!

ASSP connects to MySQL using 127.0.0.1.  MySQL is also set up with a named
pipe, but I have no idea how to use that, nor do I know if it would be
better/faster.  It's a default MySQL x64 install, and I haven't tweaked any
settings.  I'm all ears if anything should be changed.

It takes about an hour and a half to go through my Messages db: 15k
not-spam, 14k spam, 5k error-spam, 5k error-notspam (about 40,000 messages
in total).  I don't see a way to speed that up.  MaxBytes is 4000, but I
store the complete mail for resending purposes.  Could that be causing
problems?


SpamDB generation is taking about 45 minutes.  I have a lot more records
than you.  Any idea why I have more than a million records, but you have
under 400,000?
Apr-24-15 00:59:21 Generating weighted Bayesian tuplets
Apr-24-15 01:00:16 start populating Spamdb with 1,178,146 records -
Bayesian check is now disabled!
Apr-24-15 01:46:32 Finished populating Spamdb with 1,178,146 records -
Bayesian check is now enabled!
Apr-24-15 01:46:32 done - Generating weighted Bayesian tuplets
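
As a rough cross-check, the insert rate implied by the timestamps above can be worked out directly. This is only a sketch: the function name and the format string are illustrative, and it assumes the "start populating" and "Finished populating" lines bracket the whole step. Thomas's numbers are taken from his quoted log further down.

```python
from datetime import datetime

# Timestamp layout used in the ASSP log lines quoted in this thread,
# e.g. "Apr-24-15 01:00:16". The format string is an assumption based on
# those lines, not taken from ASSP itself.
LOG_TIME_FORMAT = "%b-%d-%y %H:%M:%S"


def records_per_second(start, finish, records):
    """Average insert rate between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(finish, LOG_TIME_FORMAT)
    elapsed = (t1 - t0).total_seconds()
    return records / elapsed


# Spamdb population in this rebuild (MySQL on the Hyper-V guest).
ken = records_per_second("Apr-24-15 01:00:16", "Apr-24-15 01:46:32", 1_178_146)
# Spamdb population from Thomas's quoted log.
thomas = records_per_second("Apr-24-15 04:28:29", "Apr-24-15 04:29:11", 387_637)

print(f"Spamdb: {ken:.0f} vs {thomas:.0f} records/second")
```

With these timestamps the two rates come out at roughly 424 and 9,229 records per second, so the gap is about a factor of 20 and not just the larger record count.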

Interestingly, HMM (using BerkeleyDB) is much faster:
Apr-24-15 01:46:32 Generating consolidated Hidden-Markov-Model database
from 8,998,094 record model
Apr-24-15 01:53:29 HMM sequences: 4,382,780 now in list

Apr-24-15 01:53:29 generating Spamdb.helo records from 12,239 collected
HELO's
Apr-24-15 01:53:36 cleaning old Spamdb.helo records
Apr-24-15 01:53:36 done - cleaning old Spamdb.helo records

Apr-24-15 01:53:36 HELO Blacklist: 832 new, 3,505 now in list

Apr-24-15 01:53:36 Spam Weight:   10,516,190
Apr-24-15 01:53:36 Not-Spam Weight:   15,528,732

Apr-24-15 01:53:36 Corpus norm: 0.6772 - (ok - slighly ham heavy)
Apr-24-15 01:53:36 Corpus confidence: 0.43204189

Apr-24-15 01:53:41 Start populating Hidden Markov Model. HMM-check is
disabled for this time!
Apr-24-15 01:53:42 start populating Hidden Markov Model with 4,382,780
records!
Apr-24-15 01:55:18 Finished populating Hidden Markov Model with 4,382,780
records!
Apr-24-15 01:55:18 Finished populating Hidden Markov Model. HMM-check is
now enabled again!
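
The same comparison can be made for the HMM step by pairing the start and finish lines automatically. The sketch below is an assumption-laden helper, not part of ASSP: the regular expression and the `populate_rates` name are made up for illustration, and the wrapped log lines have been re-joined by hand.

```python
import re
from datetime import datetime

# Matches only the lines that carry a record count, e.g.
# "... start populating Hidden Markov Model with 4,382,780 records!"
POPULATE = re.compile(
    r"^(\w{3}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(start|finished) populating (.+?) with ([\d,]+) records",
    re.IGNORECASE,
)


def populate_rates(log_text):
    """Return {target name: records per second} for each start/finish pair."""
    started, rates = {}, {}
    for line in log_text.splitlines():
        match = POPULATE.match(line.strip())
        if match is None:
            continue  # e.g. the "HMM-check is disabled" notice has no count
        stamp = datetime.strptime(match.group(1), "%b-%d-%y %H:%M:%S")
        phase, name = match.group(2).lower(), match.group(3)
        records = int(match.group(4).replace(",", ""))
        if phase == "start":
            started[name] = stamp
        elif name in started:
            # Assumes start and finish are at least one second apart.
            rates[name] = records / (stamp - started[name]).total_seconds()
    return rates


KEN_LOG = """\
Apr-24-15 01:53:41 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Apr-24-15 01:53:42 start populating Hidden Markov Model with 4,382,780 records!
Apr-24-15 01:55:18 Finished populating Hidden Markov Model with 4,382,780 records!
"""
THOMAS_LOG = """\
Apr-24-15 04:31:43 start populating Hidden Markov Model with 1,444,532 records!
Apr-24-15 04:34:03 Finished populating Hidden Markov Model with 1,444,532 records!
"""

ken_hmm = populate_rates(KEN_LOG)["Hidden Markov Model"]
thomas_hmm = populate_rates(THOMAS_LOG)["Hidden Markov Model"]
print(f"HMM: {ken_hmm:.0f} vs {thomas_hmm:.0f} records/second")
```

On these lines the BerkeleyDB-backed HMM step works out to about 45,654 records per second here against about 10,318 in Thomas's log, which suggests (but does not prove) that the slow part is specific to the MySQL path and not the machine as a whole.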

Apr-24-15 01:55:18 Total processing time: 8,673 second(s)

Apr-24-15 01:55:18 Total processing data: 328.72 MByte
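
To put the total in perspective, the durations read off the timestamps above can be expressed as shares of the whole run. This is a small sketch using only figures from this log; the variable names are illustrative.

```python
from datetime import timedelta

total = timedelta(seconds=8673)             # "Total processing time: 8,673 second(s)"
spamdb = timedelta(minutes=46, seconds=16)  # Spamdb populate: 01:00:16 -> 01:46:32
hmm = timedelta(minutes=1, seconds=36)      # HMM populate:    01:53:42 -> 01:55:18

print(f"total run: {total}")
for name, phase in (("Spamdb populate", spamdb), ("HMM populate", hmm)):
    share = phase / total  # timedelta / timedelta gives a plain float
    print(f"{name}: {phase} ({share:.1%})")
```

The run is 2:24:33 in total, with Spamdb population taking about 32% of it and HMM population about 1%.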


So - any thoughts based on this info?  Is there more info that I can
provide that would be helpful?

THANK YOU
Ken





On Fri, Apr 24, 2015 at 1:01 AM, Thomas Eckardt <thomas.ecka...@thockar.com>
wrote:

> >My assumption would be that is the estimate of the number of seconds it
> will take the process to complete.
>
> That's right.
>
> On a very slow system - Apr-24-15 04:34:03 Rebuild processed 5.65 files
> per second.
>
> ASSP and the MySQL DB running on the same system.
>
> Spamdb takes 42 seconds (9220 records per second)
>
> Apr-24-15 04:28:29 start populating Spamdb with 387,637 records - Bayesian
> check is now disabled!
> Apr-24-15 04:29:11 Finished populating Spamdb with 387,637 records -
> Bayesian check is now enabled!
> Apr-24-15 04:29:11 done - Generating weighted Bayesian tuplets
>
> HMMdb takes 140 seconds (10318 records per second)
>
> Apr-24-15 04:31:42 Start populating Hidden Markov Model. HMM-check is
> disabled for this time!
> Apr-24-15 04:31:43 start populating Hidden Markov Model with 1,444,532
> records!
> Apr-24-15 04:34:03 Finished populating Hidden Markov Model with 1,444,532
> records!
> Apr-24-15 04:34:03 Finished populating Hidden Markov Model. HMM-check is
> now enabled again!
>
> Thomas
>
>
>
> From:   Colin Waring <co...@dolphinict.co.uk>
> To:     ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date:   23.04.2015 17:50
> Subject:        Re: [Assp-test] speed of adding records to spamdb table
>
>
>
> My assumption would be that is the estimate of the number of seconds it
> will take the process to complete.
>
> Our rebuild takes about 10 seconds to populate the database. You do need
> to do some network tuning and make sure your database is optimised for
> purpose. I can't help you with MS SQL, though.
>
> All the best,
> Colin Waring.
>
> -----Original Message-----
> From: K Post [mailto:nntp.p...@gmail.com]
> Sent: 23 April 2015 15:25
> To: ASSP development mailing list
> Subject: [Assp-test] speed of adding records to spamdb table
>
> Working to get the rebuild process to complete.  Win32.  MS SQL DB.
>
> Are these speeds normal?  I'm a little confused by the "sec" numbers.
> It's not 5,000+ seconds, I don't think, and I don't know why the secs
> would be decreasing.
>
> Apr-23-15 10:21:37 Added 176152 of 998035 records for table spamdb -
> finished in 5081 sec
> Apr-23-15 10:21:38 Added 176346 of 998035 records for table spamdb -
> finished in 5078 sec
> Apr-23-15 10:21:40 Added 176540 of 998035 records for table spamdb -
> finished in 5081 sec
> Apr-23-15 10:21:42 Added 176928 of 998035 records for table spamdb -
> finished in 5077 sec
> Apr-23-15 10:21:47 Added 177704 of 998035 records for table spamdb -
> finished in 5073 sec
> Apr-23-15 10:21:48 Added 177785 of 998035 records for table spamdb -
> finished in 5075 sec
> Apr-23-15 10:21:49 Added 177940 of 998035 records for table spamdb -
> finished in 5074 sec
> Apr-23-15 10:21:53 Added 178560 of 998035 records for table spamdb -
> finished in 5071 sec
> Apr-23-15 10:21:55 Added 178870 of 998035 records for table spamdb -
> finished in 5069 sec
> Apr-23-15 10:21:57 Added 179180 of 998035 records for table spamdb -
> finished in 5068 sec
> Apr-23-15 10:21:59 Added 179490 of 998035 records for table spamdb -
> finished in 5066 sec
> Apr-23-15 10:22:01 Added 179800 of 998035 records for table spamdb -
> finished in 5065 sec
>
> ------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT Develop your
> own process in accordance with the BPMN 2 standard Learn Process modeling
> best practices with Bonita BPM through live exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
> event?utm_
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
>
>
>
>
>
>
>
>
>
>
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Assp-test mailing list
> Assp-test@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/assp-test
>
