And why would the rebuild of the HMM db in BerkeleyDB take only seconds, while
the spamdb in MySQL (on the same box) takes 45 minutes?
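
For scale, the two "populating Spamdb" log lines quoted further down can be turned into an import rate. This is only a rough sketch: the helper name import_rate is made up for illustration, and it assumes the log stamps always follow the "Apr-27-15 13:23:47" pattern and that month names parse under an English locale.

```python
from datetime import datetime

# Assumed stamp layout, taken from the quoted log lines ("Apr-27-15 13:23:47").
LOG_TIME_FORMAT = "%b-%d-%y %H:%M:%S"


def import_rate(start_stamp, end_stamp, records):
    """Return (elapsed seconds, records per second) between two log stamps."""
    start = datetime.strptime(start_stamp, LOG_TIME_FORMAT)
    end = datetime.strptime(end_stamp, LOG_TIME_FORMAT)
    seconds = (end - start).total_seconds()
    return seconds, records / seconds


# Values copied from the "start populating" / "Finished populating" lines.
seconds, rate = import_rate("Apr-27-15 13:23:47", "Apr-27-15 14:07:09", 1_140_905)
print(f"{seconds:.0f} s elapsed, about {rate:.0f} records/s")
```

That works out to 2,602 seconds (43 min 22 s), or roughly 438 records per second for the MySQL import.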

On Tue, Apr 28, 2015 at 9:28 AM, K Post <nntp.p...@gmail.com> wrote:

> preventBulkImport is not checked.
>
> I've reinstalled the VM from scratch: new OS installation, using the Perl
> 5.20 distribution from
> http://sourceforge.net/projects/assp/files/ASSP%20V2%20multithreading/ASSP%20V2%20module%20installation/
>
> By parsing the files, I mean the step logged as "Apr-28-15 02:14:20
> Processing... messages/notspam with 14,759 files:".
> I'm worried that just parsing the 40k files is about 65% slower
> than on the old production box using the same corpus (copied to the
> dev machine), even though the old box has less than half the processing
> power, 40% slower disks, and a quarter of the RAM.  That very old
> installation doesn't have HMM in the code; yes, it's that old.  When
> rb_processfolder runs in the latest version, is it doing more processing
> of each file because of the HMM option?  I can't imagine why else it would
> take so much longer on the new, faster hardware.  Are there any temporary
> code modifications I can make to see what's taking so long?
>
> Is there a spot in the code where I could modify the bulk import of spamdb
> during the rebuild?  As a test, I'd like to write the import script out to
> a file so I can time how long the import itself takes.  Any suggestions on
> timing this would be great.
>
> I'm really struggling here, thanks for the help.
>
>
> On Tue, Apr 28, 2015 at 4:19 AM, Thomas Eckardt <
> thomas.ecka...@thockar.com> wrote:
>
>> Populating the SpamDB and HMMdb is a "DB Import". Check that
>> 'preventBulkImport' is disabled!
>>
>> Thomas
>>
>>
>>
>>
>>
>> From:    K Post <nntp.p...@gmail.com>
>> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>> Date:    27.04.2015 20:32
>> Subject: [Assp-test] MySQL vs BerkeleyDB
>>
>>
>>
>> Hi all-
>>
>> I'm having a rough go getting the rebuild process to rebuild spamdb
>> quickly.  The HMM db, which I have on BerkeleyDB, rebuilds wonderfully, in
>> under a minute.  However, spamdb, which uses MySQL, is taking over 45
>> minutes.  That's no good.
>>
>> The real question is whether there is a downside to using BerkeleyDB for
>> everything.
>>
>> In reality, I'd like to figure out why my installation is so slow
>> with MySQL (and I've got another stalled-out thread going on that).  I
>> worry about the lack of management tools with BerkeleyDB.  I'd be
>> uncomfortable with the whitelist being in BerkeleyDB.
>>
>>
>> More info:
>>
>> ASSP and MySQL are running on the same Windows 2012 Hyper-V virtual
>> machine, with 16 GB RAM and a 4 GB RAM disk for c:/assp/tmpDB (using the
>> ImDisk driver).  The VM seems to be running quickly for all other tasks.
>>
>> I've got a corpus of around 15k spam, 15k not spam, and 5k errors for each
>> of error-spam and error-notspam (so about 40k total).  It takes about 45
>> minutes to go through all of these messages, and I'm okay with that.
>>
>> MySQL is using the setting suggested by Thomas here:
>> http://sourceforge.net/p/assp/mailman/message/29893302/
>> though net_buffer_length is limited to 1M according to the documentation.
>>
>> Apr-27-15 13:23:47 start populating Spamdb with 1,140,905 records -
>> Bayesian check is now disabled!
>> Apr-27-15 14:07:09 Finished populating Spamdb with 1,140,905 records -
>> Bayesian check is now enabled!
>>
>>
>> I'd really like to stick with MySQL for spamdb and the other databases,
>> but use BerkeleyDB, as recommended, for HMM.  I just can't see doing that
>> if the rebuild of spamdb will be so slow.
>>
>> What kind of speeds is everyone else seeing for the spamdb rebuild portion
>> of the rebuild?
>>
>> I'd love some suggestions on speeding up MySQL or anything else.  Thank
>> you.
>>
>> Ken
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Assp-test mailing list
>> Assp-test@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/assp-test
>>
>>
>
>