And why would the rebuild of HMM in BerkeleyDB take only seconds, while the spamdb in MySQL (on the same box) takes 45 minutes?
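For scale, the two "populating Spamdb" log lines quoted further down in this thread give the MySQL import rate directly. The small Python sketch below derives it from those timestamps. It assumes both lines fall on the same day and use the `Mon-DD-YY HH:MM:SS` prefix shown in the log. The `import_rate` helper is hypothetical and is not part of ASSP.

```python
from datetime import datetime

# Timestamps and record count copied from the ASSP log lines quoted in this thread.
START = "Apr-27-15 13:23:47"
FINISH = "Apr-27-15 14:07:09"
RECORDS = 1_140_905  # "populating Spamdb with 1,140,905 records"

FMT = "%b-%d-%y %H:%M:%S"  # matches e.g. "Apr-27-15 13:23:47"


def import_rate(start: str, finish: str, records: int) -> tuple[int, float, float]:
    """Hypothetical helper: return (elapsed seconds, records/s, ms per record)."""
    delta = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    elapsed = int(delta.total_seconds())
    per_second = records / elapsed
    ms_per_record = 1000.0 * elapsed / records
    return elapsed, per_second, ms_per_record


if __name__ == "__main__":
    elapsed, rate, ms = import_rate(START, FINISH, RECORDS)
    print(f"{elapsed} s, {rate:.1f} records/s, {ms:.2f} ms/record")
    # prints: 2602 s, 438.5 records/s, 2.28 ms/record
```

About 2.3 ms per record looks small. Over 1.14 million records, however, it adds up to the roughly 43 minutes in the log. Any fixed per-row cost in the import path therefore dominates the total.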
On Tue, Apr 28, 2015 at 9:28 AM, K Post <nntp.p...@gmail.com> wrote:
> preventBulkImport is not checked.
>
> I've reinstalled the VM from scratch: a new OS installation, using the Perl
> 5.20 distribution from
> http://sourceforge.net/projects/assp/files/ASSP%20V2%20multithreading/ASSP%20V2%20module%20installation/
>
> By parsing the files, I mean this step:
> Apr-28-15 02:14:20 Processing... messages/notspam with 14,759 files
>
> I'm worried that just parsing through the 40k files is about 65% slower
> than on the old production box using the same corpus (copied to the dev
> machine), even though the old box has less than 1/2 the processing power,
> 40% slower disks, and 1/4 the RAM. That very old installation doesn't have
> HMM in the code; yes, it's that old. When rb_processfolder runs in the
> latest version, is it doing more processing of each file because of the
> HMM option? I can't imagine why it would take so much longer on the newer,
> faster hardware. Are there any temporary code modifications I can make to
> see what's taking so long?
>
> Is there a spot in the code where I could also modify the bulk import of
> spamdb during the rebuild? As a test, I'd like to have it write the import
> script to a file, ultimately to measure how long the import takes. Any
> suggestions on timing this would be great.
>
> I'm really struggling here; thanks for the help.
>
>
> On Tue, Apr 28, 2015 at 4:19 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>
>> Populating the SpamDB and HMMdb is a "DB Import". Check that
>> 'preventBulkImport' is disabled!
>>
>> Thomas
>>
>>
>> From: K Post <nntp.p...@gmail.com>
>> To: ASSP development mailing list <assp-test@lists.sourceforge.net>
>> Date: 27.04.2015 20:32
>> Subject: [Assp-test] MySQL vs BerkeleyDB
>>
>> Hi all-
>>
>> I'm having a rough go getting the rebuild process to rebuild spamdb
>> quickly. The HMM db, which I have on BerkeleyDB, rebuilds wonderfully,
>> in under a minute.
>> However, spamdb, which uses MySQL, is taking over 45 minutes. That's no
>> good.
>>
>> The real question is: is there a downside to using BerkeleyDB for
>> everything?
>>
>> In reality, I'd like to figure out why my installation is so slow with
>> MySQL (I've got another stalled thread going on that). I worry about the
>> lack of management tools for BerkeleyDB. I'd be uncomfortable with the
>> whitelist being in Berkeley.
>>
>>
>> More info:
>>
>> ASSP and MySQL are running on the same Windows 2012 Hyper-V virtual
>> machine with 16 GB RAM and a 4 GB RAM disk for c:/assp/tmpDB (using the
>> ImDisk driver). The VM seems to be running quickly for all other tasks.
>>
>> I've got a corpus of around 15k spam, 15k notspam, and 5k each of
>> error-spam and error-notspam (so about 40k total). It takes about 45
>> minutes to go through all of these messages, and I'm okay with that.
>>
>> MySQL is using the settings suggested by Thomas here:
>> http://sourceforge.net/p/assp/mailman/message/29893302/
>> though net_buffer_length is limited to 1M according to the documentation.
>>
>> Apr-27-15 13:23:47 start populating Spamdb with 1,140,905 records - Bayesian check is now disabled!
>> Apr-27-15 14:07:09 Finished populating Spamdb with 1,140,905 records - Bayesian check is now enabled!
>>
>> I'd really like to stick with MySQL for spamdb and the other databases,
>> but use BerkeleyDB for HMM, as recommended. I just can't see doing that if
>> the rebuild of spamdb will be so slow.
>>
>> What kind of speeds is everyone else seeing for the spamdb portion of the
>> rebuild?
>>
>> I'd love some suggestions on speeding up MySQL, or anything else. Thank
>> you.
>>
>> Ken
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Assp-test mailing list
>> Assp-test@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/assp-test
>>
>>
>> DISCLAIMER:
>> *******************************************************
>> This email and any files transmitted with it may be confidential, legally
>> privileged and protected in law and are intended solely for the use of the
>> individual to whom it is addressed.
>> This email was multiple times scanned for viruses. There should be no
>> known virus in this email!
>> *******************************************************