Obviously my idea isn't new, but I raised it because I thought it might be a good idea. You've confirmed that it IS a good idea, but that it isn't realistic, and isn't necessary since only one feature is disabled at a time. Now that you've got my rebuild working properly (HUGE THANK YOU), it's really a moot point.
On Thu, Apr 30, 2015 at 5:49 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

> >And last
> >Have you given any thought into using a temporary table for this? You
> >could populate a table called SpamDB.rebuilding or something without having
> >to disable bayesian checks during the rebuild. Then quickly turn off
> >bayesian, delete spamdb, rename spamdb.rebuilding. Could do the same with
> >HMM.
>
> Do you really think this idea is NEW?
>
> A major change to the DB engine code in the past prevents doing it this
> way. Every thread (worker) uses a single database connection for all tied
> tables.
> To rename a DB table, you need exclusive access to it, so all workers would
> have to untie all hashes, completely disconnect from the database, wait
> until the rename of the tables is finished, then reconnect to the database
> and tie all tables to hashes again. Until this procedure has finished, the
> workers can't do anything!
> The complete procedure "untie - disconnect - drop table - rename table -
> reconnect - tie" can take (depending on the configuration) more than one
> minute for the SMTP workers and several minutes for the
> Maintenance-Worker, if it is currently running a longer task.
>
> Yes, what you suggest could be done! It would require several hundred or
> even thousand lines of code to synchronize all workers to a single point in
> time and to do the work, but this makes NO sense to me!
> The way it works now (having only one feature running at a time for some
> minutes) is Pareto-optimal, and it works.
>
> There is a way to configure assp in an ISP-Cluster-Mode, where both spamdb
> and hmmdb are held in unshared memory in every worker. But this requires
> expensive hardware (8 cores, 16GB RAM minimum) for each cluster node.
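Ken's proposal (fill a side table, then swap it in) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (`spamdb_rebuilding`, `pkey`, `pvalue`) are hypothetical, not ASSP's schema. In MySQL the swap step would typically be one atomic `RENAME TABLE` statement instead. The sketch deliberately ignores the obstacle Thomas describes: ASSP workers keep the tables tied to hashes over one shared connection per worker, so a real swap would still require every worker to untie, disconnect, reconnect and re-tie.

```python
import sqlite3


def rebuild_with_swap(conn: sqlite3.Connection, rows) -> None:
    """Fill a side table without touching the live one, then swap them.

    Hypothetical schema (pkey TEXT PRIMARY KEY, pvalue REAL).
    conn must be opened with isolation_level=None so that the explicit
    BEGIN/COMMIT statements below control the transactions.
    """
    conn.execute("DROP TABLE IF EXISTS spamdb_rebuilding")
    conn.execute(
        "CREATE TABLE spamdb_rebuilding (pkey TEXT PRIMARY KEY, pvalue REAL)")

    # Slow phase: only the side table is written; spamdb is not touched.
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT OR IGNORE INTO spamdb_rebuilding VALUES (?, ?)", rows)
    conn.execute("COMMIT")

    # Fast phase: drop the old table and rename the new one in one transaction.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS spamdb")
    conn.execute("ALTER TABLE spamdb_rebuilding RENAME TO spamdb")
    conn.execute("COMMIT")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE spamdb (pkey TEXT PRIMARY KEY, pvalue REAL)")
    conn.execute("INSERT INTO spamdb VALUES ('old token', 0.5)")
    rebuild_with_swap(conn, [("free money", 0.99), ("meeting agenda", 0.01)])
    print(conn.execute("SELECT * FROM spamdb ORDER BY pkey").fetchall())
    # [('free money', 0.99), ('meeting agenda', 0.01)]
```

The expensive part (the inserts) never locks the live table; only the short drop-and-rename does, which is the property Ken was after.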
>
> Thomas
>
>
>
> From: K Post <nntp.p...@gmail.com>
> To: ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date: 29.04.2015 16:34
> Subject: Re: [Assp-test] MySQL vs BerkeleyDB
>
> FANTASTIC!
>
> The change to
> MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES "|$sql_sm="($k,$v,\'0\')"|$sql_sm=","|1000
>
> brought SpamDB generation down to under 3 minutes.
>
> Apr-29-15 09:41:41 Generating weighted Bayesian tuplets
> Apr-29-15 09:42:16 start populating Spamdb with 970,294 records - Bayesian check is now disabled!
> Apr-29-15 09:44:29 Finished populating Spamdb with 970,294 records - Bayesian check is now enabled!
> Apr-29-15 09:44:29 done - Generating weighted Bayesian tuplets
>
> This is great, though I have no idea what your change is actually doing.
>
> 3 minutes isn't bad at all, at least not relative to where we were
> before. In another thread, you (Thomas) posted this:
> Apr-24-15 04:28:29 start populating Spamdb with 387,637 records - Bayesian check is now disabled!
> Apr-24-15 04:29:11 Finished populating Spamdb with 387,637 records - Bayesian check is now enabled!
> Apr-24-15 04:29:11 done - Generating weighted Bayesian tuplets
> That's about 9k a second.
>
> Mine comes to around 5700 a second. Still slower than you're seeing. Are
> you running MySQL Community 5.24? Any changes to the defaults?
>
>
> And other questions:
> 1) Is MaxBytes still recommended to be around 3000 when the corpus is
> full? It seems to me that the spam messages in my corpus are way bigger
> than 3k on average.
> 2) I have RebuildThreadCycleType set to the default of 30. Do you think I
> should lower this number? Is it likely to improve rebuild time?
> 3) How many messages do you have in the corpus? And how long is it taking
> you to process each folder? I'm trying to gauge if I need to beg for
> better hardware (charity, not easy).
> I'm seeing strange results
> (consistent, but strange in my opinion):
>
> Apr-29-15 08:58:54 Processing... messages/errors-spam with 5,679 files
> Apr-29-15 09:10:59 Finished in 725 second(s) 0.12 secs per message
>
> Apr-29-15 09:10:59 Processing... messages/errors-notspam with 5,478 files
> Apr-29-15 09:31:41 Finished in 1,242 second(s) 0.22 secs per message
>
> Apr-29-15 09:31:41 Processing... messages/spam with 11,805 files
> Apr-29-15 09:40:07 Finished in 506 second(s) 0.04 secs per message (why so much faster than errors-spam?)
>
> Apr-29-15 09:40:07 Processing... messages/notspam with 11,760 files
> Apr-29-15 09:41:41 Finished in 94 second(s) 0.008 secs per message (WOW, why is this always SO fast relative to the others?)
>
> This is at 3k size.
>
>
> And last
> Have you given any thought into using a temporary table for this? You
> could populate a table called SpamDB.rebuilding or something without having
> to disable bayesian checks during the rebuild. Then quickly turn off
> bayesian, delete spamdb, rename spamdb.rebuilding. Could do the same with
> HMM.
>
>
>
> On Wed, Apr 29, 2015 at 5:00 AM, Thomas Eckardt <thomas.ecka...@thockar.com>
> wrote:
>
> > Try the following:
> >
> > change the following line in the assp_db_import.cfg
> >
> > MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES "|$sql_sm="($k,$v,\'$f\')"|$sql_sm=","|1000
> >
> > into
> >
> > MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES "|$sql_sm="($k,$v,\'0\')"|$sql_sm=","|1000
> >
> > It replaces $f with 0: "($k,$v,\'$f\') -> "($k,$v,\'0\')
> >
> > This issue seems to depend on something that is currently unknown to me.
> > I'm still looking for the reason, but this change in assp_db_import.cfg
> > will prevent the Bulk-Import from being skipped.
> > The next build will fix the issue, so that the original assp_db_import.cfg
> > can be used.
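To see what kind of SQL such a rule produces, here is a hypothetical Python sketch. It assumes the pipe-separated fields after `NOOP|NOOP` are, in order, the statement prefix, the per-row template, the row separator and the maximum number of rows per statement; that is an interpretation of the cfg line, not something this thread confirms. `build_bulk_inserts` and `sql_quote` are illustrative names, not ASSP functions, and real code should rely on the database driver's own quoting or placeholders. The third column is the constant `'0'` from Thomas's workaround.

```python
from typing import Iterable, Iterator, List, Tuple


def sql_quote(text: str) -> str:
    """Quote a value as a MySQL string literal (illustration only)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_bulk_inserts(table: str, rows: Iterable[Tuple[str, str]],
                       batch_size: int = 1000) -> Iterator[str]:
    """Yield multi-row INSERT IGNORE statements of at most batch_size rows."""
    prefix = f"INSERT IGNORE INTO {table} VALUES "   # assumed field 1: prefix
    batch: List[str] = []
    for key, value in rows:
        # Assumed field 2: per-row template, third column fixed to '0'.
        batch.append(f"({sql_quote(key)},{sql_quote(value)},'0')")
        if len(batch) == batch_size:                 # assumed field 4: rows per statement
            yield prefix + ",".join(batch)           # assumed field 3: row separator
            batch = []
    if batch:  # never emit a bare prefix with no row tuples after it
        yield prefix + ",".join(batch)


if __name__ == "__main__":
    for stmt in build_bulk_inserts("spamdb", [("free money", "0.99"), ("it's", "0.10")]):
        print(stmt)
    # INSERT IGNORE INTO spamdb VALUES ('free money','0.99','0'),('it\'s','0.10','0')
```

Sending up to 1000 rows per statement instead of one row per statement is the usual reason multi-row inserts load much faster. The sketch never emits a prefix without rows. The syntax error Ken reported on Apr 28 (`...VALUES ,INSERT IGNORE INTO spamdb VALUES ,...`) has exactly that malformed shape, although the thread does not say what produced it.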
> >
> > Thomas
> >
> >
> >
> > From: K Post <nntp.p...@gmail.com>
> > To: ASSP development mailing list <assp-test@lists.sourceforge.net>
> > Date: 28.04.2015 20:10
> > Subject: Re: [Assp-test] MySQL vs BerkeleyDB
> >
> > Sorry for the seemingly incessant emails...
> >
> > Error: You have an error in your SQL syntax; check the manual that
> > corresponds to your MySQL server version for the right syntax to use near
> > 'INSERT IGNORE INTO spamdb VALUES ,INSERT IGNORE INTO spamdb VALUES ,INSERT IGNOR' at line 1
> >
> > I don't recall having seen this before. I'm now using assp_db_import.cfg
> > straight from CVS, no edits. Do I need to edit it to use it with MySQL
> > too? Or should I? It looks like the maximum number of records per bulk
> > insert is 1000; should I change that?
> >
> >
> >
> > On Tue, Apr 28, 2015 at 10:15 AM, K Post <nntp.p...@gmail.com> wrote:
> >
> > > And note: looking periodically at the worker status window in the web
> > > admin, I see "chkdb - finished" for quite some time after the 40k files
> > > have been processed. I think this is while spamdb is being generated.
> > >
> > > On Tue, Apr 28, 2015 at 9:29 AM, K Post <nntp.p...@gmail.com> wrote:
> > >
> > >> And why would the rebuild of HMM in BerkeleyDB take only seconds, but the
> > >> spamdb in MySQL (on the same box) take 45 minutes?
> > >>
> > >> On Tue, Apr 28, 2015 at 9:28 AM, K Post <nntp.p...@gmail.com> wrote:
> > >>
> > >>> preventBulkImport is not checked.
> > >>>
> > >>> I've reinstalled the VM from scratch: new OS installation, using the
> > >>> Perl 5.20 distribution from
> > >>> http://sourceforge.net/projects/assp/files/ASSP%20V2%20multithreading/ASSP%20V2%20module%20installation/
> > >>>
> > >>> Parsing the files, I'm talking about Apr-28-15 02:14:20 Processing...
> > >>> messages/notspam with 14,759 files:
> > >>> I'm worried that just parsing through the 40k files is about 65% slower
> > >>> than it is on the old production box using the same corpus (copied to the
> > >>> dev machine), even though the old box has less than half the processing
> > >>> power, 40% slower disks, and a quarter of the RAM. That very old
> > >>> installation doesn't have HMM in the code; yes, it's that old. When
> > >>> rb_processfolder runs in the latest version, is it doing more processing
> > >>> of each file because of the HMM option? I can't imagine why it would take
> > >>> so much longer on the new, faster hardware. Are there any temporary code
> > >>> modifications I can make to see what's taking so long?
> > >>>
> > >>> Is there a spot in the code where I could also modify the bulk import of
> > >>> spamdb during the rebuild? As a test, I'd like to modify it to write the
> > >>> import script to a file, ultimately to measure how long it takes to
> > >>> import. Any suggestions on timing this would also be great.
> > >>>
> > >>> I'm really struggling here; thanks for the help.
> > >>>
> > >>>
> > >>> On Tue, Apr 28, 2015 at 4:19 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
> > >>>
> > >>>> Populating the SpamDB and HMMdb is a "DB Import". Check that
> > >>>> 'preventBulkImport' is disabled!
> > >>>>
> > >>>> Thomas
> > >>>>
> > >>>>
> > >>>>
> > >>>> From: K Post <nntp.p...@gmail.com>
> > >>>> To: ASSP development mailing list <assp-test@lists.sourceforge.net>
> > >>>> Date: 27.04.2015 20:32
> > >>>> Subject: [Assp-test] MySQL vs BerkeleyDB
> > >>>>
> > >>>>
> > >>>>
> > >>>> Hi all-
> > >>>>
> > >>>> I'm having a rough go getting the rebuild process to rebuild spamdb
> > >>>> quickly. The HMM db, which I have using BerkeleyDB, rebuilds
> > >>>> wonderfully, in under a minute. However, spamdb, which uses MySQL, is
> > >>>> taking over 45 minutes.
> > >>>> That's no good.
> > >>>>
> > >>>> The real question is: is there a downside to using BerkeleyDB for
> > >>>> everything?
> > >>>>
> > >>>> In reality, I'd like to figure out why my installation is so slow
> > >>>> with MySQL (I've got another stalled-out thread going on that). I
> > >>>> worry about the lack of management tools for BerkeleyDB. I'd be
> > >>>> uncomfortable with the whitelist being in Berkeley.
> > >>>>
> > >>>>
> > >>>> More info:
> > >>>>
> > >>>> ASSP and MySQL are running on the same Windows 2012 Hyper-V virtual
> > >>>> machine: 16 GB RAM, with a 4 GB RAM disk for c:/assp/tmpDB (using the
> > >>>> ImDisk driver). The VM seems to be running quickly for all other tasks.
> > >>>>
> > >>>> I've got a corpus of around 15k spam, 15k not-spam, and 5k errors each
> > >>>> for error-spam and error-notspam (so about 40k total). It takes about
> > >>>> 45 minutes to go through all of these messages, and I'm okay with that.
> > >>>>
> > >>>> MySQL is using the settings suggested by Thomas here:
> > >>>> http://sourceforge.net/p/assp/mailman/message/29893302/
> > >>>> though net_buffer_length is limited to 1M according to the
> > >>>> documentation.
> > >>>>
> > >>>> Apr-27-15 13:23:47 start populating Spamdb with 1,140,905 records - Bayesian check is now disabled!
> > >>>> Apr-27-15 14:07:09 Finished populating Spamdb with 1,140,905 records - Bayesian check is now enabled!
> > >>>>
> > >>>>
> > >>>> I'd really like to stick with MySQL for spamdb and the other
> > >>>> databases, but use BerkeleyDB for HMM, as recommended. I just can't
> > >>>> see doing that if the rebuild of spamdb will be so slow.
> > >>>>
> > >>>> What kind of speeds is everyone else seeing for the spamdb portion
> > >>>> of the rebuild?
> > >>>>
> > >>>> I'd love some suggestions on speeding up MySQL or anything else.
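The populate timings quoted in this thread can be turned into comparable throughput figures with a little arithmetic. This Python sketch only parses the HH:MM:SS part of the quoted log lines and assumes each run starts and finishes on the same day; the run labels are descriptive only.

```python
from datetime import datetime

LOG_TIME = "%H:%M:%S"


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS log timestamps."""
    delta = datetime.strptime(end, LOG_TIME) - datetime.strptime(start, LOG_TIME)
    return int(delta.total_seconds())


def records_per_second(records: int, start: str, end: str) -> float:
    return records / elapsed_seconds(start, end)


# (records, "start populating" time, "Finished populating" time) from the logs.
RUNS = {
    "Apr-27 original report": (1_140_905, "13:23:47", "14:07:09"),
    "Apr-29 after cfg change": (970_294, "09:42:16", "09:44:29"),
    "Apr-24 Thomas's log": (387_637, "04:28:29", "04:29:11"),
}

if __name__ == "__main__":
    for label, (records, start, end) in RUNS.items():
        secs = elapsed_seconds(start, end)
        print(f"{label}: {secs} s, {records / secs:,.0f} records/s")
    # Apr-27 original report: 2602 s, 438 records/s
    # Apr-29 after cfg change: 133 s, 7,295 records/s
    # Apr-24 Thomas's log: 42 s, 9,229 records/s

    # Timing Apr-29 from the "Generating weighted Bayesian tuplets" line instead.
    rate = records_per_second(970_294, "09:41:41", "09:44:29")
    print(f"Apr-29 incl. tuplet generation: {rate:,.0f} records/s")
    # Apr-29 incl. tuplet generation: 5,776 records/s
```

Measured between the "start populating" and "Finished populating" lines, the Apr-29 run comes to about 7,300 records per second, against roughly 440 per second in the original report and about 9,200 per second in Thomas's log. The figure of around 5,700 per second quoted on Apr-29 matches timing from the 09:41:41 "Generating weighted Bayesian tuplets" line (970,294 records in 168 s), which also counts the 35 seconds spent before populating starts.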
> > >>>> Thank you,
> > >>>>
> > >>>> Ken
> > >>>>
> > >>>> _______________________________________________
> > >>>> Assp-test mailing list
> > >>>> Assp-test@lists.sourceforge.net
> > >>>> https://lists.sourceforge.net/lists/listinfo/assp-test
> > >>>>
> > >>>>
> > >>>> DISCLAIMER:
> > >>>> *******************************************************
> > >>>> This email and any files transmitted with it may be confidential,
> > >>>> legally privileged and protected in law and are intended solely for
> > >>>> the use of the individual to whom it is addressed.
> > >>>> This email was multiple times scanned for viruses. There should be no
> > >>>> known virus in this email!
> > >>>> *******************************************************