Obviously my idea isn't new, but I raised it because I thought it might be
a good idea.  You've confirmed that it IS a good idea, but not realistic
and not necessary since it's only one feature disabled at a time.  Now that
you've got my rebuild working properly (HUGE THANK YOU), it's really a moot
point.

On Thu, Apr 30, 2015 at 5:49 AM, Thomas Eckardt <thomas.ecka...@thockar.com>
wrote:

> >And last
> >Have you given any thought to using a temporary table for this?  You
> >could populate a table called SpamDB.rebuilding or something without
> >having to disable bayesian checks during the rebuild.  Then quickly turn
> >off bayesian, delete spamdb, rename spamdb.rebuilding.  Could do the same
> >with HMM.
>
> Do you really think this idea is NEW ?
>
> A major change to the DB engine code in the past prevents doing it this
> way. Every thread (worker) uses a single database connection for all tied
> tables.
> To rename a DB table, you need exclusive access to it, so all workers would
> have to untie all hashes, completely disconnect from the database, and
> wait until the rename of the tables is finished. Then they would have to
> reconnect to the database and tie all tables to hashes again. Until this
> procedure has finished, the workers can't do anything!
> The complete procedure "untie - disconnect - drop table - rename table -
> reconnect - tie" can take (depending on the configuration) more than one
> minute for the SMTP workers and several minutes for the
> Maintenance-Worker, if it happens to be running a longer task.
>
> Yes, what you suggest could be done! It would require several hundred or
> even thousands of lines of code to synchronize all workers at a single
> point in time and carry out the swap - but this makes NO sense to me!
> The way it works now (having only one feature disabled at a time for a few
> minutes) is Pareto-optimal - and it works.
>
> There is a way to configure assp in an ISP-Cluster-Mode, where both spamdb
> and hmmdb are held in unshared memory in every worker. But this requires
> expensive hardware (8 cores and 16 GB RAM minimum) for each cluster node.
>
>
> Thomas
>
> From:    K Post <nntp.p...@gmail.com>
> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date:    29.04.2015 16:34
> Subject: Re: [Assp-test] MySQL vs BerkeleyDB
>
>
>
> FANTASTIC!
>
> The change to
> MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES "|$sql_sm="($k,$v,\'0\')"|$sql_sm=","|1000
>
> brought SpamDB generation down to under 3 minutes.
>
> Apr-29-15 09:41:41 Generating weighted Bayesian tuplets
> Apr-29-15 09:42:16 start populating Spamdb with 970,294 records - Bayesian check is now disabled!
> Apr-29-15 09:44:29 Finished populating Spamdb with 970,294 records - Bayesian check is now enabled!
> Apr-29-15 09:44:29 done - Generating weighted Bayesian tuplets
>
> This is great, though I have no idea what your change is actually doing.
>
> 3 minutes isn't bad at all though, at least not relative to where we were
> before.  In another thread, you (Thomas) posted this:
> Apr-24-15 04:28:29 start populating Spamdb with 387,637 records - Bayesian check is now disabled!
> Apr-24-15 04:29:11 Finished populating Spamdb with 387,637 records - Bayesian check is now enabled!
> Apr-24-15 04:29:11 done - Generating weighted Bayesian tuplets
> that's about 9k a second
>
> Mine comes to around 5700 a second.  Still slower than you're seeing.  Are
> you running MySQL Community 5.24?   Any changes to the defaults?
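
The rates quoted above can be rechecked from the log timestamps. The following is a small Python sketch, not anything from ASSP; the helper name records_per_second is made up for illustration. Measured over the populate phase alone, the 970,294-record run works out to roughly 7,300 records per second. The figure of around 5,700 appears to correspond to counting from the start of "Generating weighted Bayesian tuplets".

```python
from datetime import datetime

# Matches log timestamps such as "Apr-29-15 09:42:16" (English month names assumed).
FMT = "%b-%d-%y %H:%M:%S"


def records_per_second(start: str, end: str, records: int) -> float:
    """Average number of records handled per second between two log timestamps."""
    elapsed = (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
    return records / elapsed


# Thomas's run: 387,637 records between "start populating" and "Finished populating" (42 s).
thomas = records_per_second("Apr-24-15 04:28:29", "Apr-24-15 04:29:11", 387_637)

# This run, populate phase only (133 s).
populate_only = records_per_second("Apr-29-15 09:42:16", "Apr-29-15 09:44:29", 970_294)

# This run, counted from the start of tuplet generation (168 s).
with_generation = records_per_second("Apr-29-15 09:41:41", "Apr-29-15 09:44:29", 970_294)

print(f"{thomas:.0f} {populate_only:.0f} {with_generation:.0f}")
```

Comparing like with like (populate phase against populate phase) narrows the gap between the two installations, although the MySQL run here is still the slower one.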
>
>
> And other questions:
> 1) Is MaxBytes still recommended to be around 3000 when the corpus is
> full?  Seems to me like my spam messages in corpus are way bigger than 3k
> on average.
> 2) I have RebuildThreadCycleType set to the default of 30.  Do you think I
> should lower this number?  Is it likely to improve rebuild time?
> 3) How many messages do you have in the corpus?  And how long is it taking
> you to process each folder?  I'm trying to gauge if I need to beg for
> better hardware (charity, not easy).  I'm seeing strange results
> (consistent but strange in my opinion)
>
> Apr-29-15 08:58:54 Processing... messages/errors-spam with 5,679 files
> Apr-29-15 09:10:59 Finished in 725 second(s)      0.12 secs per message
>
> Apr-29-15 09:10:59 Processing... messages/errors-notspam with 5,478 files
> Apr-29-15 09:31:41 Finished in 1,242 second(s)  0.22 secs per message
>
> Apr-29-15 09:31:41 Processing... messages/spam with 11,805 files
> Apr-29-15 09:40:07 Finished in 506 second(s)    0.04 secs per message (why
> so much faster than errors-spam)
>
> Apr-29-15 09:40:07 Processing... messages/notspam with 11,760 files
> Apr-29-15 09:41:41 Finished in 94 second(s)    0.008 secs per message -
> WOW-  why is this always SO fast relative to others.
>
> This is at 3k size.
>
>
> And last
> Have you given any thought to using a temporary table for this?  You
> could populate a table called SpamDB.rebuilding or something without
> having to disable bayesian checks during the rebuild.  Then quickly turn
> off bayesian, delete spamdb, rename spamdb.rebuilding.  Could do the same
> with HMM.
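
A minimal sketch of the staging-table idea, under loud assumptions: it uses SQLite from the Python standard library as a stand-in so that it runs anywhere, the table and column names (spamdb, spamdb_rebuilding, k, v) are hypothetical, and it works on a single connection. It only illustrates the "populate on the side, then drop and rename" sequence. It does not deal with several workers holding their own connections to the table, which is the hard part on a real MySQL setup (where the rename step would be a RENAME TABLE statement).

```python
import sqlite3


def rebuild_with_staging(conn, rows):
    """Fill a staging table while 'spamdb' stays in place, then swap it in."""
    # Phase 1: build the new data on the side; the live table is untouched.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS spamdb_rebuilding")
    conn.execute("CREATE TABLE spamdb_rebuilding (k TEXT PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT OR IGNORE INTO spamdb_rebuilding VALUES (?, ?)", rows)
    conn.execute("COMMIT")

    # Phase 2: the short swap window. Drop the old table and rename the
    # staging table inside one transaction.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS spamdb")
    conn.execute("ALTER TABLE spamdb_rebuilding RENAME TO spamdb")
    conn.execute("COMMIT")


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements above control the transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE spamdb (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO spamdb VALUES ('old', '0.5')")

rebuild_with_staging(conn, [("viagra", "0.99"), ("meeting", "0.01")])
result = conn.execute("SELECT k, v FROM spamdb ORDER BY k").fetchall()
print(result)
```

In this toy version only the second phase needs the live table to be quiet, which is the window during which Bayesian checks would have to be switched off.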
>
>
>
> On Wed, Apr 29, 2015 at 5:00 AM, Thomas Eckardt
> <thomas.ecka...@thockar.com>
> wrote:
>
> > Try the following:
> >
> > Change the following line in assp_db_import.cfg
> >
> > MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES "|$sql_sm="($k,$v,\'$f\')"|$sql_sm=","|1000
> >
> > to
> >
> > MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES "|$sql_sm="($k,$v,\'0\')"|$sql_sm=","|1000
> >
> > This replaces $f with 0:  "($k,$v,\'$f\')" -> "($k,$v,\'0\')"
> >
> > This issue seems to depend on something that is currently unknown to me.
> > I'm still looking for the reason, but this change in assp_db_import.cfg
> > will prevent the Bulk-Import from being skipped.
> > The next build will fix the issue, so that the original assp_db_import.cfg
> > can be used again.
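
To make the effect of such a template line more concrete, here is a hedged Python sketch of batched multi-row inserts. It rests on an assumption about what the fields mean (statement prefix, per-row template, row separator, maximum rows per statement); that reading is not taken from the ASSP documentation. The function names build_bulk_inserts and quote are invented for this example, and the quoting is deliberately naive.

```python
def quote(value):
    """Naive SQL string quoting, for illustration only (not injection-safe)."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_bulk_inserts(table, records, batch_size=1000):
    """Yield multi-row INSERT IGNORE statements with at most batch_size rows each."""
    prefix = f"INSERT IGNORE INTO {table} VALUES "
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        # One "(key,value,'0')" group per record; the third column is the
        # constant '0', mirroring the edited template.
        rows = [f"({quote(k)},{quote(v)},'0')" for k, v in batch]
        yield prefix + ",".join(rows)


# 2,500 made-up records become three statements: 1000 + 1000 + 500 rows.
records = [(f"word{i}", "0.5") for i in range(2500)]
statements = list(build_bulk_inserts("spamdb", records))
print(len(statements))
print(statements[0][:60])
```

Sending one statement per 1000 rows instead of one per row is what makes a bulk import fast; real code should pass values through the database driver's parameter binding rather than building strings by hand.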
> >
> > Thomas
> >
> >
> >
> >
> >
> > From:    K Post <nntp.p...@gmail.com>
> > To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
> > Date:    28.04.2015 20:10
> > Subject: Re: [Assp-test] MySQL vs BerkeleyDB
> >
> >
> >
> > Sorry for the seemingly incessant emails...
> >
> > Error: You have an error in your SQL syntax; check the manual that
> > corresponds to your MySQL server version for the right syntax to use near
> > 'INSERT IGNORE INTO spamdb VALUES ,INSERT IGNORE INTO spamdb VALUES ,INSERT
> > IGNOR' at line 1
> >
> > Don't recall having seen this before.  I'm now using assp_db_import.cfg
> > straight from cvs, no edits.  Do I need to edit this to use with mysql
> > too?  Or should I?  Looks like the maximum records for insert in bulk is
> > 1000, should I change that?
> >
> >
> >
> > On Tue, Apr 28, 2015 at 10:15 AM, K Post <nntp.p...@gmail.com> wrote:
> >
> > > and note, looking periodically at the worker status window in the web
> > > admin, I see "chkdb - finished" for quite some time after the 40k files
> > > have been processed.  I think this is while spamdb is being generated.
> > >
> > > On Tue, Apr 28, 2015 at 9:29 AM, K Post <nntp.p...@gmail.com> wrote:
> > >
> > >> and why would the rebuild of hmm in berkeleydb take only seconds, but
> > >> the spamdb in mysql (on same box) take 45 minutes?
> > >>
> > >> On Tue, Apr 28, 2015 at 9:28 AM, K Post <nntp.p...@gmail.com> wrote:
> > >>
> > >>> preventBulkImport is not checked.
> > >>>
> > >>> I've reinstalled the VM from scratch.  New OS installation, using the
> > >>> perl distribution 5.20 from
> > >>> http://sourceforge.net/projects/assp/files/ASSP%20V2%20multithreading/ASSP%20V2%20module%20installation/
> > >>>
> > >>> Parsing the files, I'm talking about Apr-28-15 02:14:20 Processing...
> > >>> messages/notspam with 14,759 files:
> > >>> I'm worried that just parsing through the 40k files is about 65% slower
> > >>> than it is on the old production box using the same corpus (copied to
> > >>> the dev machine) even though the old box has less than 1/2 the
> > >>> processing power, 40% slower disks, and 1/4 the RAM.  That very old
> > >>> installation doesn't have HMM in the code, yes it's that old.  When
> > >>> rb_processfolder runs in the latest version, is it doing more
> > >>> processing of each file because of the HMM option?   I can't imagine
> > >>> why it would take so much longer on the new faster hardware.  Any
> > >>> temporary code modifications I can make to see what's taking so long?
> > >>>
> > >>> Is there a spot in code where I could also modify bulk import of spamdb
> > >>> during the rebuild?  I'd like to see if I can modify that as a test to
> > >>> write the import script as a file, ultimately to test how long it takes
> > >>> to import. Or any suggestions on timing this would be great.
> > >>>
> > >>> I'm really struggling here, thanks for the help.
> > >>>
> > >>>
> > >>> On Tue, Apr 28, 2015 at 4:19 AM, Thomas Eckardt <
> > >>> thomas.ecka...@thockar.com> wrote:
> > >>>
> > >>>> Populating the SpamDB and HMMdb is a "DB Import". Check that
> > >>>> 'preventBulkImport' is disabled!
> > >>>>
> > >>>> Thomas
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> From:    K Post <nntp.p...@gmail.com>
> > >>>> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
> > >>>> Date:    27.04.2015 20:32
> > >>>> Subject: [Assp-test] MySQL vs BerkeleyDB
> > >>>>
> > >>>>
> > >>>>
> > >>>> Hi all-
> > >>>>
> > >>>> I'm having a rough go getting the rebuild process to quickly rebuild
> > >>>> spamdb.  The HMM db, which I have using BerkeleyDB, rebuilds
> > >>>> wonderfully, in under a minute.  However, spamdb, which uses MySQL, is
> > >>>> taking over 45 minutes.  That's no good.
> > >>>>
> > >>>> The real question is whether there is a downside to using BerkeleyDB
> > >>>> for everything.
> > >>>>
> > >>>> In reality, I'd like to figure out why my installation is so slow with
> > >>>> MySQL (and I've got another stalled out thread going on that).  I
> > >>>> worry about the lack of management tools with BerkeleyDB.  I'd be
> > >>>> uncomfortable with the whitelist being in Berkeley.
> > >>>>
> > >>>>
> > >>>> More info:
> > >>>>
> > >>>> ASSP and MySQL are running on the same Windows 2012 Hyper-V virtual
> > >>>> machine.  16 GB RAM.  4 GB RAM disk for c:/assp/tmpDB (using the
> > >>>> imdisk driver).  The VM seems to be running quickly for all other
> > >>>> tasks.
> > >>>>
> > >>>> I've got a corpus of around 15k spam, 15k not spam, and 5k errors for
> > >>>> each of error-spam and error-notspam (so about 40k total).  It takes
> > >>>> about 45 minutes to go through all of these messages and I'm okay with
> > >>>> that.
> > >>>>
> > >>>> MySQL is using the settings suggested here:
> > >>>> http://sourceforge.net/p/assp/mailman/message/29893302/ by Thomas,
> > >>>> though net_buffer_length is limited to 1M according to the
> > >>>> documentation.
> > >>>>
> > >>>> Apr-27-15 13:23:47 start populating Spamdb with 1,140,905 records - Bayesian check is now disabled!
> > >>>> Apr-27-15 14:07:09 Finished populating Spamdb with 1,140,905 records - Bayesian check is now enabled!
> > >>>>
> > >>>>
> > >>>> I'd really like to stick with MySQL for spamdb and the other databases,
> > >>>> but berkeleydb as recommended for HMM.  I just can't see doing that if
> > >>>> the rebuild of spamdb will be so slow.
> > >>>>
> > >>>> What kind of speeds is everyone else seeing for the spamdb rebuild
> > >>>> portion of the rebuild?
> > >>>>
> > >>>> I'd love some suggestions on speeding up MySQL or anything else.  Thank
> > >>>> you
> > >>>>
> > >>>> Ken
> > >>>>
> > >>>>
> > >>>> ------------------------------------------------------------------------------
> > >>>> One dashboard for servers and applications across Physical-Virtual-Cloud
> > >>>> Widest out-of-the-box monitoring support with 50+ applications
> > >>>> Performance metrics, stats and reports that give you Actionable Insights
> > >>>> Deep dive visibility with transaction tracing using APM Insight.
> > >>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> > >>>> _______________________________________________
> > >>>> Assp-test mailing list
> > >>>> Assp-test@lists.sourceforge.net
> > >>>> https://lists.sourceforge.net/lists/listinfo/assp-test
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> DISCLAIMER:
> > >>>> *******************************************************
> > >>>> This email and any files transmitted with it may be confidential,
> > >>>> legally privileged and protected in law and are intended solely for the
> > >>>> use of the individual to whom it is addressed.
> > >>>> This email was multiple times scanned for viruses. There should be no
> > >>>> known virus in this email!
> > >>>> *******************************************************
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >
