Re: [Bacula-devel] Fwd: Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)

Marc Cousin Sat, 20 Jun 2009 08:50:50 -0700

There may be something. But first we have to know why mysql behaves that 
slowly for Tom Summers. The problem being that I don't have a big mysql 
database to test on ... So as long as we don't have the query plan or an 
explanation ...


On Saturday 20 June 2009 14:36:44 Kern Sibbald wrote:
> Hello Marc,
>
> Thanks for looking at this and for your responses.  If I understand you
> correctly, there is nothing to be done, which does not surprise me :-)
>
> Best regards,
>
> Kern
>
> On Saturday 20 June 2009 13:16:44 Marc Cousin wrote:
> > > Subject: Re: [Bacula-users] Performance with MySQL queries since 3.0.0
> > > (Dir inserting attributes hang)
> > > Date: Saturday 20 June 2009
> > > From: Kern Sibbald <[email protected]>
> > > To: Tom Sommer <[email protected]>
> > >
> > > On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
> > > > Tom Sommer wrote:
> > > > > Mike Holden wrote:
> > > > >> Jari Fredriksson wrote:
> > > > >>>> INSERT INTO Filename( Name )
> > > > >>>> SELECT a.Name
> > > > >>>> FROM (
> > > > >>>>
> > > > >>>> SELECT DISTINCT Name
> > > > >>>> FROM batch
> > > > >>>> ) AS a
> > > > >>>> WHERE NOT
> > > > >>>> EXISTS (
> > > > >>>>
> > > > >>>> SELECT Name
> > > > >>>> FROM Filename AS f
> > > > >>>> WHERE f.Name = a.Name
> > > > >>>> )
> > > > >>>
> > > > >>> That looks silly.
> > > > >>>
> > > > >>> I would write it shorter as
> > > > >>>
> > > > >>> INSERT INTO Filename(Name)
> > > > >>> SELECT DISTINCT Name
> > > > >>> FROM batch AS a
> > > > >>> WHERE NOT EXISTS
> > > > >>> (
> > > > >>>     SELECT Name
> > > > >>>     FROM Filename AS f
> > > > >>>     WHERE f.Name = a.Name
> > > > >>> )
> > > > >>
> > > > >> You may also want to consider using a JOIN rather than a subquery
> > > > >> with a NOT EXISTS, something like (untested and unvalidated!):
> > > > >>
> > > > >> INSERT INTO filename(name)
> > > > >> SELECT DISTINCT name
> > > > >> FROM batch AS a
> > > > >> LEFT JOIN filename AS f USING (name)
> > > > >> WHERE f.name IS NULL
> > > > >>
> > > > >> I know from experience as an Oracle DBA (my day job) that this can
> > > > >> often produce far more efficient results.
> > > > >>
> > > > >> Of course, all options need testing for both speed of execution
> > > > >> and resource usage, bearing in mind that data varies from one
> > > > >> installation to another, and one size may not fit all!
> > > > >
> > > > > Good suggestions, sounds like there might be an overall performance
> > > > > problem with the current query for batch-inserts with lots of data.
> > > > > I'm a bit unsure if I dare test these queries on my current
> > > > > installation.
> > > > >
> > > > > I'll CC Kern on the thread, perhaps he has some insights.
> > > >
> > > > So this morning I had to kill the above query because it's been
> > > > running for 24+ hours, preventing the new daily jobs from running.
> > > > I think I'm going to try and disable batch-inserts, the current
> > > > situation is simply not good enough, it's become a major headache to
> > > > run backups suddenly. When I hit the end of this month and ALL
> > > > servers have to run FULL backups, I'm gonna be in a world of trouble
> > > > I think - I just don't understand what has changed, because it's all
> > > > been running great up until now.
> > > >
> > > > // Tom
> > >
> > > Hello,
> > >
> > > We certainly can use help optimizing SQL since we are not DBAs, and we
> > > will look into the SQL optimization suggestions given above, keeping in
> > > mind that there are often rather radical differences in timing of
> > > particular SQL queries depending on the database engine used.
> > >
> > > To the best of my knowledge nothing has changed in terms of the Batch
> > > insert queries since when it was implemented, and it is *very* unlikely
> > > (though I haven't checked the code) that something changed from 2.4.4
> > > to 3.0.x.
> > >
> > > More likely, your workload or MySQL has changed in some way -- e.g.
> > > more Jobs, more machines backed up, Director machine with less memory
> > > or other jobs that use memory, a new version of MySQL, ...
> > >
> > > - I would suggest that you ensure that your database has all the
> > > recommended indexes (see the make_mysql_tables file), and that you are
> > > running with the large memory /etc/my.cnf file.
> > >
> > > - Another thing to do is to compact your database.  One way to do it is
> > > to write it to an ASCII file and then re-insert it.
> > >
> > > - If you are running certain programs that create and delete lots of
> > > temporary files with different names, you Filename table may need
> > > cleaning.
> > >
> > > - I would strongly recommend not starting *lots* of Full backups at the
> > > same time or on the same day.  By lots, I mean more than say 10 or 20
> > > (depends on the size of your system).  It is generally far better to
> > > stage 1/4 of the backup every week for a full backup so that the peak
> > > workload is spread out over the month.
> > >
> > > - If the bottleneck is in MySQL, you might consider moving it to
> > > another machine that has more memory and faster disks.
> > >
> > > - If you really have a huge number of backups (say 50 or more) that all
> > > run at the same time, it might be advisable to consider using
> > > PostgreSQL, but in that case, you will probably need an onsite DBA to
> > > properly tune and maintain it.
> > >
> > > - Regardless of what hardware you have, there are certain limitations
> > > on how many simultaneous jobs you can run (this kicks in on many
> > > systems around 50). Once a certain number is exceeded, the total
> > > throughput can rather radically decrease so careful monitoring is
> > > necessary.  Bweb can help a lot in these situations.
> > >
> > > - If you are really "in a world of trouble" and it is a performance
> > > issue, there is not much we (the Bacula project) can do for you other
> > > than the above tips.  However, ($$) Bacula Systems has tools that can
> > > help more precisely identify bottlenecks and help balance loads.
> > >
> > > Regards,
> > >
> > > Kern
> > >
> > > -------------------------------------------------------
> >
> > Sorry Kern, I wasn't monitoring this thread (mysql in the title  :) )
> >
> > I wrote these queries, so maybe I should explain (not why I'm right, I'm
> > not sure of that, but why we've done it this way).
> >
> > A few forewords :
> > - batch insert was created at the begining for postgresql. I don't have
> > any experience with mysql, except that knowing at the moment we
> > programmed batch insert it didn't really do anything but nested loops...
> > - The main reason for batch inserts with postgresql was to use COPY
> > instead of INSERT, which is much faster (we also had good results with
> > stored procedures, but batch insert was even faster)
> > - our current file table is nearly 200GB (1 billion rows), with a 250GB
> > database, and we insert around 50 million records a day (rough estimate)
> >
> > * First question I've seen in the thread:
> >
> > Why
> >
> > INSERT INTO Filename( Name )
> > SELECT a.Name
> > FROM (
> >     SELECT DISTINCT Name
> >     FROM batch
> > ) AS a
> > WHERE NOT EXISTS (
> >     SELECT Name
> >     FROM Filename AS f
> >     WHERE f.Name = a.Name
> > )
> >
> > instead of the much simpler
> >
> > INSERT INTO Filename(Name)
> > SELECT DISTINCT Name
> > FROM batch AS a
> > WHERE NOT EXISTS
> > (
> >     SELECT Name
> >     FROM Filename AS f
> >     WHERE f.Name = a.Name
> > )
> >
> >
> > => Because the first one removes the duplicate rows from batch table
> > before checking they exist in the filename table, while the second one
> > checks after. Performance was better in benchmarks with the first one
> > (with PostgreSQL, with our data).
> >
> >
> > * Second question :
> > Why not :
> >
> > INSERT INTO filename(name)
> > SELECT DISTINCT name
> > FROM batch AS a
> > LEFT JOIN filename AS f USING (name)
> > WHERE f.name IS NULL
> >
> > In fact, if the engine choses the right execution path, the execution
> > plan will be the roughly the same (nested loop) as
> >
> > INSERT INTO Filename(Name)
> > SELECT DISTINCT Name
> > FROM batch AS a
> > WHERE NOT EXISTS
> > (
> > SELECT Name
> > FROM Filename AS f
> > WHERE f.Name = a.Name
> > )
> >
> > Except if batch is close in size to filename (for instance on my bacula
> > database, it would be a 50 million files backup). Anyway, If i recall
> > correctly about mysql (and it hasn't changed recently), mysql will do a
> > nested loop for both, as it doesn't know how to merge or hash join ...
> >
> >
> > But I agree that this second syntax theorically gives more lattitude to
> > the engine. Except that recent oracle and postgresql versions will anyway
> > be able to convert between the exists/not exists and the join (or IN) in
> > this case, as they are semantically equivalent.
> >
> >
> > Now, I don't know how mysql would react to all these queries. An explain
> > plan would be great to understand... but I would have thought that the
> > NOT EXIST query would have been great for it's optimizer, as it tells
> > mysql exactly what to do :)
> >
> >
> > A last note, for postgreSQL users on this mailing list :
> > You can try to get better performance with batch inserts by lowering
> > seq_page_cost and random_page_cost, so that the engine favours nested
> > loop instead of hash join for the INSERT INTO File query : most of the
> > time, most of filename and path will be in the database or filesystem
> > cache, and the plan will be much faster (and save you big sorts)
> >
> >
> > Please tell me if you need more information.
> >
> > -------------------------------------------------------------------------
> >-- --- Are you an open source citizen? Join us for the Open Source Bridge
> > conference! Portland, OR, June 17-19. Two days of sessions, one day of
> > unconference: $250. Need another reason to go? 24-hour hacker lounge.
> > Register today!
> > http://ad.doubleclick.net/clk;215844324;13503038;v?http://opensourcebridg
> >e. org _______________________________________________
> > Bacula-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/bacula-devel



------------------------------------------------------------------------------
Are you an open source citizen? Join us for the Open Source Bridge conference!
Portland, OR, June 17-19. Two days of sessions, one day of unconference: $250.
Need another reason to go? 24-hour hacker lounge. Register today!
http://ad.doubleclick.net/clk;215844324;13503038;v?http://opensourcebridge.org
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Fwd: Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)

Reply via email to