Based on following the thread, I recompiled Bacula with the following options:

./configure --enable-smartalloc --with-mysql --with-tcp-wrappers=/usr/lib/libwrap.so --enable-static-fd --enable-static-dir --enable-static-sd --disable-batch-insert

My 18+ hour backup ran in 9 hours 22 mins 27 secs last night. It seems to be back to the pre-3.0 time.

As for my tables, I have an Admin job that optimizes my Bacula tables within MySQL:

Job {
  Name = optimize_table_JobMedia
  JobDefs = "DefaultJob"
  Type = Admin
  Client = bacula-local-fd
  RunBeforeJob = "/etc/bacula/optimize_table_JobMedia.sh"
  Priority = 40
  Cancel Queued Duplicates = yes
  Maximum Concurrent Jobs = 1
  Run Before Job = "/etc/bacula/check-duplicate-queued-job.sh '%n' '%i'"
}
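The optimize script itself boils down to a single statement against the catalog. As a minimal sketch (the real script isn't shown here, so the database name and the choice of table are assumptions based on the job name):

    -- run from /etc/bacula/optimize_table_JobMedia.sh via the mysql client,
    -- e.g.: mysql bacula -e "OPTIMIZE TABLE JobMedia;"
    OPTIMIZE TABLE JobMedia;  -- rebuilds the table and defragments its index

On MyISAM tables, OPTIMIZE TABLE holds a table lock while it runs, so driving it from a low-priority Admin job as above keeps it out of the way of the backups themselves.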
Please let me know what else I can do to provide useful data for the improvement of Bacula (the world's best backup software!).

Thanks,
-Jason

On Sun, 2009-06-21 at 08:11 +0200, Marc Cousin wrote:
> First thing to know is which of the queries takes time (there are the
> multiple inserts into the batch table, then the insert of missing
> filenames/paths, and the final insert). Or is it slow with all the
> queries? Which storage engine?
>
> There are two possibilities:
>
> - either you have a bad plan for one of the queries (but I don't see how,
>   MySQL having a very simple optimizer),
> - or you have a tuning problem with MySQL (there I can't really help you:
>   not enough memory? fragmented indexes? I don't really know how MySQL
>   works).
>
> Kern, I have not been following the 3.0 development closely: do the
> xattrs go into the lstat field?
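A practical way to gather what Marc is asking for: MySQL 5.0 can only EXPLAIN a SELECT, so to see the plan of the insert queries quoted below, strip the INSERT INTO wrapper and explain the SELECT part. A minimal sketch (table and column names as used in those queries; note that the batch table is created by the Director at insert time, so it may not be visible from another session):

    EXPLAIN SELECT a.Name
    FROM (SELECT DISTINCT Name FROM batch) AS a
    WHERE NOT EXISTS (SELECT Name FROM Filename AS f WHERE f.Name = a.Name);

SHOW TABLE STATUS in the bacula database answers the storage-engine question; the Engine column is reported per table.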
> On Sunday 21 June 2009 01:52:34 Jason A. Kates wrote:
> > I have a backup that runs for 3 hours, then spends another 15 hours in
> > status "Dir inserting Attributes".
> >
> > The backup has 9,442,578 files and 239,718,461,832 bytes. It was much
> > faster with Bacula before, and I thought that the time change was due
> > to additional data with xattrsupport that didn't exist before 3.0. The
> > job is a spool job with spooled attributes too.
> >
> > We are using mysql-server-5.0.45-7.el5. I would be glad to collect and
> > share data. Please let me know what you would like me to set or do to
> > get the appropriate data.
> >
> > Thanks, -Jason
> >
> > On Sat, 2009-06-20 at 17:48 +0200, Marc Cousin wrote:
> > > There may be something. But first we have to know why MySQL behaves
> > > that slowly for Tom Sommer. The problem being that I don't have a big
> > > MySQL database to test on... So as long as we don't have the query
> > > plan or an explanation...
> > >
> > > On Saturday 20 June 2009 14:36:44 Kern Sibbald wrote:
> > > > Hello Marc,
> > > >
> > > > Thanks for looking at this and for your responses. If I understand
> > > > you correctly, there is nothing to be done, which does not surprise
> > > > me :-)
> > > >
> > > > Best regards,
> > > > Kern
> > > >
> > > > On Saturday 20 June 2009 13:16:44 Marc Cousin wrote:
> > > > > > Subject: Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)
> > > > > > Date: Saturday 20 June 2009
> > > > > > From: Kern Sibbald <k...@sibbald.com>
> > > > > > To: Tom Sommer <m...@tomsommer.dk>
> > > > > >
> > > > > > On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
> > > > > > > Tom Sommer wrote:
> > > > > > > > Mike Holden wrote:
> > > > > > > >> Jari Fredriksson wrote:
> > > > > > > >>>> INSERT INTO Filename( Name )
> > > > > > > >>>> SELECT a.Name
> > > > > > > >>>> FROM (
> > > > > > > >>>>     SELECT DISTINCT Name
> > > > > > > >>>>     FROM batch
> > > > > > > >>>> ) AS a
> > > > > > > >>>> WHERE NOT EXISTS (
> > > > > > > >>>>     SELECT Name
> > > > > > > >>>>     FROM Filename AS f
> > > > > > > >>>>     WHERE f.Name = a.Name
> > > > > > > >>>> )
> > > > > > > >>>
> > > > > > > >>> That looks silly.
> > > > > > > >>>
> > > > > > > >>> I would write it shorter as:
> > > > > > > >>>
> > > > > > > >>> INSERT INTO Filename(Name)
> > > > > > > >>> SELECT DISTINCT Name
> > > > > > > >>> FROM batch AS a
> > > > > > > >>> WHERE NOT EXISTS (
> > > > > > > >>>     SELECT Name
> > > > > > > >>>     FROM Filename AS f
> > > > > > > >>>     WHERE f.Name = a.Name
> > > > > > > >>> )
> > > > > > > >>
> > > > > > > >> You may also want to consider using a JOIN rather than a
> > > > > > > >> subquery with a NOT EXISTS, something like (untested and
> > > > > > > >> unvalidated!):
> > > > > > > >>
> > > > > > > >> INSERT INTO filename(name)
> > > > > > > >> SELECT DISTINCT name
> > > > > > > >> FROM batch AS a
> > > > > > > >> LEFT JOIN filename AS f USING (name)
> > > > > > > >> WHERE f.name IS NULL
> > > > > > > >>
> > > > > > > >> I know from experience as an Oracle DBA (my day job) that
> > > > > > > >> this can often produce far more efficient results.
> > > > > > > >>
> > > > > > > >> Of course, all options need testing for both speed of
> > > > > > > >> execution and resource usage, bearing in mind that data
> > > > > > > >> varies from one installation to another, and one size may
> > > > > > > >> not fit all!
> > > > > > > >
> > > > > > > > Good suggestions. It sounds like there might be an overall
> > > > > > > > performance problem with the current query for batch inserts
> > > > > > > > with lots of data. I'm a bit unsure whether I dare test these
> > > > > > > > queries on my current installation.
> > > > > > > >
> > > > > > > > I'll CC Kern on the thread; perhaps he has some insights.
> > > > > > >
> > > > > > > So this morning I had to kill the above query because it had
> > > > > > > been running for 24+ hours, preventing the new daily jobs from
> > > > > > > running. I think I'm going to try and disable batch inserts;
> > > > > > > the current situation is simply not good enough, and it's
> > > > > > > become a major headache to run backups suddenly. When I hit the
> > > > > > > end of this month and ALL servers have to run FULL backups,
> > > > > > > I'm gonna be in a world of trouble, I think. I just don't
> > > > > > > understand what has changed, because it had all been running
> > > > > > > great up until now.
> > > > > > >
> > > > > > > // Tom
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We certainly can use help optimizing SQL since we are not DBAs,
> > > > > > and we will look into the SQL optimization suggestions given
> > > > > > above, keeping in mind that there are often rather radical
> > > > > > differences in the timing of particular SQL queries depending on
> > > > > > the database engine used.
> > > > > >
> > > > > > To the best of my knowledge, nothing has changed in the batch
> > > > > > insert queries since they were implemented, and it is *very*
> > > > > > unlikely (though I haven't checked the code) that something
> > > > > > changed from 2.4.4 to 3.0.x.
> > > > > >
> > > > > > More likely, your workload or MySQL has changed in some way --
> > > > > > e.g. more jobs, more machines backed up, a Director machine with
> > > > > > less memory or other jobs that use memory, a new version of
> > > > > > MySQL, ...
> > > > > >
> > > > > > - I would suggest that you ensure that your database has all the
> > > > > >   recommended indexes (see the make_mysql_tables file), and that
> > > > > >   you are running with the large-memory /etc/my.cnf file.
> > > > > >
> > > > > > - Another thing to do is to compact your database. One way to do
> > > > > >   it is to write it to an ASCII file and then re-insert it.
> > > > > >
> > > > > > - If you are running certain programs that create and delete
> > > > > >   lots of temporary files with different names, your Filename
> > > > > >   table may need cleaning.
> > > > > >
> > > > > > - I would strongly recommend not starting *lots* of Full backups
> > > > > >   at the same time or on the same day. By lots, I mean more than
> > > > > >   say 10 or 20 (depending on the size of your system). It is
> > > > > >   generally far better to stage 1/4 of the backups each week for
> > > > > >   a full backup, so that the peak workload is spread out over
> > > > > >   the month.
> > > > > >
> > > > > > - If the bottleneck is in MySQL, you might consider moving it to
> > > > > >   another machine that has more memory and faster disks.
> > > > > >
> > > > > > - If you really have a huge number of backups (say 50 or more)
> > > > > >   that all run at the same time, it might be advisable to
> > > > > >   consider using PostgreSQL, but in that case you will probably
> > > > > >   need an onsite DBA to properly tune and maintain it.
> > > > > >
> > > > > > - Regardless of what hardware you have, there are certain
> > > > > >   limitations on how many simultaneous jobs you can run (this
> > > > > >   kicks in on many systems around 50). Once a certain number is
> > > > > >   exceeded, the total throughput can decrease rather radically,
> > > > > >   so careful monitoring is necessary. Bweb can help a lot in
> > > > > >   these situations.
> > > > > >
> > > > > > - If you are really "in a world of trouble" and it is a
> > > > > >   performance issue, there is not much we (the Bacula project)
> > > > > >   can do for you other than the above tips. However, ($$) Bacula
> > > > > >   Systems has tools that can help more precisely identify
> > > > > >   bottlenecks and help balance loads.
> > > > > >
> > > > > > Regards,
> > > > > > Kern
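To make Kern's first two bullets concrete, both the index check and a simple form of compaction can be run from the mysql client. A rough sketch (stock table names from make_mysql_tables; the dump-and-reload route Kern describes is the more thorough option):

    SHOW INDEX FROM File;                 -- compare against make_mysql_tables
    SHOW INDEX FROM Filename;
    OPTIMIZE TABLE File, Filename, Path;  -- rebuilds tables, defragments indexes

OPTIMIZE TABLE locks each table while it runs, so schedule it for a window when no jobs are inserting attributes.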
> > > > >
> > > > > Sorry Kern, I wasn't monitoring this thread (mysql in the title :) ).
> > > > >
> > > > > I wrote these queries, so maybe I should explain (not why I'm
> > > > > right, I'm not sure of that, but why we've done it this way).
> > > > >
> > > > > A few forewords:
> > > > >
> > > > > - batch insert was created in the beginning for PostgreSQL. I
> > > > >   don't have any experience with MySQL, except knowing that at the
> > > > >   moment we programmed batch insert it didn't really do anything
> > > > >   but nested loops...
> > > > > - the main reason for batch inserts with PostgreSQL was to use
> > > > >   COPY instead of INSERT, which is much faster (we also had good
> > > > >   results with stored procedures, but batch insert was even
> > > > >   faster)
> > > > > - our current File table is nearly 200 GB (1 billion rows), with a
> > > > >   250 GB database, and we insert around 50 million records a day
> > > > >   (rough estimate)
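A side note on the COPY point: in PostgreSQL, COPY streams all rows into a table in a single command rather than issuing one INSERT per row, which is where the batch-insert speedup comes from. An illustrative sketch only (the column list is hypothetical, not Bacula's actual batch schema):

    COPY batch (FileIndex, JobId, Path, Name, LStat, MD5) FROM STDIN;

The client then feeds the attribute rows down that single open statement instead of round-tripping each one.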
> > > > > * First question I've seen in the thread:
> > > > >
> > > > > Why
> > > > >
> > > > > INSERT INTO Filename( Name )
> > > > > SELECT a.Name
> > > > > FROM (
> > > > >     SELECT DISTINCT Name
> > > > >     FROM batch
> > > > > ) AS a
> > > > > WHERE NOT EXISTS (
> > > > >     SELECT Name
> > > > >     FROM Filename AS f
> > > > >     WHERE f.Name = a.Name
> > > > > )
> > > > >
> > > > > instead of the much simpler
> > > > >
> > > > > INSERT INTO Filename(Name)
> > > > > SELECT DISTINCT Name
> > > > > FROM batch AS a
> > > > > WHERE NOT EXISTS (
> > > > >     SELECT Name
> > > > >     FROM Filename AS f
> > > > >     WHERE f.Name = a.Name
> > > > > )
> > > > >
> > > > > => Because the first one removes the duplicate rows from the batch
> > > > > table before checking whether they exist in the Filename table,
> > > > > while the second one checks after. Performance was better in
> > > > > benchmarks with the first one (with PostgreSQL, with our data).
> > > > >
> > > > > * Second question:
> > > > >
> > > > > Why not:
> > > > >
> > > > > INSERT INTO filename(name)
> > > > > SELECT DISTINCT name
> > > > > FROM batch AS a
> > > > > LEFT JOIN filename AS f USING (name)
> > > > > WHERE f.name IS NULL
> > > > >
> > > > > In fact, if the engine chooses the right execution path, the
> > > > > execution plan will be roughly the same (nested loop) as
> > > > >
> > > > > INSERT INTO Filename(Name)
> > > > > SELECT DISTINCT Name
> > > > > FROM batch AS a
> > > > > WHERE NOT EXISTS (
> > > > >     SELECT Name
> > > > >     FROM Filename AS f
> > > > >     WHERE f.Name = a.Name
> > > > > )
> > > > >
> > > > > except if batch is close in size to Filename (for instance, on my
> > > > > Bacula database, that would be a 50-million-file backup). Anyway,
> > > > > if I recall correctly about MySQL (and it hasn't changed
> > > > > recently), MySQL will do a nested loop for both, as it doesn't
> > > > > know how to merge or hash join...
> > > > >
> > > > > But I agree that this second syntax theoretically gives more
> > > > > latitude to the engine. Except that recent Oracle and PostgreSQL
> > > > > versions will anyway be able to convert between the EXISTS/NOT
> > > > > EXISTS and the join (or IN) in this case, as they are semantically
> > > > > equivalent.
> > > > >
> > > > > Now, I don't know how MySQL would react to all these queries. An
> > > > > explain plan would be great for understanding... but I would have
> > > > > thought that the NOT EXISTS query would have been great for its
> > > > > optimizer, as it tells MySQL exactly what to do :)
> > > > >
> > > > > A last note, for PostgreSQL users on this mailing list:
> > > > > You can try to get better performance with batch inserts by
> > > > > lowering seq_page_cost and random_page_cost, so that the engine
> > > > > favours a nested loop instead of a hash join for the INSERT INTO
> > > > > File query: most of the time, most of Filename and Path will be in
> > > > > the database or filesystem cache, and the plan will be much faster
> > > > > (and save you big sorts).
> > > > >
> > > > > Please tell me if you need more information.
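For anyone wanting to try that last PostgreSQL tip, the two planner settings can be changed per session (or in postgresql.conf) before the batch insert runs. A minimal sketch with illustrative values only; benchmark against your own catalog:

    SET seq_page_cost = 0.5;     -- default is 1.0
    SET random_page_cost = 2.0;  -- default is 4.0

Lowering the page costs tells the planner that most Filename and Path pages are already cached, which biases it toward the nested-loop plan Marc describes instead of a hash join with big sorts.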