In my /etc/my.cnf I have this set:

tmpdir = /tmp

/tmp is a memory file system (tmpfs), used to avoid I/O contention.
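
As an illustrative aside (standard MySQL statements, shown purely as a sketch), this is how one can confirm which directory mysqld is really using for its temporary tables and how often implicit temporary tables spill to disk:

-- Directory used for implicit on-disk temporary tables:
SHOW VARIABLES LIKE 'tmpdir';
-- Counters showing how many temporary tables stayed in memory vs. went to disk:
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
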
When looking you can see that temp tables are created in /tmp, so /var/tmp should be out of the loop. The database resides in /export/mysql, which is on a 6-disk 15K RPM hardware RAID 10 set.

df -Th
Filesystem                    Type   Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p1             ext3   996M  489M  456M  52% /
/dev/mapper/Volume00-vartest  ext3   2.0G  368M  1.5G  20% /var
/dev/mapper/Volume00-export   ext3   186G   47G  130G  27% /export
/dev/mapper/Volume00-usr      ext3   3.9G  1.8G  2.0G  47% /usr
none                          tmpfs  8.0G  8.0K  8.0G   1% /tmp
/dev/cciss/c1d0p1             ext3   943G  316G  579G  36% /var/spool/bacula

We are running Red Hat ES 5 with kernel 2.6.18-128.el5. If you would like me to test a mount option, I would be glad to give it a try.

-Jason

On Mon, 2009-06-22 at 07:38 +0200, Bruno Friedmann wrote:
> Hi Jason, can you tell us whether the place where mysql works (typically /var/lib/mysql) is on the same partition as /tmp or /var/tmp?
>
> I've found (in another project) that with MySQL MyISAM tables you can kill disk I/O if the two are on the same disk. It's worse when you insert a lot of data from that same disk.
>
> In a perfect world, after carefully tuning /etc/my.cnf (the "huge" template isn't sufficient in many big cases), you should have one full I/O path for MySQL temp files, one for indexes, one for tables, AND one for the data being imported (a completely separate /tmp, for example).
>
> In many installations /tmp and /var are on the same device; add LVM or a RAID 1 configuration on top of that and you get some I/O trouble.
>
> I won't explain all of MySQL's fine tuning here; there is a lot of it, and each item should be reviewed and tested.
>
> Next, there is also ext3 tuning for those who use it, as many distros are very conservative (newer kernels > 2.6.30 reduce the impact): atime, diratime, data=ordered, etc. could change your life :-)
>
> The first time Bacula came with batch-insert enabled I tested it and got really poor performance, so I disabled it (LVM on top of soft RAID 1); my mistake not to mention it on the users/devel list.
>
> Here we have decided to migrate all possible applications to PostgreSQL at some point in the future.
>
> For debugging and making some sort of test case, it would be nice to have a "joke" (dummy) dump of a database containing lots of directories & files, so we can test large queries...
>
> Perhaps someone with a very large number of records could produce one, using md5() on the Filename and directory entries?
>
> Jason A. Kates wrote:
> > Based on following the thread, I recompiled Bacula with the following options:
> > ./configure --enable-smartalloc --with-mysql --with-tcp-wrappers=/usr/lib/libwrap.so --enable-static-fd --enable-static-dir --enable-static-sd --disable-batch-insert
> > And my 18+ hour backup ran in 9 hours 22 mins 27 secs last night. It seems to be back to the pre-3.0 time.
> >
> > As for my tables, I have an Admin job that optimizes my Bacula tables within MySQL:
> >
> > Job {
> >   Name = optimize_table_JobMedia
> >   JobDefs = "DefaultJob"
> >   Type = Admin
> >   Client = bacula-local-fd
> >   RunBeforeJob = "/etc/bacula/optimize_table_JobMedia.sh"
> >   Priority = 40
> >   Cancel Queued Duplicates = yes
> >   Maximum Concurrent Jobs = 1
> >   Run Before Job = "/etc/bacula/check-duplicate-queued-job.sh '%n' '%i'"
> > }
> >
> > Please let me know what else I can do to provide useful data for the improvement of Bacula (world's best backup software!).
> >
> > Thanks -Jason
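
The contents of /etc/bacula/optimize_table_JobMedia.sh are not shown in the thread; purely as a sketch of the kind of statements such a maintenance script typically runs against the catalog (table names from the stock Bacula MySQL schema, not the actual script), it might boil down to:

-- For MyISAM tables, OPTIMIZE TABLE defragments the data file and re-sorts index pages:
OPTIMIZE TABLE JobMedia;
-- Refresh index statistics on the large catalog tables:
ANALYZE TABLE File, Filename, Path;
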
> >
> > On Sun, 2009-06-21 at 08:11 +0200, Marc Cousin wrote:
> >> The first thing to know is which of the queries takes the time (there are the multiple inserts into the batch table, then the insert of missing filenames/paths, and then the final insert). Or is it slow for all of the queries? Which storage engine?
> >>
> >> There are two possibilities:
> >>
> >> - either you have a bad plan for one of the queries (but I don't see how, mysql having a very simple optimizer),
> >> - or you have a tuning problem with mysql (there I can't really help you: not enough memory? fragmented indexes? I don't really know how mysql works).
> >>
> >> Kern, I have not been following the 3.0 development closely: do the xattrs go into the lstat field?
> >>
> >> On Sunday 21 June 2009 01:52:34 Jason A. Kates wrote:
> >>> I have a backup that runs for 3 hours and then spends another 15 hours in status "Dir inserting Attributes".
> >>>
> >>> The backup has 9,442,578 files and 239,718,461,832 bytes. It was much faster with Bacula before, and I thought the time change was due to the additional data from xattrsupport that didn't exist before 3.0. The job is a spool job with spooled attributes too.
> >>>
> >>> We are using mysql-server-5.0.45-7.el5. I would be glad to collect and share data. Please let me know what you would like me to set or do to get the appropriate data.
> >>>
> >>> Thanks -Jason
> >>>
> >>> On Sat, 2009-06-20 at 17:48 +0200, Marc Cousin wrote:
> >>>> There may be something. But first we have to know why mysql behaves that slowly for Tom Sommer. The problem being that I don't have a big mysql database to test on... So as long as we don't have the query plan or an explanation...
> >>>>
> >>>> On Saturday 20 June 2009 14:36:44 Kern Sibbald wrote:
> >>>>> Hello Marc,
> >>>>>
> >>>>> Thanks for looking at this and for your responses. If I understand you correctly, there is nothing to be done, which does not surprise me :-)
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Kern
> >>>>>
> >>>>> On Saturday 20 June 2009 13:16:44 Marc Cousin wrote:
> >>>>>>> Subject: Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)
> >>>>>>> Date: Saturday 20 June 2009
> >>>>>>> From: Kern Sibbald <k...@sibbald.com>
> >>>>>>> To: Tom Sommer <m...@tomsommer.dk>
> >>>>>>>
> >>>>>>> On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
> >>>>>>>> Tom Sommer wrote:
> >>>>>>>>> Mike Holden wrote:
> >>>>>>>>>> Jari Fredriksson wrote:
> >>>>>>>>>>>> INSERT INTO Filename( Name )
> >>>>>>>>>>>> SELECT a.Name
> >>>>>>>>>>>> FROM (
> >>>>>>>>>>>>     SELECT DISTINCT Name
> >>>>>>>>>>>>     FROM batch
> >>>>>>>>>>>> ) AS a
> >>>>>>>>>>>> WHERE NOT EXISTS (
> >>>>>>>>>>>>     SELECT Name
> >>>>>>>>>>>>     FROM Filename AS f
> >>>>>>>>>>>>     WHERE f.Name = a.Name
> >>>>>>>>>>>> )
> >>>>>>>>>>> That looks silly.
> >>>>>>>>>>>
> >>>>>>>>>>> I would write it shorter as:
> >>>>>>>>>>>
> >>>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>>> FROM batch AS a
> >>>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>>> (
> >>>>>>>>>>>     SELECT Name
> >>>>>>>>>>>     FROM Filename AS f
> >>>>>>>>>>>     WHERE f.Name = a.Name
> >>>>>>>>>>> )
> >>>>>>>>>> You may also want to consider using a JOIN rather than a subquery with NOT EXISTS, something like (untested and unvalidated!):
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO filename(name)
> >>>>>>>>>> SELECT DISTINCT name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> LEFT JOIN filename AS f USING (name)
> >>>>>>>>>> WHERE f.name IS NULL
> >>>>>>>>>>
> >>>>>>>>>> I know from experience as an Oracle DBA (my day job) that this can often produce far more efficient results.
> >>>>>>>>>>
> >>>>>>>>>> Of course, all options need testing for both speed of execution and resource usage, bearing in mind that data varies from one installation to another, and one size may not fit all!
> >>>>>>>>> Good suggestions; it sounds like there might be an overall performance problem with the current query for batch-inserts with lots of data. I'm a bit unsure whether I dare test these queries on my current installation.
> >>>>>>>>>
> >>>>>>>>> I'll CC Kern on the thread, perhaps he has some insights.
> >>>>>>>> So this morning I had to kill the above query because it had been running for 24+ hours, preventing the new daily jobs from running. I think I'm going to try to disable batch-inserts; the current situation is simply not good enough, and it has suddenly become a major headache to run backups. When I hit the end of this month and ALL servers have to run FULL backups, I'm going to be in a world of trouble, I think - I just don't understand what has changed, because it had all been running great up until now.
> >>>>>>>>
> >>>>>>>> // Tom
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> We certainly can use help optimizing SQL since we are not DBAs, and we will look into the SQL optimization suggestions given above, keeping in mind that there are often rather radical differences in the timing of particular SQL queries depending on the database engine used.
> >>>>>>>
> >>>>>>> To the best of my knowledge nothing has changed in the batch-insert queries since they were implemented, and it is *very* unlikely (though I haven't checked the code) that something changed from 2.4.4 to 3.0.x.
> >>>>>>>
> >>>>>>> More likely, your workload or MySQL has changed in some way -- e.g. more Jobs, more machines backed up, a Director machine with less memory or other jobs that use memory, a new version of MySQL, ...
> >>>>>>>
> >>>>>>> - I would suggest that you ensure that your database has all the recommended indexes (see the make_mysql_tables file), and that you are running with the large-memory /etc/my.cnf file.
> >>>>>>>
> >>>>>>> - Another thing to do is to compact your database. One way to do it is to write it out to an ASCII file and then re-insert it.
> >>>>>>>
> >>>>>>> - If you are running certain programs that create and delete lots of temporary files with different names, your Filename table may need cleaning.
> >>>>>>>
> >>>>>>> - I would strongly recommend not starting *lots* of Full backups at the same time or on the same day. By lots, I mean more than, say, 10 or 20 (it depends on the size of your system). It is generally far better to stage 1/4 of the Full backups each week so that the peak workload is spread out over the month.
> >>>>>>>
> >>>>>>> - If the bottleneck is in MySQL, you might consider moving it to another machine that has more memory and faster disks.
> >>>>>>>
> >>>>>>> - If you really have a huge number of backups (say 50 or more) that all run at the same time, it might be advisable to consider using PostgreSQL, but in that case you will probably need an onsite DBA to properly tune and maintain it.
> >>>>>>>
> >>>>>>> - Regardless of what hardware you have, there are certain limitations on how many simultaneous jobs you can run (this kicks in on many systems around 50). Once a certain number is exceeded, the total throughput can decrease rather radically, so careful monitoring is necessary. Bweb can help a lot in these situations.
> >>>>>>>
> >>>>>>> - If you are really "in a world of trouble" and it is a performance issue, there is not much we (the Bacula project) can do for you other than the above tips. However, ($$) Bacula Systems has tools that can help identify bottlenecks more precisely and help balance loads.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Kern
> >>>>>>>
> >>>>>>> -------------------------------------------------------
> >>>>>> Sorry Kern, I wasn't monitoring this thread (mysql in the title :) )
> >>>>>>
> >>>>>> I wrote these queries, so maybe I should explain (not why I'm right, I'm not sure of that, but why we've done it this way).
> >>>>>>
> >>>>>> A few forewords:
> >>>>>> - batch insert was created at the beginning for postgresql. I don't have any experience with mysql, except knowing that at the time we programmed batch insert it didn't really do anything but nested loops...
> >>>>>> - The main reason for batch inserts with postgresql was to use COPY instead of INSERT, which is much faster (we also had good results with stored procedures, but batch insert was even faster).
> >>>>>> - Our current File table is nearly 200GB (1 billion rows), with a 250GB database, and we insert around 50 million records a day (rough estimate).
> >>>>>>
> >>>>>> * First question I've seen in the thread:
> >>>>>>
> >>>>>> Why
> >>>>>>
> >>>>>> INSERT INTO Filename( Name )
> >>>>>> SELECT a.Name
> >>>>>> FROM (
> >>>>>>     SELECT DISTINCT Name
> >>>>>>     FROM batch
> >>>>>> ) AS a
> >>>>>> WHERE NOT EXISTS (
> >>>>>>     SELECT Name
> >>>>>>     FROM Filename AS f
> >>>>>>     WHERE f.Name = a.Name
> >>>>>> )
> >>>>>>
> >>>>>> instead of the much simpler
> >>>>>>
> >>>>>> INSERT INTO Filename(Name)
> >>>>>> SELECT DISTINCT Name
> >>>>>> FROM batch AS a
> >>>>>> WHERE NOT EXISTS
> >>>>>> (
> >>>>>>     SELECT Name
> >>>>>>     FROM Filename AS f
> >>>>>>     WHERE f.Name = a.Name
> >>>>>> )
> >>>>>>
> >>>>>> => Because the first one removes the duplicate rows from the batch table before checking whether they exist in the Filename table, while the second one checks afterwards. Performance was better in benchmarks with the first one (with PostgreSQL, with our data).
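
Since no query plan ever appears in the thread, here is a rough sketch of how one could be gathered on the MySQL side (standard MySQL 5.0 statements; 'some-file-name' is just a placeholder, and the batch table itself is a per-session temporary table, so the probe side is tested here with a literal value instead):

-- Which statement is the Director actually waiting on right now?
SHOW FULL PROCESSLIST;
-- Which storage engine, and how big is the table (one of Marc's first questions)?
SHOW TABLE STATUS LIKE 'Filename';
-- Are the indexes from make_mysql_tables really present? Without an index on
-- Filename.Name, every row coming out of the batch table forces a scan of Filename:
SHOW INDEX FROM Filename;
-- MySQL 5.0 can only EXPLAIN a SELECT, so test the probe side of the NOT EXISTS
-- with a literal value:
EXPLAIN SELECT Name FROM Filename WHERE Name = 'some-file-name';
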
> >>>>>>
> >>>>>> * Second question: why not
> >>>>>>
> >>>>>> INSERT INTO filename(name)
> >>>>>> SELECT DISTINCT name
> >>>>>> FROM batch AS a
> >>>>>> LEFT JOIN filename AS f USING (name)
> >>>>>> WHERE f.name IS NULL
> >>>>>>
> >>>>>> In fact, if the engine chooses the right execution path, the execution plan will be roughly the same (nested loop) as for
> >>>>>>
> >>>>>> INSERT INTO Filename(Name)
> >>>>>> SELECT DISTINCT Name
> >>>>>> FROM batch AS a
> >>>>>> WHERE NOT EXISTS
> >>>>>> (
> >>>>>>     SELECT Name
> >>>>>>     FROM Filename AS f
> >>>>>>     WHERE f.Name = a.Name
> >>>>>> )
> >>>>>>
> >>>>>> except if batch is close in size to Filename (for instance, on my Bacula database that would be a 50-million-file backup). Anyway, if I recall correctly about mysql (and it hasn't changed recently), mysql will do a nested loop for both, as it doesn't know how to do merge or hash joins...
> >>>>>>
> >>>>>> But I agree that this second syntax theoretically gives more latitude to the engine. Except that recent Oracle and PostgreSQL versions will in any case be able to convert between the EXISTS/NOT EXISTS and the join (or IN) forms in this case, as they are semantically equivalent.
> >>>>>>
> >>>>>> Now, I don't know how mysql would react to all these queries. An explain plan would be great for understanding... but I would have thought that the NOT EXISTS query would be great for its optimizer, as it tells mysql exactly what to do :)
> >>>>>>
> >>>>>> A last note for the PostgreSQL users on this mailing list: you can try to get better performance with batch inserts by lowering seq_page_cost and random_page_cost, so that the engine favours a nested loop instead of a hash join for the INSERT INTO File query. Most of the time, most of Filename and Path will be in the database or filesystem cache, and the plan will be much faster (and save you big sorts).
> >>>>>>
> >>>>>> Please tell me if you need more information.

--
Jason A. Kates (ja...@kates.org)
Fax: 208-975-1514   Phone: 212-400-1670 x2
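
To make Marc's closing tip concrete for PostgreSQL users, a minimal sketch of the planner-cost settings he mentions (the 0.1 values and the database name "bacula" are illustrative assumptions, not recommendations from the thread):

-- Session-level experiment: cheaper page costs bias the planner toward the
-- index-driven nested-loop plans Marc describes for the INSERT INTO File ... SELECT query.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;

-- Or persist the change for the catalog database only, leaving other
-- databases on the same cluster at the defaults:
ALTER DATABASE bacula SET seq_page_cost = 0.1;
ALTER DATABASE bacula SET random_page_cost = 0.1;
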