Hi Jason, the hardware should be more than sufficient. Could you post your my.cnf and tell us which storage engine is in use (MyISAM / InnoDB)?

P.S. I'm off to work, so it may take a while before I answer.
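If it's easier, both answers can be read straight out of the running server (a quick sketch, assuming the catalog database is named bacula):

  -- which storage engine each catalog table uses
  SELECT TABLE_NAME, ENGINE
    FROM information_schema.TABLES
   WHERE TABLE_SCHEMA = 'bacula';

  -- the settings actually in effect, whichever my.cnf was read
  SHOW VARIABLES LIKE 'key_buffer_size';
  SHOW VARIABLES LIKE 'innodb_buffer_pool_size';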
Jason A. Kates wrote:
> In my /etc/my.cnf I have this set:
>   tmpdir = /tmp
>
> /tmp is a memory file system, to avoid I/O contention. When looking you can see that temp tables are created in /tmp, so /var/tmp should be out of the loop.
>
> The db resides in /export/mysql, which is part of a 6-disk 15K RPM hardware RAID 10 set.
>
> df -Th
> Filesystem                    Type   Size  Used Avail Use% Mounted on
> /dev/cciss/c0d0p1             ext3   996M  489M  456M  52% /
> /dev/mapper/Volume00-vartest  ext3   2.0G  368M  1.5G  20% /var
> /dev/mapper/Volume00-export   ext3   186G   47G  130G  27% /export
> /dev/mapper/Volume00-usr      ext3   3.9G  1.8G  2.0G  47% /usr
> none                          tmpfs  8.0G  8.0K  8.0G   1% /tmp
> /dev/cciss/c1d0p1             ext3   943G  316G  579G  36% /var/spool/bacula
>
> We are running Red Hat ES 5 with kernel 2.6.18-128.el5. If you would like me to test a mount option, I would be glad to give it a try.
>
> -Jason
>
> On Mon, 2009-06-22 at 07:38 +0200, Bruno Friedmann wrote:
>> Hi Jason, can you tell us if the place where mysql works (typically /var/lib/mysql) is on the same partition as /tmp or /var/tmp?
>>
>> I've found (in another project) that with MySQL MyISAM tables you can kill disk I/O if the two are on the same disk. It's worse when you insert a lot of data from that same disk.
>>
>> In a perfect world, after tuning /etc/my.cnf carefully (the "huge" sample isn't sufficient in many big cases), you should have one full I/O path for MySQL temp files, one for indexes, one for tables, AND one for the data being imported (a completely separate /tmp, for example).
>>
>> In many installations /tmp and /var are on the same device; add LVM or a RAID 1 configuration on top of that and you get I/O trouble.
>>
>> I won't explain all the fine tuning of MySQL here; there is a lot of it, and every item should be reviewed and tested.
>>
>> There is also ext3 tuning for those who use it, as many distros are very conservative (newer kernels, > 2.6.30, reduce the impact): noatime, nodiratime, data=ordered, etc. could change your life :-)
>>
>> The first time Bacula came with batch-insert enabled, I tested it and got really poor performance, so I disabled it (LVM on top of soft RAID 1); my mistake not to mention it on the users/devel lists.
>>
>> At some point here we have decided to migrate every application we can to PostgreSQL.
>>
>> For debugging and building a test case, it would be nice to have a "fake" dump of a database containing lots of dirs & files, so we can test large queries. Perhaps someone with very large records could produce one by running md5() over Filename and Path?
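Something along these lines could produce such an anonymized set with MySQL's MD5() (an untested sketch; Filename and Path are the stock Bacula tables, the _anon names are made up):

  -- clone the catalog's name tables with hashed contents, so the
  -- dump can be shared without exposing real file or path names
  CREATE TABLE Filename_anon AS
    SELECT FilenameId, MD5(Name) AS Name FROM Filename;
  CREATE TABLE Path_anon AS
    SELECT PathId, MD5(Path) AS Path FROM Path;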
>>
>> Jason A. Kates wrote:
>>> Based on following the thread, I recompiled bacula with the following options:
>>>
>>>   ./configure --enable-smartalloc --with-mysql \
>>>     --with-tcp-wrappers=/usr/lib/libwrap.so --enable-static-fd \
>>>     --enable-static-dir --enable-static-sd --disable-batch-insert
>>>
>>> My 18+ hour backup ran in 9 hours 22 mins 27 secs last night. It seems to be back to the pre-3.0 time.
>>>
>>> As for my tables, I have an Admin job that optimizes my bacula tables within mysql:
>>>
>>> Job {
>>>   Name = optimize_table_JobMedia
>>>   JobDefs = "DefaultJob"
>>>   Type = Admin
>>>   Client = bacula-local-fd
>>>   RunBeforeJob = "/etc/bacula/optimize_table_JobMedia.sh"
>>>   Priority = 40
>>>   Cancel Queued Duplicates = yes
>>>   Maximum Concurrent Jobs = 1
>>>   Run Before Job = "/etc/bacula/check-duplicate-queued-job.sh '%n' '%i'"
>>> }
>>>
>>> Please let me know what else I can do to provide useful data for the improvement of bacula (the world's best backup software!).
>>>
>>> Thanks -Jason
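The script itself isn't posted; for MyISAM catalog tables it presumably boils down to something like the following (a guess at its contents, not Jason's actual script):

  -- defragment the table and re-sort its indexes
  OPTIMIZE TABLE JobMedia;
  -- refresh the index statistics the optimizer relies on
  ANALYZE TABLE JobMedia;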
>>>
>>> On Sun, 2009-06-21 at 08:11 +0200, Marc Cousin wrote:
>>>> The first thing to know is which of the queries takes the time (there are the multiple inserts into the batch table, then the inserts of the missing filenames/paths, and the final insert). Or is it slow on all of the queries? And which storage engine?
>>>>
>>>> There are two possibilities:
>>>>
>>>> - either you have a bad plan for one of the queries (but I don't see how, mysql having a very simple optimizer),
>>>> - or you have a tuning problem with mysql (there I can't really help you: not enough memory? fragmented indexes? I don't really know how mysql works).
>>>>
>>>> Kern, I have not been following the 3.0 development closely: do the xattrs go into the lstat field?
>>>>
>>>> On Sunday 21 June 2009 01:52:34 Jason A. Kates wrote:
>>>>> I have a backup that runs for 3 hours and then spends another 15 hours in status "Dir inserting Attributes".
>>>>>
>>>>> The backup has 9,442,578 files and 239,718,461,832 bytes. It was much faster with bacula before, and I thought the change in time was due to the additional data from xattrsupport, which didn't exist before 3.0. The job is a spool job, with spooled attributes too.
>>>>>
>>>>> We are using mysql-server-5.0.45-7.el5. I would be glad to collect and share data. Please let me know what you would like me to set or do to get the appropriate data.
>>>>>
>>>>> Thanks -Jason
>>>>>
>>>>> On Sat, 2009-06-20 at 17:48 +0200, Marc Cousin wrote:
>>>>>> There may be something. But first we have to know why mysql behaves that slowly for Tom Sommer. The problem being that I don't have a big mysql database to test on... So as long as we don't have the query plan or an explanation...
>>>>>>
>>>>>> On Saturday 20 June 2009 14:36:44 Kern Sibbald wrote:
>>>>>>> Hello Marc,
>>>>>>>
>>>>>>> Thanks for looking at this and for your responses. If I understand you correctly, there is nothing to be done, which does not surprise me :-)
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Kern
>>>>>>>
>>>>>>> On Saturday 20 June 2009 13:16:44 Marc Cousin wrote:
>>>>>>>>> Subject: Re: [Bacula-users] Performance with MySQL queries since 3.0.0 (Dir inserting attributes hang)
>>>>>>>>> Date: Saturday 20 June 2009
>>>>>>>>> From: Kern Sibbald <k...@sibbald.com>
>>>>>>>>> To: Tom Sommer <m...@tomsommer.dk>
>>>>>>>>>
>>>>>>>>> On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
>>>>>>>>>> Tom Sommer wrote:
>>>>>>>>>>> Mike Holden wrote:
>>>>>>>>>>>> Jari Fredriksson wrote:
>>>>>>>>>>>>>> INSERT INTO Filename( Name )
>>>>>>>>>>>>>> SELECT a.Name
>>>>>>>>>>>>>> FROM (
>>>>>>>>>>>>>>     SELECT DISTINCT Name
>>>>>>>>>>>>>>     FROM batch
>>>>>>>>>>>>>> ) AS a
>>>>>>>>>>>>>> WHERE NOT EXISTS (
>>>>>>>>>>>>>>     SELECT Name
>>>>>>>>>>>>>>     FROM Filename AS f
>>>>>>>>>>>>>>     WHERE f.Name = a.Name
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>> That looks silly. I would write it shorter as:
>>>>>>>>>>>>>
>>>>>>>>>>>>> INSERT INTO Filename(Name)
>>>>>>>>>>>>> SELECT DISTINCT Name
>>>>>>>>>>>>> FROM batch AS a
>>>>>>>>>>>>> WHERE NOT EXISTS (
>>>>>>>>>>>>>     SELECT Name
>>>>>>>>>>>>>     FROM Filename AS f
>>>>>>>>>>>>>     WHERE f.Name = a.Name
>>>>>>>>>>>>> )
>>>>>>>>>>>>
>>>>>>>>>>>> You may also want to consider using a JOIN rather than a subquery with a NOT EXISTS, something like (untested and unvalidated!):
>>>>>>>>>>>>
>>>>>>>>>>>> INSERT INTO filename(name)
>>>>>>>>>>>> SELECT DISTINCT name
>>>>>>>>>>>> FROM batch AS a
>>>>>>>>>>>> LEFT JOIN filename AS f USING (name)
>>>>>>>>>>>> WHERE f.name IS NULL
>>>>>>>>>>>>
>>>>>>>>>>>> I know from experience as an Oracle DBA (my day job) that this can often produce far more efficient results.
>>>>>>>>>>>>
>>>>>>>>>>>> Of course, all options need testing for both speed of execution and resource usage, bearing in mind that data varies from one installation to another, and one size may not fit all!
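One cheap way to compare the candidates is to ask MySQL for its plan first, e.g. for the JOIN variant (a sketch: MySQL 5.0's EXPLAIN only accepts the SELECT part, and the batch table exists only while a batch insert is in progress, so this needs a live or copied batch table):

  EXPLAIN SELECT DISTINCT name
  FROM batch AS a
  LEFT JOIN filename AS f USING (name)
  WHERE f.name IS NULL;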
>>>>>>>>>>>
>>>>>>>>>>> Good suggestions. It sounds like there might be an overall performance problem with the current query for batch inserts with lots of data. I'm a bit unsure whether I dare test these queries on my current installation.
>>>>>>>>>>>
>>>>>>>>>>> I'll CC Kern on the thread; perhaps he has some insights.
>>>>>>>>>>
>>>>>>>>>> So this morning I had to kill the above query because it had been running for 24+ hours, preventing the new daily jobs from running. I think I'm going to try disabling batch inserts; the current situation is simply not good enough, and it has suddenly become a major headache to run backups. When I hit the end of this month and ALL servers have to run FULL backups, I'm going to be in a world of trouble, I think. I just don't understand what has changed, because it all ran great up until now.
>>>>>>>>>>
>>>>>>>>>> // Tom
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> We certainly can use help optimizing SQL since we are not DBAs, and we will look into the SQL optimization suggestions given above, keeping in mind that there are often rather radical differences in the timing of particular SQL queries depending on the database engine used.
>>>>>>>>>
>>>>>>>>> To the best of my knowledge nothing has changed in the batch insert queries since they were implemented, and it is *very* unlikely (though I haven't checked the code) that something changed from 2.4.4 to 3.0.x.
>>>>>>>>>
>>>>>>>>> More likely, your workload or MySQL has changed in some way -- e.g. more jobs, more machines backed up, a Director machine with less memory or with other jobs that use memory, a new version of MySQL, ...
>>>>>>>>>
>>>>>>>>> - I would suggest that you ensure that your database has all the recommended indexes (see the make_mysql_tables file; a quick check is sketched after this message), and that you are running with the large-memory /etc/my.cnf file.
>>>>>>>>>
>>>>>>>>> - Another thing to do is to compact your database. One way to do it is to write it out to an ASCII file and then re-insert it.
>>>>>>>>>
>>>>>>>>> - If you are running certain programs that create and delete lots of temporary files with different names, your Filename table may need cleaning.
>>>>>>>>>
>>>>>>>>> - I would strongly recommend not starting *lots* of Full backups at the same time or on the same day. By lots, I mean more than say 10 or 20 (it depends on the size of your system). It is generally far better to stage 1/4 of the Full backups each week, so that the peak workload is spread out over the month.
>>>>>>>>>
>>>>>>>>> - If the bottleneck is in MySQL, you might consider moving it to another machine that has more memory and faster disks.
>>>>>>>>>
>>>>>>>>> - If you really have a huge number of backups (say 50 or more) that all run at the same time, it might be advisable to consider using PostgreSQL, but in that case you will probably need an onsite DBA to properly tune and maintain it.
>>>>>>>>>
>>>>>>>>> - Regardless of what hardware you have, there are certain limitations on how many simultaneous jobs you can run (this kicks in on many systems around 50). Once a certain number is exceeded, the total throughput can decrease rather radically, so careful monitoring is necessary. Bweb can help a lot in these situations.
>>>>>>>>>
>>>>>>>>> - If you are really "in a world of trouble" and it is a performance issue, there is not much we (the Bacula project) can do for you other than the above tips. However ($$), Bacula Systems has tools that can help identify bottlenecks more precisely and help balance loads.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Kern
>>>>>>>>>
>>>>>>>>> -------------------------------------------------------
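The index check suggested above is quick to do from the mysql client, comparing the output against make_mysql_tables (assuming the catalog database is named bacula):

  USE bacula;
  SHOW INDEX FROM File;
  SHOW INDEX FROM Filename;
  SHOW INDEX FROM Path;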
>>>>>>>>
>>>>>>>> Sorry Kern, I wasn't monitoring this thread (mysql in the title :) ).
>>>>>>>>
>>>>>>>> I wrote these queries, so maybe I should explain (not why I'm right, I'm not sure of that, but why we've done it this way).
>>>>>>>>
>>>>>>>> A few forewords:
>>>>>>>> - batch insert was created in the beginning for postgresql. I don't have any experience with mysql, except knowing that at the time we programmed batch insert it didn't really do anything but nested loops...
>>>>>>>> - The main reason for batch inserts with postgresql was to use COPY instead of INSERT, which is much faster (we also had good results with stored procedures, but batch insert was even faster).
>>>>>>>> - Our current File table is nearly 200GB (1 billion rows), in a 250GB database, and we insert around 50 million records a day (rough estimate).
>>>>>>>>
>>>>>>>> * First question I've seen in the thread:
>>>>>>>>
>>>>>>>> Why
>>>>>>>>
>>>>>>>> INSERT INTO Filename( Name )
>>>>>>>> SELECT a.Name
>>>>>>>> FROM (
>>>>>>>>     SELECT DISTINCT Name
>>>>>>>>     FROM batch
>>>>>>>> ) AS a
>>>>>>>> WHERE NOT EXISTS (
>>>>>>>>     SELECT Name
>>>>>>>>     FROM Filename AS f
>>>>>>>>     WHERE f.Name = a.Name
>>>>>>>> )
>>>>>>>>
>>>>>>>> instead of the much simpler
>>>>>>>>
>>>>>>>> INSERT INTO Filename(Name)
>>>>>>>> SELECT DISTINCT Name
>>>>>>>> FROM batch AS a
>>>>>>>> WHERE NOT EXISTS (
>>>>>>>>     SELECT Name
>>>>>>>>     FROM Filename AS f
>>>>>>>>     WHERE f.Name = a.Name
>>>>>>>> )
>>>>>>>>
>>>>>>>> => Because the first one removes the duplicate rows from the batch table before checking whether they exist in the Filename table, while the second one checks afterwards. Performance was better in benchmarks with the first one (with PostgreSQL, with our data).
>>>>>>>>
>>>>>>>> * Second question:
>>>>>>>>
>>>>>>>> Why not:
>>>>>>>>
>>>>>>>> INSERT INTO filename(name)
>>>>>>>> SELECT DISTINCT name
>>>>>>>> FROM batch AS a
>>>>>>>> LEFT JOIN filename AS f USING (name)
>>>>>>>> WHERE f.name IS NULL
>>>>>>>>
>>>>>>>> In fact, if the engine chooses the right execution path, the execution plan will be roughly the same (a nested loop) as for the simpler NOT EXISTS query above. The exception is when batch is close in size to filename (for instance, in my bacula database that would be a 50-million-file backup). Anyway, if I recall correctly about mysql (and it hasn't changed recently), mysql will do a nested loop for both, as it doesn't know how to do merge or hash joins...
>>>>>>>>
>>>>>>>> But I agree that this second syntax theoretically gives more latitude to the engine. Except that recent Oracle and PostgreSQL versions will anyway be able to convert between the EXISTS/NOT EXISTS form and the join (or IN) in this case, as they are semantically equivalent.
>>>>>>>>
>>>>>>>> Now, I don't know how mysql would react to all these queries. An explain plan would be great for understanding... but I would have thought the NOT EXISTS query would suit its optimizer well, as it tells mysql exactly what to do :)
>>>>>>>>
>>>>>>>> A last note, for the PostgreSQL users on this mailing list: you can try to get better performance with batch inserts by lowering seq_page_cost and random_page_cost, so that the engine favours a nested loop instead of a hash join for the INSERT INTO File query. Most of the time, most of Filename and Path will be in the database or filesystem cache, and the plan will be much faster (and save you big sorts).
>>>>>>>>
>>>>>>>> Please tell me if you need more information.

--
Bruno Friedmann
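For Marc's PostgreSQL note above, the two settings can be tried per-session before touching postgresql.conf (the values here are illustrative, not recommendations):

  -- PostgreSQL defaults: seq_page_cost = 1.0, random_page_cost = 4.0.
  -- Lower values tell the planner that pages are likely already cached,
  -- which favours nested loops / index scans over big hash joins and sorts.
  SET seq_page_cost = 0.5;
  SET random_page_cost = 2.0;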