I ran the DB check last week via the script below. If you add the two indexes
at once, the table only has to be rebuilt one time, which takes about 50% of
the time.

#!/bin/sh
# Add both indexes in a single ALTER so the File table is only rebuilt once.
echo 'ALTER TABLE File ADD INDEX idxFI (FilenameId), ADD INDEX idxPI (PathId);' \
    | /usr/bin/mysql -u bacula --password=xxxxxxxxxx bacula
/sbin/dbcheck -c /etc/bacula/bacula-dir.conf -b -f
# Drop both indexes again in a single ALTER once dbcheck is done.
echo 'ALTER TABLE File DROP INDEX idxFI, DROP INDEX idxPI;' \
    | /usr/bin/mysql -u bacula --password=xxxxxxxxxxx bacula


Does anybody know how the temp batch table is created and whether it has any
indexes on it to start with?  I don't know how to view the batch table, as it's
a temp table and is only visible to the session that created it.
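
In case it helps others poke at this, here is a rough, untested sketch of
mirroring the batch table as a regular (non-temporary) table so the insert
queries can be run through EXPLAIN from any session.  The column list is only
my guess at what the batch insert needs; it is not taken from the Bacula source:

-- stand-in for the TEMPORARY batch table (column names/types are assumptions)
CREATE TABLE batch_copy (
    FileIndex INTEGER,
    JobId     INTEGER,
    Path      BLOB,
    Name      BLOB,
    LStat     TINYBLOB,
    MD5       TINYBLOB
);

-- once batch_copy is populated, the missing-filename insert can be explained:
EXPLAIN
SELECT a.Name
FROM (SELECT DISTINCT Name FROM batch_copy) AS a
WHERE NOT EXISTS (SELECT Name FROM Filename AS f WHERE f.Name = a.Name);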

I have no issues recompiling a new version of Bacula if some debug or timing
code could be added.

That said, the backups are working very well with batch mode disabled.

                             Thanks -Jason


On Mon, 2009-06-22 at 21:05 +0200, Bruno Friedmann wrote:
> Hi Jason, I've been looking over your conf and table status, comparing them to
> some configs here, but nothing specifically "wrong" appears.
> I'm just using utf8_general_ci as the default encoding for the db & tables.
> 
> 
> I have just two questions: have you already run dbcheck?
> (Be careful, if not it could take days ... especially against your db.)
> 
> I have added these indexes (taken from the wiki)
> CREATE INDEX idxPathId ON File (PathId);
> CREATE INDEX idxFilenameId ON File (FilenameId);
> CREATE INDEX idxJPF ON File (JobId, PathId, FilenameId);
> CREATE INDEX idxFJ ON File (FileId,JobId);
> CREATE INDEX idxPJ ON File (PathId,JobId);
> to have a dbcheck running quickly.
> But with the db size you have, building the indexes could be long ...
> 
> Beyond that, I can't explain why you are having trouble.
> Perhaps it could be useful (while your backup server is not backing up :-)
> to retry the batch-insert query with EXPLAIN to see how MySQL would execute it
> (the trouble is capturing the batch table, of course).
> 
> A call to any MySQL administrators who have an Enterprise DB contract:
> if someone has that type of contract, they could ask the question directly to
> the MySQL engineers.
> 
> 
> 
> 
> Jason A. Kates wrote:
> > Bruno,
> > Thanks for taking a look at the batch MySQL issue. We started with InnoDB
> > and moved to MyISAM; MyISAM seemed marginally faster.  I have included
> > the output of show table status.
> > 
> > Without batch inserts, my troubled backup is no longer troubled; it's working
> > at a decent rate given the number of small files it has.  If a backup of this
> > size consisted of big files, the transfer rates would be faster, etc.
> > 
> > 22-Jun 06:58 backup-server JobId 70468: Sending spooled attrs to the
> > Director. Despooling 2,897,722,241 bytes ...
> > 22-Jun 07:33 backup-server JobId 70468: Bacula qwest-netbackup-dir 3.0.2
> > (18Jun09): 22-Jun-2009 07:33:15
> >   Build OS:               x86_64-unknown-linux-gnu redhat Enterprise
> > release
> >   JobId:                  70468
> >   Job:                    backup-client1.2009-06-22_01.03.05_56
> >   Backup Level:           Full
> >   Client:                 "backup-client1" 3.0.1 (30Apr09)
> > x86_64-unknown-linux-gnu,redhat,Enterprise release
> >   FileSet:                "LINUX_ALL_LOCAL_FS" 2009-05-17 11:11:12
> >   Pool:                   "Daily" (From Job resource)
> >   Catalog:                "MyCatalog" (From Client resource)
> >   Storage:                "Autochanger" (From Job resource)
> >   Scheduled time:         22-Jun-2009 01:03:05
> >   Start time:             22-Jun-2009 01:03:20
> >   End time:               22-Jun-2009 07:33:15
> >   Elapsed time:           6 hours 29 mins 55 secs
> >   Priority:               20
> >   FD Files Written:       9,448,749
> >   SD Files Written:       9,448,749
> >   FD Bytes Written:       242,193,953,548 (242.1 GB)
> >   SD Bytes Written:       243,431,447,949 (243.4 GB)
> >   Rate:                   10352.4 KB/s
> >   Software Compression:   None
> >   VSS:                    no
> >   Encryption:             no
> >   Accurate:               yes
> >   Volume name(s):         CA3833L4|CA3831L4
> >   Volume Session Id:      213
> >   Volume Session Time:    1245547751
> >   Last Volume Bytes:      871,293,198,336 (871.2 GB)
> >   Non-fatal FD errors:    1
> >   SD Errors:              0
> >   FD termination status:  OK
> >   SD termination status:  OK
> >   Termination:            Backup OK -- with warnings
> > 
> >                                  -Jason
> > 
> > On Mon, 2009-06-22 at 09:02 +0200, Bruno Friedmann wrote:
> >> Hi Jason, the hardware should be more than sufficient.
> >>
> >> Could you post your my.cnf ?
> >>
> >> and tell us what type of db is used (MyISAM / InnoDB)?
> >>
> >> PS: I'm going to work, so it could take some time before I answer ...
> >>
> >> Jason A. Kates wrote:
> >>> In my /etc/my.cnf  I have this set: 
> >>> tmpdir = /tmp
> >>>
> >>> /tmp is a memory file system to avoid I/O contention.
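> >>>
> >>> For reference, the tmpfs mount is just a normal fstab entry along these
> >>> lines (the size is shown only as an example):
> >>>
> >>> none  /tmp  tmpfs  defaults,size=8g  0 0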
> >>>
> >>> When watching, you can see that temp tables are created in /tmp,
> >>> so /var/tmp should be out of the loop.
> >>>
> >>> The db resides in /export/mysql which is part of a 6 disk 15K rpm
> >>> hardware raid 10 disk set.
> >>>
> >>> df -Th
> >>> Filesystem    Type    Size  Used Avail Use% Mounted on
> >>> /dev/cciss/c0d0p1
> >>>               ext3    996M  489M  456M  52% /
> >>> /dev/mapper/Volume00-vartest
> >>>               ext3    2.0G  368M  1.5G  20% /var
> >>> /dev/mapper/Volume00-export
> >>>               ext3    186G   47G  130G  27% /export
> >>> /dev/mapper/Volume00-usr
> >>>               ext3    3.9G  1.8G  2.0G  47% /usr
> >>> none         tmpfs    8.0G  8.0K  8.0G   1% /tmp
> >>> /dev/cciss/c1d0p1
> >>>               ext3    943G  316G  579G  36% /var/spool/bacula
> >>>
> >>>
> >>> We are running Red Hat ES 5 with kernel 2.6.18-128.el5.
> >>> If you would like me to test a mount option, I would be glad to give it a
> >>> try.
> >>>
> >>>                                  -Jason
> >>>
> >>> On Mon, 2009-06-22 at 07:38 +0200, Bruno Friedmann wrote:
> >>>> Hi Jason, can you tell us whether the place MySQL works from (typically
> >>>> /var/lib/mysql) is on the same partition as /tmp or /var/tmp?
> >>>>
> >>>> I've found (in another project) that with MySQL MyISAM tables you can kill
> >>>> disk I/O if the two are on the same disk. It's worse still when you insert
> >>>> a lot of data from that same disk.
> >>>>
> >>>> In a perfect world, after carefully tuning /etc/my.cnf (the "huge" template
> >>>> isn't sufficient in many big cases), you should have one full I/O path for
> >>>> MySQL temp files, one for indexes, one for tables, AND one for the imported
> >>>> data (a completely separate /tmp, for example).
> >>>>
> >>>> In many installations, /tmp and /var are on the same device; add LVM or a
> >>>> RAID 1 configuration on top and you get I/O trouble.
> >>>>
> >>>> I won't explain all the fine tuning of MySQL here; there is a lot, and each
> >>>> setting should be reviewed & tested.
> >>>>
> >>>> Next, there's also ext3 tuning for those who use it, as many distros are
> >>>> very conservative (newer kernels > 2.6.30 would reduce the impact).
> >>>> Mount options like atime, diratime, data=ordered, etc. ... could change
> >>>> your life :-)
> >>>>
> >>>>
> >>>> The first time Bacula came with batch-insert enabled, I tested it and got
> >>>> really poor performance, so I disabled it (LVM on top of soft RAID 1). My
> >>>> mistake for not mentioning it on the users/devel list.
> >>>>
> >>>> At some point here we have decided to migrate all possible applications to
> >>>> PostgreSQL.
> >>>>
> >>>> For debugging and making some sort of test case, it would be nice to have
> >>>> a dummy dump of a database containing lots of dirs & files, so we can test
> >>>> large queries ...
> >>>>
> >>>> Perhaps someone with a very large catalog could produce one by running
> >>>> md5() over the Filename and Path entries?
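> >>>>
> >>>> Something along these lines might do it (untested sketch; run it against a
> >>>> copy of the catalog, never the live one):
> >>>>
> >>>> UPDATE Filename SET Name = MD5(Name);
> >>>> UPDATE Path SET Path = MD5(Path);
> >>>>
> >>>> A mysqldump of that copy would then be shareable without exposing real
> >>>> file or directory names.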
> >>>>
> >>>>
> >>>>
> >>>> Jason A. Kates wrote:
> >>>>> Based on following the thread, I recompiled Bacula with the following
> >>>>> options:
> >>>>> ./configure --enable-smartalloc --with-mysql
> >>>>> --with-tcp-wrappers=/usr/lib/libwrap.so --enable-static-fd
> >>>>> --enable-static-dir --enable-static-sd --disable-batch-insert
> >>>>> And my 18+ hour backup ran in 9 hours 22 mins 27 secs last night.   It
> >>>>> seems to be back to the pre 3.0 time.
> >>>>>
> >>>>> As for my tables, I have an Admin job that optimizes my Bacula tables
> >>>>> within MySQL (a sketch of the optimize script is shown after the Job
> >>>>> resource below):
> >>>>> Job {
> >>>>>         Name = optimize_table_JobMedia
> >>>>>         JobDefs = "DefaultJob"
> >>>>>         Type = Admin
> >>>>>         Client = bacula-local-fd
> >>>>>         RunBeforeJob = "/etc/bacula/optimize_table_JobMedia.sh"
> >>>>>         Priority = 40
> >>>>>         Cancel Queued Duplicates=yes
> >>>>>         Maximum Concurrent Jobs = 1
> >>>>>         Run Before Job = "/etc/bacula/check-duplicate-queued-job.sh '%n'
> >>>>> '%i'"
> >>>>> }
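> >>>>>
> >>>>> The optimize script itself isn't shown here; as a rough sketch (not the
> >>>>> exact script, password redacted) it boils down to something like:
> >>>>>
> >>>>> #!/bin/sh
> >>>>> echo 'OPTIMIZE TABLE JobMedia;' \
> >>>>>     | /usr/bin/mysql -u bacula --password=xxxxxxxxxx bacula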
> >>>>>
> >>>>> Please let me know what else I can do to provide useful data for the
> >>>>> improvement of Bacula (the world's best backup software!).
> >>>>>                              Thanks -Jason
> >>>>>
> >>>>> On Sun, 2009-06-21 at 08:11 +0200, Marc Cousin wrote:
> >>>>>> The first thing to know is which of the queries takes the time (there are
> >>>>>> the multiple inserts into the batch table, then the insert of missing
> >>>>>> filenames/paths, and the final insert). Or is it slow with all the
> >>>>>> queries? Which storage engine?
> >>>>>>
> >>>>>> There are two possibilities:
> >>>>>>
> >>>>>> - either you have a bad plan for one of the queries (though I don't see
> >>>>>> how, MySQL having a very simple optimizer),
> >>>>>> - or you have a tuning problem with MySQL (there I can't really help you:
> >>>>>> not enough memory? fragmented indexes? I don't really know how MySQL
> >>>>>> works).
> >>>>>>
> >>>>>> Kern, I have not been following the 3.0 development closely: do the
> >>>>>> xattrs go into the lstat field?
> >>>>>>
> >>>>>> On Sunday 21 June 2009 01:52:34 Jason A. Kates wrote:
> >>>>>>> I have a backup that runs for 3 hours then spends another 15 hours in
> >>>>>>> status "Dir inserting Attributes".
> >>>>>>>
> >>>>>>> The backup has 9,442,578 files and 239,718,461,832 bytes.  It was much
> >>>>>>> faster with Bacula before, and I thought that the time change was due to
> >>>>>>> the additional data from xattrsupport that didn't exist before 3.0.
> >>>>>>> The job is a spooling job with spooled attributes too.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> We are using mysql-server-5.0.45-7.el5.  I would be glad to collect and
> >>>>>>> share data.  Please let me know what you would like me to set or do to
> >>>>>>> get the appropriate data.
> >>>>>>>
> >>>>>>>                              Thanks -Jason
> >>>>>>>
> >>>>>>> On Sat, 2009-06-20 at 17:48 +0200, Marc Cousin wrote:
> >>>>>>>> There may be something. But first we have to know why MySQL behaves
> >>>>>>>> that slowly for Tom Sommer. The problem being that I don't have a big
> >>>>>>>> MySQL database to test on ... So as long as we don't have the query
> >>>>>>>> plan or an explanation ...
> >>>>>>>>
> >>>>>>>> On Saturday 20 June 2009 14:36:44 Kern Sibbald wrote:
> >>>>>>>>> Hello Marc,
> >>>>>>>>>
> >>>>>>>>> Thanks for looking at this and for your responses.  If I understand 
> >>>>>>>>> you
> >>>>>>>>> correctly, there is nothing to be done, which does not surprise me 
> >>>>>>>>> :-)
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>>
> >>>>>>>>> Kern
> >>>>>>>>>
> >>>>>>>>> On Saturday 20 June 2009 13:16:44 Marc Cousin wrote:
> >>>>>>>>>>> Subject: Re: [Bacula-users] Performance with MySQL queries since
> >>>>>>>>>>> 3.0.0 (Dir inserting attributes hang)
> >>>>>>>>>>> Date: Saturday 20 June 2009
> >>>>>>>>>>> From: Kern Sibbald <k...@sibbald.com>
> >>>>>>>>>>> To: Tom Sommer <m...@tomsommer.dk>
> >>>>>>>>>>>
> >>>>>>>>>>> On Saturday 20 June 2009 08:51:53 Tom Sommer wrote:
> >>>>>>>>>>>> Tom Sommer wrote:
> >>>>>>>>>>>>> Mike Holden wrote:
> >>>>>>>>>>>>>> Jari Fredriksson wrote:
> >>>>>>>>>>>>>>>> INSERT INTO Filename( Name )
> >>>>>>>>>>>>>>>> SELECT a.Name
> >>>>>>>>>>>>>>>> FROM (
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>>>>>>>> FROM batch
> >>>>>>>>>>>>>>>> ) AS a
> >>>>>>>>>>>>>>>> WHERE NOT
> >>>>>>>>>>>>>>>> EXISTS (
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> SELECT Name
> >>>>>>>>>>>>>>>> FROM Filename AS f
> >>>>>>>>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>> That looks silly.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would write it shorter as
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>>>>>>> FROM batch AS a
> >>>>>>>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>>>>>>> (
> >>>>>>>>>>>>>>>     SELECT Name
> >>>>>>>>>>>>>>>     FROM Filename AS f
> >>>>>>>>>>>>>>>     WHERE f.Name = a.Name
> >>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>> You may also want to consider using a JOIN rather than a
> >>>>>>>>>>>>>> subquery with a NOT EXISTS, something like (untested and
> >>>>>>>>>>>>>> unvalidated!):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> INSERT INTO filename(name)
> >>>>>>>>>>>>>> SELECT DISTINCT name
> >>>>>>>>>>>>>> FROM batch AS a
> >>>>>>>>>>>>>> LEFT JOIN filename AS f USING (name)
> >>>>>>>>>>>>>> WHERE f.name IS NULL
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I know from experience as an Oracle DBA (my day job) that this
> >>>>>>>>>>>>>> can often produce far more efficient results.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Of course, all options need testing for both speed of
> >>>>>>>>>>>>>> execution and resource usage, bearing in mind that data varies
> >>>>>>>>>>>>>> from one installation to another, and one size may not fit
> >>>>>>>>>>>>>> all!
> >>>>>>>>>>>>> Good suggestions, sounds like there might be an overall
> >>>>>>>>>>>>> performance problem with the current query for batch-inserts
> >>>>>>>>>>>>> with lots of data. I'm a bit unsure if I dare test these
> >>>>>>>>>>>>> queries on my current installation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'll CC Kern on the thread, perhaps he has some insights.
> >>>>>>>>>>>> So this morning I had to kill the above query because it's been
> >>>>>>>>>>>> running for 24+ hours, preventing the new daily jobs from
> >>>>>>>>>>>> running. I think I'm going to try and disable batch-inserts, the
> >>>>>>>>>>>> current situation is simply not good enough, it's become a major
> >>>>>>>>>>>> headache to run backups suddenly. When I hit the end of this
> >>>>>>>>>>>> month and ALL servers have to run FULL backups, I'm gonna be in a
> >>>>>>>>>>>> world of trouble I think - I just don't understand what has
> >>>>>>>>>>>> changed, because it's all been running great up until now.
> >>>>>>>>>>>>
> >>>>>>>>>>>> // Tom
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> We certainly can use help optimizing SQL since we are not DBAs, 
> >>>>>>>>>>> and
> >>>>>>>>>>> we will look into the SQL optimization suggestions given above,
> >>>>>>>>>>> keeping in mind that there are often rather radical differences in
> >>>>>>>>>>> timing of particular SQL queries depending on the database engine
> >>>>>>>>>>> used.
> >>>>>>>>>>>
> >>>>>>>>>>> To the best of my knowledge nothing has changed in terms of the
> >>>>>>>>>>> batch insert queries since they were implemented, and it is
> >>>>>>>>>>> *very* unlikely (though I haven't checked the code) that something
> >>>>>>>>>>> changed from 2.4.4 to 3.0.x.
> >>>>>>>>>>>
> >>>>>>>>>>> More likely, your workload or MySQL has changed in some way -- 
> >>>>>>>>>>> e.g.
> >>>>>>>>>>> more Jobs, more machines backed up, Director machine with less
> >>>>>>>>>>> memory or other jobs that use memory, a new version of MySQL, ...
> >>>>>>>>>>>
> >>>>>>>>>>> - I would suggest that you ensure that your database has all the
> >>>>>>>>>>> recommended indexes (see the make_mysql_tables file), and that you
> >>>>>>>>>>> are running with the large memory /etc/my.cnf file.
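> >>>>>>>>>>>
> >>>>>>>>>>> (For instance, to compare against make_mysql_tables:
> >>>>>>>>>>>
> >>>>>>>>>>> SHOW INDEX FROM File;
> >>>>>>>>>>> SHOW INDEX FROM Filename;
> >>>>>>>>>>> SHOW INDEX FROM Path;
> >>>>>>>>>>>
> >>>>>>>>>>> will list the indexes actually present.)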
> >>>>>>>>>>>
> >>>>>>>>>>> - Another thing to do is to compact your database.  One way to do
> >>>>>>>>>>> it is to write it to an ASCII file and then re-insert it.
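> >>>>>>>>>>>
> >>>>>>>>>>> For example, something like this should work (sketch only; mysqldump
> >>>>>>>>>>> includes DROP TABLE statements by default, so reloading the dump
> >>>>>>>>>>> rebuilds the tables and their indexes from scratch):
> >>>>>>>>>>>
> >>>>>>>>>>> mysqldump -u bacula -p bacula > bacula_catalog.sql
> >>>>>>>>>>> mysql -u bacula -p bacula < bacula_catalog.sql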
> >>>>>>>>>>>
> >>>>>>>>>>> - If you are running certain programs that create and delete lots
> >>>>>>>>>>> of temporary files with different names, your Filename table may
> >>>>>>>>>>> need cleaning.
> >>>>>>>>>>>
> >>>>>>>>>>> - I would strongly recommend not starting *lots* of Full backups 
> >>>>>>>>>>> at
> >>>>>>>>>>> the same time or on the same day.  By lots, I mean more than say 
> >>>>>>>>>>> 10
> >>>>>>>>>>> or 20 (depends on the size of your system).  It is generally far
> >>>>>>>>>>> better to stage 1/4 of the backup every week for a full backup so
> >>>>>>>>>>> that the peak workload is spread out over the month.
> >>>>>>>>>>>
> >>>>>>>>>>> - If the bottleneck is in MySQL, you might consider moving it to
> >>>>>>>>>>> another machine that has more memory and faster disks.
> >>>>>>>>>>>
> >>>>>>>>>>> - If you really have a huge number of backups (say 50 or more) 
> >>>>>>>>>>> that
> >>>>>>>>>>> all run at the same time, it might be advisable to consider using
> >>>>>>>>>>> PostgreSQL, but in that case, you will probably need an onsite DBA
> >>>>>>>>>>> to properly tune and maintain it.
> >>>>>>>>>>>
> >>>>>>>>>>> - Regardless of what hardware you have, there are certain
> >>>>>>>>>>> limitations on how many simultaneous jobs you can run (this kicks
> >>>>>>>>>>> in on many systems around 50). Once a certain number is exceeded,
> >>>>>>>>>>> the total throughput can rather radically decrease so careful
> >>>>>>>>>>> monitoring is necessary.  Bweb can help a lot in these situations.
> >>>>>>>>>>>
> >>>>>>>>>>> - If you are really "in a world of trouble" and it is a 
> >>>>>>>>>>> performance
> >>>>>>>>>>> issue, there is not much we (the Bacula project) can do for you
> >>>>>>>>>>> other than the above tips.  However, ($$) Bacula Systems has tools
> >>>>>>>>>>> that can help more precisely identify bottlenecks and help balance
> >>>>>>>>>>> loads.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> Kern
> >>>>>>>>>>>
> >>>>>>>>>>> -------------------------------------------------------
> >>>>>>>>>> Sorry Kern, I wasn't monitoring this thread (mysql in the title  
> >>>>>>>>>> :) )
> >>>>>>>>>>
> >>>>>>>>>> I wrote these queries, so maybe I should explain (not why I'm 
> >>>>>>>>>> right,
> >>>>>>>>>> I'm not sure of that, but why we've done it this way).
> >>>>>>>>>>
> >>>>>>>>>> A few forewords:
> >>>>>>>>>> - batch insert was created in the beginning for PostgreSQL. I don't
> >>>>>>>>>> have any experience with MySQL, except knowing that at the moment we
> >>>>>>>>>> programmed batch insert it didn't really do anything but nested
> >>>>>>>>>> loops...
> >>>>>>>>>> - the main reason for batch inserts with PostgreSQL was to use COPY
> >>>>>>>>>> instead of INSERT, which is much faster (we also had good results with
> >>>>>>>>>> stored procedures, but batch insert was even faster)
> >>>>>>>>>> - our current File table is nearly 200GB (1 billion rows), within a
> >>>>>>>>>> 250GB database, and we insert around 50 million records a day (rough
> >>>>>>>>>> estimate)
> >>>>>>>>>>
> >>>>>>>>>> * First question I've seen in the thread:
> >>>>>>>>>>
> >>>>>>>>>> Why
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO Filename( Name )
> >>>>>>>>>> SELECT a.Name
> >>>>>>>>>> FROM (
> >>>>>>>>>>    SELECT DISTINCT Name
> >>>>>>>>>>    FROM batch
> >>>>>>>>>> ) AS a
> >>>>>>>>>> WHERE NOT EXISTS (
> >>>>>>>>>>    SELECT Name
> >>>>>>>>>>    FROM Filename AS f
> >>>>>>>>>>    WHERE f.Name = a.Name
> >>>>>>>>>> )
> >>>>>>>>>>
> >>>>>>>>>> instead of the much simpler
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>> (
> >>>>>>>>>>    SELECT Name
> >>>>>>>>>>    FROM Filename AS f
> >>>>>>>>>>    WHERE f.Name = a.Name
> >>>>>>>>>> )
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> => Because the first one removes the duplicate rows from the batch
> >>>>>>>>>> table before checking whether they exist in the Filename table, while
> >>>>>>>>>> the second one de-duplicates only after the existence check.
> >>>>>>>>>> Performance was better in benchmarks with the first one (with
> >>>>>>>>>> PostgreSQL, with our data).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> * Second question :
> >>>>>>>>>> Why not :
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO filename(name)
> >>>>>>>>>> SELECT DISTINCT name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> LEFT JOIN filename AS f USING (name)
> >>>>>>>>>> WHERE f.name IS NULL
> >>>>>>>>>>
> >>>>>>>>>> In fact, if the engine chooses the right execution path, the execution
> >>>>>>>>>> plan will be roughly the same (nested loop) as
> >>>>>>>>>>
> >>>>>>>>>> INSERT INTO Filename(Name)
> >>>>>>>>>> SELECT DISTINCT Name
> >>>>>>>>>> FROM batch AS a
> >>>>>>>>>> WHERE NOT EXISTS
> >>>>>>>>>> (
> >>>>>>>>>> SELECT Name
> >>>>>>>>>> FROM Filename AS f
> >>>>>>>>>> WHERE f.Name = a.Name
> >>>>>>>>>> )
> >>>>>>>>>>
> >>>>>>>>>> Except if batch is close in size to Filename (for instance, on my
> >>>>>>>>>> Bacula database that would be a 50-million-file backup). Anyway, if I
> >>>>>>>>>> recall correctly about MySQL (and it hasn't changed recently), MySQL
> >>>>>>>>>> will do a nested loop for both, as it doesn't know how to do a merge
> >>>>>>>>>> or hash join ...
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> But I agree that this second syntax theoretically gives more latitude
> >>>>>>>>>> to the engine. Except that recent Oracle and PostgreSQL versions will
> >>>>>>>>>> anyway be able to convert between the EXISTS/NOT EXISTS and the JOIN
> >>>>>>>>>> (or IN) forms in this case, as they are semantically equivalent.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Now, I don't know how MySQL would react to all these queries. An
> >>>>>>>>>> explain plan would be great for understanding... but I would have
> >>>>>>>>>> thought that the NOT EXISTS query would have been great for its
> >>>>>>>>>> optimizer, as it tells MySQL exactly what to do :)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> A last note, for PostgreSQL users on this mailing list:
> >>>>>>>>>> you can try to get better performance with batch inserts by lowering
> >>>>>>>>>> seq_page_cost and random_page_cost, so that the engine favours a
> >>>>>>>>>> nested loop instead of a hash join for the INSERT INTO File query:
> >>>>>>>>>> most of the time, most of Filename and Path will be in the database or
> >>>>>>>>>> filesystem cache, and the plan will be much faster (and save you big
> >>>>>>>>>> sorts).
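> >>>>>>>>>>
> >>>>>>>>>> For example (the exact values are only a starting point and should be
> >>>>>>>>>> benchmarked on your own data), in postgresql.conf or per session:
> >>>>>>>>>>
> >>>>>>>>>> SET seq_page_cost = 0.5;
> >>>>>>>>>> SET random_page_cost = 1.0;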
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Please tell me if you need more information.
> >>>>>>>>>>
> >>
> 
> > 
> 
> 
> -- 
> 
>      Bruno Friedmann
> 
> 
> 

-- 
----------------------------------------------------------------------------
Jason A. Kates (ja...@kates.org) 
Fax:    208-975-1514
Phone:  212-400-1670 x2
============================================================================



------------------------------------------------------------------------------
Are you an open source citizen? Join us for the Open Source Bridge conference!
Portland, OR, June 17-19. Two days of sessions, one day of unconference: $250.
Need another reason to go? 24-hour hacker lounge. Register today!
http://ad.doubleclick.net/clk;215844324;13503038;v?http://opensourcebridge.org
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
