On Tuesday 09 September 2008 23:15:45, Bill Moran wrote:
> In response to "Yuri Timofeev" <[EMAIL PROTECTED]>:
>
> > 2008/9/9 Bill Moran <[EMAIL PROTECTED]>:
> > > In response to Bob Hetzel <[EMAIL PROTECTED]>:
> > > Additionally, doing that rewrite so that it results in improvements
> > > to all SQL platforms is probably going to require that the code be
> > > broken into SQL-dependent and SQL-independent sections.
> >
> > It may be that we have to do it in the critical sections.
>
> Agreed.  This was demonstrated last year by some massive speed
> improvements in the way that job records are saved.
>
> > The problem is that different DBMSs have different optimizers.
> > For example, in PostgreSQL (during my tests) EXPLAIN does not show
> > that indices will be used:
> >
> > bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN
> > Job ON (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
> >
> >                                QUERY PLAN
> > ---------------------------------------------------------------------------
> >  Limit  (cost=78.86..24169.76 rows=300000 width=8)
> >    ->  Hash Left Join  (cost=78.86..181966.90 rows=2265021 width=8)
> >          Hash Cond: (file.jobid = job.jobid)
> >          Filter: (job.jobid IS NULL)
> >          ->  Seq Scan on file  (cost=0.00..113937.42 rows=4530042 width=8)
> >          ->  Hash  (cost=58.38..58.38 rows=1638 width=4)
> >                ->  Seq Scan on job  (cost=0.00..58.38 rows=1638 width=4)
> > (7 rows)
> >
> > However, PostgreSQL worked quickly.
>
> That's because using indexes on that query is a horrifically bad idea:
>
> bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
> bacula-# (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
>                                QUERY PLAN
> -----------------------------------------------------------------------------
>  Limit  (cost=53.15..14650.94 rows=300000 width=8)
>    ->  Hash Left Join  (cost=53.15..2115902.45 rows=43482932 width=8)
>          Hash Cond: ("outer".jobid = "inner".jobid)
>          Filter: ("inner".jobid IS NULL)
>          ->  Seq Scan on file  (cost=0.00..1463605.32 rows=43482932 width=8)
>          ->  Hash  (cost=50.52..50.52 rows=1052 width=4)
>                ->  Seq Scan on job  (cost=0.00..50.52 rows=1052 width=4)
> (7 rows)
>
> bacula=# set enable_seqscan=off;
> SET
> Time: 0.210 ms
> bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
> bacula-# (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
>                                QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=0.00..1042544.60 rows=300000 width=8)
>    ->  Merge Right Join  (cost=0.00..151109653.30 rows=43482932 width=8)
>          Merge Cond: ("outer".jobid = "inner".jobid)
>          Filter: ("outer".jobid IS NULL)
>          ->  Index Scan using job_pkey on job  (cost=0.00..175.80 rows=1052 width=4)
>          ->  Index Scan using file_jobid_idx on file  (cost=0.00..150565938.23 rows=43482932 width=8)
> (6 rows)
>
> Notice that PostgreSQL expects the index scan on file_jobid_idx to take
> 150,565,938 operations alone, whereas the entire sequential scan only
> takes 1,463,605.  The seq scans are a clear winner in this case.
>
> > MySQL (theoretically) should use an index on multiple columns, which
> > already exists in make_mysql_tables.in:
> >
> >   CREATE TABLE File (...
> >     INDEX (JobId, PathId, FilenameId)
> >
> > but MySQL worked slowly.
>
> I don't know much about the innards of MySQL's query planner, but it
> doesn't seem to be as robust as Postgres'.  Index scanning is usually
> the best way to go, but (as shown above) not always.
> I have trouble believing that the MySQL developers always use available
> indexes, as that would create a laundry list of corner cases where
> queries perform horribly, but it seems that their planning logic fails
> in this particular case.
>
> It's not unusual.  If you mess with Postgres long enough, you'll find
> odd queries that perform poorly even though there's a much faster way
> to do it.  And that's the rub.  If I were writing a system purely for
> PostgreSQL, I could reorder joins or make other tweaks to the query
> until it ran quickly, but then I might have a query that ran like
> molasses on MySQL.  The inverse is true as well.
>
> My point is that such work is definitely warranted in the director; I'm
> not sure it's warranted in something like dbcheck.  If you've already
> got it done, there's no reason to waste it, but suggesting that someone
> review dbcheck and optimize the SQL is like asking me to clean toilets
> with a toothbrush -- it's a useful exercise to teach privates not to
> mouth off to sergeants, but it's not a particularly efficient way to get
> the toilet clean.
>
> I expect that, on Postgres, the fastest way to delete orphaned records
> would be:
>
>   delete from filename
>   where filename.filenameid not in (select filenameid from file);
>
> Assuming that is fast, would such a query even work on MySQL?
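It should run (MySQL has had subqueries since 4.1), but in my experience
the MySQL versions in use today execute NOT IN (SELECT ...) as a
dependent subquery, re-probing File once per Filename row.  I haven't
benchmarked it, but the usual MySQL-friendly rewrite is the equivalent
anti-join.  A sketch only, using the standard Bacula table names; note
this is MySQL's multi-table DELETE syntax, which PostgreSQL does not
accept, so it would have to live in an SQL-dependent section:

  -- Delete Filename rows that no File row references.
  -- The LEFT JOIN / IS NULL form lets MySQL drive a nested loop
  -- off an index instead of re-running a subquery per row.
  DELETE Filename
    FROM Filename
    LEFT JOIN File ON File.FilenameId = Filename.FilenameId
   WHERE File.FilenameId IS NULL;

(The NOT IN form also silently matches nothing if the subquery can
return a NULL, though File.FilenameId should never be NULL in the
Bacula schema.)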
We (essentially Marc Cousin) have already done some optimizations for
PostgreSQL; you will find a dbcheck version in pure SQL in
trunk/bacula/examples/database/dbcheck.sql.  It's *pretty* fast with
PostgreSQL because the engine is able to do a merge or hash join.  MySQL
uses a nested loop (a loop within a loop) instead, and so needs indexes.

Bye
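P.S.  Since MySQL falls back to nested loops, each probe into File has
to be an index lookup to be bearable.  The stock MySQL schema indexes
File on (JobId, PathId, FilenameId), and a lookup by FilenameId alone
cannot use that compound index, because FilenameId is not its leading
column.  An untested sketch (the index name here is made up): build a
temporary index for the duration of the orphan checks, then drop it.

  -- Throwaway index so each FilenameId probe is an index lookup
  -- instead of a full scan of File; drop it afterwards so it does
  -- not slow down backup insertions.
  CREATE INDEX tmp_file_fnid_idx ON File (FilenameId);
  -- ... run dbcheck / the orphan deletes here ...
  DROP INDEX tmp_file_fnid_idx ON File;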