On 21 Aug 2007 at 20:07, Dirk Bartley wrote:
> On Mon, 2007-08-20 at 08:04 -0400, Dan Langille wrote:
> > >
> > > There are basically three queries. One to get a list of directories,
> > > another to get a list of files and one to get a list of versions for a
> > > specific file.
> > >
> > > 1. To get a list of directories given a list of jobs. Here is an
> > > example: =20
> > >
> > > SELECT DISTINCT Path.Path AS Path, File.PathId AS PathId FROM File INNER
> > > JOIN Path ON (File.PathId=3DPath.PathId) WHERE File.FilenameId=3D3 AND
> > > File.Jobid IN
> > > (441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,42=
> > > 3,422,421,420,419,418,416,415,413,412,411,410,409,408,407,406,405,404,403,4=
> > > 02,400,399,398,397,396,395,394,393,392,391,385,384,383,382,381,379,378,376,=
> > > 375,374,373,372,371) ORDER BY Path
> > > Time: 3988.458 ms
> >
> > That is four seconds. I'd have to see the explain analyse output.
>
> Unique (cost=338755.18..340527.31 rows=6297 width=36) (actual
> time=4000.332..4000.437 rows=16 loops=1)
> -> Sort (cost=338755.18..339345.89 rows=236283 width=36) (actual
> time=4000.330..4000.359 rows=204 loops=1)
> Sort Key: path.path, file.pathid
> -> Hash Join (cost=16509.58..307281.74 rows=236283 width=36)
> (actual time=3932.884..3994.564 rows=204 loops=1)
> Hash Cond: (file.pathid = path.pathid)
> -> Index Scan using file_filenameid_jobid2 on file
> (cost=0.00..280519.50 rows=236283 width=4) (actual
> time=130.950..3289.826 rows=204 loops=1)
> Filter: (jobid = ANY
> ('{2629,2616,2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,2425,2410,2395,2380,2363,2348,2333,2318,2303}'::integer[]))
> -> Hash (cost=8713.59..8713.59 rows=366159 width=36)
> (actual time=453.745..453.745 rows=366951 loops=1)
> -> Seq Scan on path (cost=0.00..8713.59
> rows=366159 width=36) (actual time=0.021..179.985 rows=366951 loops=1)
> Total runtime: 4000.694 ms
> (10 rows)
>
> This shows the database choosing to use the partial index when both
> exist. My impression of the partial indexex advantages is that it is
> less expensive to maintain. Maintenence of the index will only be
> triggered when a record in file is modified that contains the
> filenameid. The disadvantage is that it could not be created until
> after the first job is done and the filenameid where name='' can be
> queried for.
Partial indexes are usually only useful if what you are looking for
is a constant. I think you are proposing creating an index on-the-
fly. I think this is not the way to go. I think we can/should find
another way to improve this query's performance.
If you can put your database on a website somewhere, I'll download it
and try it out and work on the query that way. With this, we'll both
be working with the same dataset, and I can show you how to make
things faster.
--
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel