On 21 Aug 2007 at 20:07, Dirk Bartley wrote:

> On Mon, 2007-08-20 at 08:04 -0400, Dan Langille wrote:
> > > 
> > > There are basically three queries.  One to get a list of directories,
> > > another to get a list of files and one to get a list of versions for a
> > > specific file.
> > > 
> > > 1. To get a list of directories given a list of jobs.  Here is an
> > > example: =20
> > > 
> > > SELECT DISTINCT Path.Path AS Path, File.PathId AS PathId FROM File INNER
> > > JOIN Path ON (File.PathId=3DPath.PathId) WHERE File.FilenameId=3D3 AND
> > > File.Jobid IN
> > > (441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,42=
> > > 3,422,421,420,419,418,416,415,413,412,411,410,409,408,407,406,405,404,403,4=
> > > 02,400,399,398,397,396,395,394,393,392,391,385,384,383,382,381,379,378,376,=
> > > 375,374,373,372,371) ORDER BY Path
> > > Time: 3988.458 ms
> > 
> > That is four seconds.  I'd have to see the explain analyse output.
> 
>  Unique  (cost=338755.18..340527.31 rows=6297 width=36) (actual
> time=4000.332..4000.437 rows=16 loops=1)
>    ->  Sort  (cost=338755.18..339345.89 rows=236283 width=36) (actual
> time=4000.330..4000.359 rows=204 loops=1)
>          Sort Key: path.path, file.pathid
>          ->  Hash Join  (cost=16509.58..307281.74 rows=236283 width=36)
> (actual time=3932.884..3994.564 rows=204 loops=1)
>                Hash Cond: (file.pathid = path.pathid)
>                ->  Index Scan using file_filenameid_jobid2 on file
> (cost=0.00..280519.50 rows=236283 width=4) (actual
> time=130.950..3289.826 rows=204 loops=1)
>                      Filter: (jobid = ANY
> ('{2629,2616,2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,2425,2410,2395,2380,2363,2348,2333,2318,2303}'::integer[]))
>                ->  Hash  (cost=8713.59..8713.59 rows=366159 width=36)
> (actual time=453.745..453.745 rows=366951 loops=1)
>                      ->  Seq Scan on path  (cost=0.00..8713.59
> rows=366159 width=36) (actual time=0.021..179.985 rows=366951 loops=1)
>  Total runtime: 4000.694 ms
> (10 rows)
> 
> This shows the database choosing to use the partial index when both
> exist.  My impression of the partial indexex advantages is that it is
> less expensive to maintain.  Maintenence of the index will only be
> triggered when a record in file is modified that contains the
> filenameid.  The disadvantage is that it could not be created until
> after the first job is done and the filenameid where name='' can be
> queried for.

Partial indexes are usually only useful if what you are looking for 
is a constant.  I think you are proposing creating an index on-the-
fly.  I think this is not the way to go.  I think we can/should find 
another way to improve this query's performance.

If you can put your database on a website somewhere, I'll download it 
and try it out and work on the query that way.  With this, we'll both 
be working with the same dataset, and I can show you how to make 
things faster.

-- 
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php



-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Reply via email to