Re: [Bacula-devel] Query changes in the catalog browser and indexes

Dirk Bartley Sun, 19 Aug 2007 18:47:29 -0700

On Sun, 2007-08-19 at 13:06 -0400, Dan Langille wrote:
> On 19 Aug 2007 at 12:35, Dirk Bartley wrote:
> 
> > There is one alternative index to the above.  The index of the two
> > colums (FilenameId, JobId) is really only needed for the condition of
> > FilenameId = (SELECT FilenameId FROM Filename WHERE name='').  A
> > subquery can't be used when creating an index, but running one command
> > and then substituting does work.  To create a partial index in postgres:
> > CREATE INDEX file_filenameid_jobid2 ON File (JobId) WHERE FilenameId=5;
> > This could not be done until after a record exists in the filename table
> > that fits the condition.
> 
> The usefulness of that index can be determined by the queries that 
> use it.  I would expect that two indexes, one of filenameid and 
> another on jobid would work just as well.  But that too must be 
> tested.  On PostgreSQL, the "explain analyse" output will show what 
> is best.
> 
> I can help with the query tuning on PostgreSQL.


I use postgresql for my work server and the last week of testing has
been my experience using EXPLAIN in psql.  I've used EXPLAIN abaout 60
times or so attempting to figure out optimizing indexes.  Any testing
and explantion of an optimization above what I have done would certainly
help me and educate me.

I may be wrong but I thoght I was suppsed to do

ANALYZE file;

EXPLAIN .. query ..;

After adding an index.    Is that what I'm supposed to do according to
your recolection.  I kept using "explain" to determine if it would do a
sequential scan or use the indexes as hoped.



There are basically three queries.  One to get a list of directories,
another to get a list of files and one to get a list of versions for a
specific file.

1. To get a list of directories given a list of jobs.  Here is an
example:  

SELECT DISTINCT Path.Path AS Path, File.PathId AS PathId FROM File INNER
JOIN Path ON (File.PathId=Path.PathId) WHERE File.FilenameId=3 AND
File.Jobid IN
(441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,416,415,413,412,411,410,409,408,407,406,405,404,403,402,400,399,398,397,396,395,394,393,392,391,385,384,383,382,381,379,378,376,375,374,373,372,371)
 ORDER BY Path
Time: 3988.458 ms

Query the file table for all records that have a filenameid where
name='', the filenameid value is queried for earlier.  If the filenameid
where filename='' is retrieved as a subquery, the analyze does not do as
well, especially with a partial index.  Therfore I have chosen an index
which is either on (filenameid and jobid) or (a partial index on jobid
where filenameid=3).  Either one of these seems to perform considerably
better than separate indexe on filenameid and jobid.

I believe one of these performs better due to the file table containing
a very large number of records.
bacula=# SELECT COUNT(Fileid) FROM File;
  count
----------
 14049283
(1 row)
bacula=# SELECT COUNT(Fileid) FROM File Where FilenameId=5;
  count
---------
 1834962
(1 row)
The analyze shows the query geting a list of the directories for a
certain specified list of jobs, we are only interested in the records
where pathid=5 (or the pathid for pathname='').  Then if this set of
records is indexed on the jobs, the amount of time the query takes is
mostly determined by the number of

So out of 14 million records in the file table, 1.8 million are for
directories.  The record count to process is reduced in the query from
14 million to 1.8 million by using a multi columned index.

If both indexes are available to explain, it will use the partial index.



2. get a list of filenames in a certain path for a list of jobs.
Example:

SELECT DISTINCT Filename.Name AS FileName, Filename.FilenameId AS
FilenameId FROM File  INNER JOIN Filename on
(Filename.FilenameId=File.FilenameId) WHERE File.PathId=27604 AND
File.Jobid IN
(2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,2425,2410,2395,2380,2363,2348,2333,2318,2303,2286,2271)
 AND Filename.Name!='' ORDER BY FileName

This one is easy.  EXPLAIN says it will use the index in Pathid and
performs it in Time: 4.383 ms.  This index does not exist in the install
script.

3. get a list of versions of a file.  Example:

SELECT Job.JobId AS JobId, Job.Level AS Type, Job.EndTime AS EndTime,
File.Md5 AS MD5, File.FileId AS FileId FROM File INNER JOIN Filename on
(Filename.FilenameId=File.FilenameId) INNER JOIN Path ON
(Path.PathId=File.PathId) INNER JOIN Job ON (File.JobId=Job.JobId) WHERE
Path.PathId=27604 AND Filename.FilenameId=1049929 AND Job.Jobid IN
(2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,2425,2410,2395,2380,2363,2348,2333,2318,2303,2286,2271)
 ORDER BY Job.EndTime DESC

This one is also easy, EXPLAIN says it will use the index on
(filenameid, jobid), which already exists in the install script.
Time: 42.523 ms

So if anyone has any other thoughts about indexes, I sure would be open
to looking them.

> 
> > I'm looking for a little feedback.  If there are options that anyone has
> > about the queries and indexes, I sure would like to know about them.
> > Should indexes for bat be in the install script for the databases??  Is
> > advising about possible indexes to increase performance in the help file
> > something I should do.
> 
> Yes, indexes for bat should be included.

In the install script, in the bat help or somewhere else??

Dirk

signature.asc
Description: This is a digitally signed message part

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Query changes in the catalog browser and indexes

Reply via email to