On Sun, 2007-08-19 at 13:06 -0400, Dan Langille wrote: > On 19 Aug 2007 at 12:35, Dirk Bartley wrote: > > > There is one alternative index to the above. The index of the two > > colums (FilenameId, JobId) is really only needed for the condition of > > FilenameId = (SELECT FilenameId FROM Filename WHERE name=''). A > > subquery can't be used when creating an index, but running one command > > and then substituting does work. To create a partial index in postgres: > > CREATE INDEX file_filenameid_jobid2 ON File (JobId) WHERE FilenameId=5; > > This could not be done until after a record exists in the filename table > > that fits the condition. > > The usefulness of that index can be determined by the queries that > use it. I would expect that two indexes, one of filenameid and > another on jobid would work just as well. But that too must be > tested. On PostgreSQL, the "explain analyse" output will show what > is best. > > I can help with the query tuning on PostgreSQL.
I use postgresql for my work server and the last week of testing has been my experience using EXPLAIN in psql. I've used EXPLAIN abaout 60 times or so attempting to figure out optimizing indexes. Any testing and explantion of an optimization above what I have done would certainly help me and educate me. I may be wrong but I thoght I was suppsed to do ANALYZE file; EXPLAIN .. query ..; After adding an index. Is that what I'm supposed to do according to your recolection. I kept using "explain" to determine if it would do a sequential scan or use the indexes as hoped. There are basically three queries. One to get a list of directories, another to get a list of files and one to get a list of versions for a specific file. 1. To get a list of directories given a list of jobs. Here is an example: SELECT DISTINCT Path.Path AS Path, File.PathId AS PathId FROM File INNER JOIN Path ON (File.PathId=Path.PathId) WHERE File.FilenameId=3 AND File.Jobid IN (441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,416,415,413,412,411,410,409,408,407,406,405,404,403,402,400,399,398,397,396,395,394,393,392,391,385,384,383,382,381,379,378,376,375,374,373,372,371) ORDER BY Path Time: 3988.458 ms Query the file table for all records that have a filenameid where name='', the filenameid value is queried for earlier. If the filenameid where filename='' is retrieved as a subquery, the analyze does not do as well, especially with a partial index. Therfore I have chosen an index which is either on (filenameid and jobid) or (a partial index on jobid where filenameid=3). Either one of these seems to perform considerably better than separate indexe on filenameid and jobid. I believe one of these performs better due to the file table containing a very large number of records. bacula=# SELECT COUNT(Fileid) FROM File; count ---------- 14049283 (1 row) bacula=# SELECT COUNT(Fileid) FROM File Where FilenameId=5; count --------- 1834962 (1 row) The analyze shows the query geting a list of the directories for a certain specified list of jobs, we are only interested in the records where pathid=5 (or the pathid for pathname=''). Then if this set of records is indexed on the jobs, the amount of time the query takes is mostly determined by the number of So out of 14 million records in the file table, 1.8 million are for directories. The record count to process is reduced in the query from 14 million to 1.8 million by using a multi columned index. If both indexes are available to explain, it will use the partial index. 2. get a list of filenames in a certain path for a list of jobs. Example: SELECT DISTINCT Filename.Name AS FileName, Filename.FilenameId AS FilenameId FROM File INNER JOIN Filename on (Filename.FilenameId=File.FilenameId) WHERE File.PathId=27604 AND File.Jobid IN (2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,2425,2410,2395,2380,2363,2348,2333,2318,2303,2286,2271) AND Filename.Name!='' ORDER BY FileName This one is easy. EXPLAIN says it will use the index in Pathid and performs it in Time: 4.383 ms. This index does not exist in the install script. 3. get a list of versions of a file. Example: SELECT Job.JobId AS JobId, Job.Level AS Type, Job.EndTime AS EndTime, File.Md5 AS MD5, File.FileId AS FileId FROM File INNER JOIN Filename on (Filename.FilenameId=File.FilenameId) INNER JOIN Path ON (Path.PathId=File.PathId) INNER JOIN Job ON (File.JobId=Job.JobId) WHERE Path.PathId=27604 AND Filename.FilenameId=1049929 AND Job.Jobid IN (2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,2425,2410,2395,2380,2363,2348,2333,2318,2303,2286,2271) ORDER BY Job.EndTime DESC This one is also easy, EXPLAIN says it will use the index on (filenameid, jobid), which already exists in the install script. Time: 42.523 ms So if anyone has any other thoughts about indexes, I sure would be open to looking them. > > > I'm looking for a little feedback. If there are options that anyone has > > about the queries and indexes, I sure would like to know about them. > > Should indexes for bat be in the install script for the databases?? Is > > advising about possible indexes to increase performance in the help file > > something I should do. > > Yes, indexes for bat should be included. In the install script, in the bat help or somewhere else?? Dirk
signature.asc
Description: This is a digitally signed message part
------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________ Bacula-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/bacula-devel
