Re: [Bacula-devel] Query changes in the catalog browser and indexes

Kern Sibbald Thu, 23 Aug 2007 13:37:56 -0700

On Thursday 23 August 2007 04:02, Dirk Bartley wrote:
> On Wed, 2007-08-22 at 10:11 -0400, Dan Langille wrote:
> > On 21 Aug 2007 at 20:07, Dirk Bartley wrote:
> > > On Mon, 2007-08-20 at 08:04 -0400, Dan Langille wrote:
> > > > > There are basically three queries.  One to get a list of
> > > > > directories, another to get a list of files and one to get a list
> > > > > of versions for a specific file.
> > > > >
> > > > > 1. To get a list of directories given a list of jobs.  Here is an
> > > > > example: =20
> > > > >
> > > > > SELECT DISTINCT Path.Path AS Path, File.PathId AS PathId FROM File
> > > > > INNER JOIN Path ON (File.PathId=3DPath.PathId) WHERE
> > > > > File.FilenameId=3D3 AND File.Jobid IN
> > > > > (441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,42
> > > > >5,424,42=
> > > > > 3,422,421,420,419,418,416,415,413,412,411,410,409,408,407,406,405,4
> > > > >04,403,4=
> > > > > 02,400,399,398,397,396,395,394,393,392,391,385,384,383,382,381,379,
> > > > >378,376,= 375,374,373,372,371) ORDER BY Path
> > > > > Time: 3988.458 ms
> > > >
> > > > That is four seconds.  I'd have to see the explain analyse output.
> > >
> > >  Unique  (cost=338755.18..340527.31 rows=6297 width=36) (actual
> > > time=4000.332..4000.437 rows=16 loops=1)
> > >    ->  Sort  (cost=338755.18..339345.89 rows=236283 width=36) (actual
> > > time=4000.330..4000.359 rows=204 loops=1)
> > >          Sort Key: path.path, file.pathid
> > >          ->  Hash Join  (cost=16509.58..307281.74 rows=236283 width=36)
> > > (actual time=3932.884..3994.564 rows=204 loops=1)
> > >                Hash Cond: (file.pathid = path.pathid)
> > >                ->  Index Scan using file_filenameid_jobid2 on file
> > > (cost=0.00..280519.50 rows=236283 width=4) (actual
> > > time=130.950..3289.826 rows=204 loops=1)
> > >                      Filter: (jobid = ANY
> > > ('{2629,2616,2598,2584,2570,2556,2537,2519,2503,2488,2473,2458,2440,242
> > >5,2410,2395,2380,2363,2348,2333,2318,2303}'::integer[])) ->  Hash 
> > > (cost=8713.59..8713.59 rows=366159 width=36) (actual
> > > time=453.745..453.745 rows=366951 loops=1)
> > >                      ->  Seq Scan on path  (cost=0.00..8713.59
> > > rows=366159 width=36) (actual time=0.021..179.985 rows=366951 loops=1)
> > >  Total runtime: 4000.694 ms
> > > (10 rows)
> > >
> > > This shows the database choosing to use the partial index when both
> > > exist.  My impression of the partial indexex advantages is that it is
> > > less expensive to maintain.  Maintenence of the index will only be
> > > triggered when a record in file is modified that contains the
> > > filenameid.  The disadvantage is that it could not be created until
> > > after the first job is done and the filenameid where name='' can be
> > > queried for.
> >
> > Partial indexes are usually only useful if what you are looking for
> > is a constant.
>
> My hope and current understanding is that the filenameid which is
> returned from the query
> SELECT FilenameId FROM Filename WHERE name=''
> is "relatively" constant and will not change until the last job that has
> a directory is purged from the database.


The FilenameId will not change. Even if the whole catalog is purged, it will 
remain unchanged.  To change it there would need to be no references to it 
from any File table entry AND you would need to run the appropriate clean 
orphaned record option in dbcheck to get it removed.

Regards,

Kern

>
> > I think you are proposing creating an index on-the-
> > fly.
>
> On the fly is a term that I would use if the index were being re-created
> on a regular basis, this index is not.
>
> >   I think this is not the way to go.  I think we can/should find
> > another way to improve this query's performance.
>
> If there is another way, that would be most excellent.  All ways  that I
> can imagine involve changes in the schema, like if there was a client
> column in the file table, an index could be put on that and the
> interface could be modified to not worry about jobs, only the client.
>
> > If you can put your database on a website somewhere, I'll download it
> > and try it out and work on the query that way.
>
> I will respectfully choose not to do that at this time.  I fear there
> may be reprisals from management if it were ever found out that I
> allowed corporate data which describes our entire filestructure
> available.  Not to mention using a gigs worth of our companies bandwidth
> and that we have no web server to place the file on.
>
> > With this, we'll both
> > be working with the same dataset, and I can show you how to make
> > things faster.
>
> I think we can do this with separate data as long as you have a
> reasonably large enough data set to be comparable.
>
> SELECT DISTINCT Path.Path AS Path, File.PathId AS PathId FROM File INNER
> JOIN Path ON (File.PathId=Path.PathId) WHERE File.FilenameId=YVAL AND
> File.Jobid IN (XLIST) ORDER BY Path
>
> just use the above and replace YVAL with the result of:
> SELECT FilenameId FROM Filename WHERE name=''
>
> and replace XLIST with the result of:
> SELECT Job.Jobid AS Id FROM Job INNER JOIN Client ON
> (Job.ClientId=Client.ClientId) INNER JOIN FileSet ON
> (Job.FileSetId=FileSet.FileSetId) WHERE Client.Name='srvalum1-fd' ORDER
> BY Job.EndTime DESC LIMIT 20
>
> of course replace srvalum1-fd with your favorite client.
>
> Thank you very much for your willingness to take a look at the indexes I
> am proposing.
>
> Dirk
>
>
>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >>  http://get.splunk.com/
> _______________________________________________
> Bacula-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bacula-devel

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Query changes in the catalog browser and indexes

Reply via email to