Re: [Bacula-devel] Query changes in the catalog browser and indexes

Kern Sibbald Wed, 22 Aug 2007 00:31:52 -0700

On Sunday 19 August 2007 18:35, Dirk Bartley wrote:
> Hello
>
> Now that 2.2.0 is released, I've had the opportunity to use bat in my
> work environment.  It worked generally as hoped with one major
> exception.  The catalog browser. slow slow slow.


Yes, in some cases, this is my experience too.

>
> Problem #1.
> I never set up my director to do pruning and my postgres database dumps
> had become 8Gb.  So administration dopiness was a big issue.  I've set
> up pruning, and now the size is slightly over 1Gb.  So one question I
> have is .. .. is this a typical size for a database dump which is mostly
> the file table??

It all depends on how many backups you are doing and your retention periods.  
There is a section of the manual that describes how to estimate the database 
size, but it does not take into account index sizes which can be very large.  
If the database size is not too big for you, it is probably OK.  

Since it shrunk a *lot* after pruning, I personally would copy it to another 
directory, write an ASCII output of the database, then re-initialize the 
tables and import the ASCII output.  That will ensure that you start with a 
non-fragmented database.

>
> Problem #2.
> A little inexperience with writing sql queries.  Eric Bollenger made me
> aware that the analyzer will make different decisions on how to use
> indexes based on whether a query uses a relation equality or an integer
> id equality, such as the difference in the following two lines:
>
> WHERE Filename.Name='" + file + "' AND Path.Path='" + directory + "'"
> WHERE File.FilenameId= + file + " AND File.PathId = " + directory

Yes, the first will be quite slow because the text fields are not all indexed 
and even if they are it is much less efficient to search for a text field 
than an Integer number.

>
> So I've made modifications to optimize this where possible.
>
> Problem #3
> Indexes specific for the queries the catlog browser does.  The decision
> to use indexes is always important.  Weighing the overhead of
> maintaining an index vs the benefits of using the index.  I have chosen
> to add the following indexes to the file table on my servers.  After
> alot of thinking and experimenting, they seem to be the best at
> optimizing the queries that are done.
>
> Mysql
> CREATE INDEX file_filenameid_jobid ON File (FilenameId, JobId);
> CREATE INDEX file_pathid_idx ON File (PathId);
>
> Postgres
> CREATE INDEX file_filenameid_jobid ON file USING btree (filenameid,
> jobid)
> CREATE INDEX file_pathid_idx ON file USING btree (pathid)
>
> Because these indexes are purely to optimize the catalog browser, I'm
> not sure I want to advocate for the inclusion of them in install
> procedures.  I think it would be best to add directions to add these
> indexes in the bat help files.

I think that the default Catalog creation should include Indexes that more or 
less optimize:

1. Backup
2. Restores

After that for doing browsing and the such, I believe that we should make the 
indexes optional.  The first step would be careful documentation, which does 
not exist today.

The second would be to have some scripts that can add the indexes for each of 
the 4 database engines we support SQLite, SQLite3, MySQL, PostgreSQL.  Those 
indexes should be added in a new and separate script.  However, it might be a 
good idea to call the script from the main table creation scripts.  I could 
imagine two or three different levels of index creation (simple, faster, 
lightning) or something like that that would be based on a command line 
option or an environment variable that the user could set, and depending on 
his choice, he would get more or less indexes.  If the indexes were carefully 
documented, the user could even manually create the ones that interest him 
the most.

>
> There is one alternative index to the above.  The index of the two
> colums (FilenameId, JobId) is really only needed for the condition of
> FilenameId = (SELECT FilenameId FROM Filename WHERE name='').  A
> subquery can't be used when creating an index, but running one command
> and then substituting does work.  To create a partial index in postgres:
> CREATE INDEX file_filenameid_jobid2 ON File (JobId) WHERE FilenameId=5;
> This could not be done until after a record exists in the filename table
> that fits the condition.

Unfortunately multiple indexing of columns doesn't always work equally well in 
all the different database engines.

>
> I'm looking for a little feedback.  If there are options that anyone has
> about the queries and indexes, I sure would like to know about them.
> Should indexes for bat be in the install script for the databases??  Is
> advising about possible indexes to increase performance in the help file
> something I should do.

The main things are probably:
1. Careful testing and identifying which indexes help which commands.
2. Documentation
3. Provide the user with some choices based on his own requirements and 
situation.
4. I don't recommend changing the default indexes without careful testing and 
discussion on the email lists.

Regards,

Kern

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: [Bacula-devel] Query changes in the catalog browser and indexes

Reply via email to