Re: [Bacula-devel] Query changes in the catalog browser and indexes

Kern Sibbald Sat, 25 Aug 2007 00:04:48 -0700

Hello Marc,

Now that we have a number of "browsers" and users are doing a lot more 
queries, it might be a good idea to consider re-organizing a few of the 
Bacula tables to improve the efficiency of the queries (and also perhaps 
backups and restores).

Please see more below...

On Friday 24 August 2007 19:47, Marc Cousin wrote:
...
> >
> > Thank you very much for your willingness to take a look at the indexes I
> > am proposing.
>
> I still have the same issues I discussed with you a few days ago: even
> with your new index, everything will look good as long as the SELECT
> DISTINCT returns only a very small amount of data (a few hundred to a few
> thousand records). It will be unbearable from a GUI point of view if you
> have to retrieve hundreds of thousands of directories at once (or even
> millions of directories, as the joblist takes the last 20 jobs, even if
> they are full jobs).
> So here we are talking hundreds of thousands of index scans/seeks for the
> database (because the dir entries are completely mixed with file entries in
> the File table, you probably won't have more than one/a few dirs per
> database page).
> My point is, if you really want to make it blazing fast (or at least
> reasonably fast), there are, I think, very few methods:
> - The dir entries should not be drowned in the bulk of file entries (i.e. you
> need a new table to store them and only them, or to physically sort the
> table on disk so that directories are all stored together on a few
> contiguous pages...)

I haven't been following this as carefully as I normally would, because of 
my "mini-vacation", but I am interested in the basis of the problem as far as 
the Bacula table structure is concerned.  I ask because the Filenames and 
the Paths are already separated into two tables, and that seems to be what 
you are proposing.  Can you be a bit more explicit?  Is it that the File entries 
should be split into two separate tables -- one containing a link to Paths 
only and one containing a link to Path/Filenames?  If so, is that something 
that you would propose for Bacula in general?
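
One possible reading of that split, as a minimal sketch only: the JobDir
table below is hypothetical, and Path and File are simplified stand-ins for
the catalog tables (the real ones have more columns). The idea is to run the
expensive SELECT DISTINCT once per job and keep the result in a small table
that a browser can read without touching File.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the catalog tables (assumed column names).
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT NOT NULL);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER,
                   PathId INTEGER, Name TEXT);
-- Hypothetical table: one row per directory per job, and nothing else.
CREATE TABLE JobDir (JobId INTEGER, PathId INTEGER,
                     PRIMARY KEY (JobId, PathId));
""")
conn.executemany("INSERT INTO Path VALUES (?, ?)",
                 [(1, "/etc/"), (2, "/home/kern/")])
conn.executemany("INSERT INTO File (JobId, PathId, Name) VALUES (?, ?, ?)",
                 [(1, 1, "passwd"), (1, 1, "hosts"),
                  (1, 2, "notes.txt"), (2, 1, "passwd")])


def build_job_dirs(job_id):
    """Pay for the DISTINCT scan of File once, e.g. when the job ends."""
    conn.execute(
        "INSERT OR IGNORE INTO JobDir "
        "SELECT DISTINCT JobId, PathId FROM File WHERE JobId = ?",
        (job_id,))


def job_directories(job_id):
    """List a job's directories from the small table only."""
    rows = conn.execute(
        "SELECT p.Path FROM JobDir d JOIN Path p ON p.PathId = d.PathId "
        "WHERE d.JobId = ? ORDER BY p.Path", (job_id,))
    return [r[0] for r in rows]


build_job_dirs(1)
build_job_dirs(2)
print(job_directories(1))
```

Whether such a table would be filled during the backup or by a separate
pass afterwards is left open here.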

> - Better still would be to retrieve as few entries as possible each
> time. That means being able to retrieve only the subdirectories of the
> current directory. This isn't feasible with the current schema either,
> as you need a way to link directories with subdirectories
> (bidirectionally if possible).
>
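
The directory-to-subdirectory link described above could look roughly like
the following. This is a sketch under assumptions, not an existing catalog
table: DirTree and its columns are invented for illustration, and Path is
reduced to two columns. The primary key gives the child-to-parent direction
and the extra index gives parent-to-children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path TEXT UNIQUE NOT NULL);
-- Hypothetical link table: one row per directory that has a parent.
CREATE TABLE DirTree (
    PathId   INTEGER PRIMARY KEY REFERENCES Path(PathId),
    ParentId INTEGER NOT NULL REFERENCES Path(PathId)
);
CREATE INDEX DirTree_parent ON DirTree (ParentId);
""")


def add_dir(path, parent_id=None):
    """Insert a directory and, if given, the link to its parent."""
    cur = conn.execute("INSERT INTO Path (Path) VALUES (?)", (path,))
    path_id = cur.lastrowid
    if parent_id is not None:
        conn.execute("INSERT INTO DirTree (PathId, ParentId) VALUES (?, ?)",
                     (path_id, parent_id))
    return path_id


def subdirectories(path_id):
    """Fetch only the direct children of one directory."""
    rows = conn.execute(
        "SELECT p.Path FROM DirTree t JOIN Path p ON p.PathId = t.PathId "
        "WHERE t.ParentId = ? ORDER BY p.Path", (path_id,))
    return [r[0] for r in rows]


def parent(path_id):
    """Walk one step up; returns None for a directory without a parent."""
    row = conn.execute("SELECT ParentId FROM DirTree WHERE PathId = ?",
                       (path_id,)).fetchone()
    return row[0] if row else None


root = add_dir("/")
etc = add_dir("/etc/", root)
home = add_dir("/home/", root)
kern = add_dir("/home/kern/", home)
print(subdirectories(root))
```

A GUI would then issue one small indexed query per directory the user
expands, instead of loading every directory of the selected jobs at once.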

I am also interested in understanding the tables that you and Eric use in 
brestore to speed up retrieving multiple versions of a file.  Assuming it is 
a good idea to split the current "File" table into "Dirs" and "Files", do you 
think it would be a good idea to have the Bacula core build and maintain any 
other tables for fast lookups?

Another possible table would be a list of the "top" level Directories for each 
Job.  That might significantly speed up retrieving the base directory 
structure. 
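
As a rough sketch of how such a list could be derived (the function name is
made up for illustration, and it assumes every directory path ends with
"/"), the "top" level directories of a job are the paths that have no
ancestor among the job's other paths:

```python
def top_level_dirs(paths):
    """Return the paths that have no ancestor directory in the same set.

    Assumes each path ends with "/", so "/home/" is a prefix of
    "/home/kern/" but not of "/homework/".
    """
    tops = []
    for p in sorted(set(paths)):
        # In sorted order an ancestor comes immediately before the block of
        # its descendants, so comparing with the last kept path is enough.
        if not (tops and p.startswith(tops[-1])):
            tops.append(p)
    return tops


job_paths = ["/home/kern/", "/home/", "/etc/ssh/",
             "/homework/", "/home/kern/src/"]
print(top_level_dirs(job_paths))
```

The result could be stored per JobId so that a browser can show the base of
the tree without scanning all of the job's paths.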

Best regards,

Kern
