Hello Marc,

Now that we have a number of "browsers" and users are doing a lot more
queries, it might be a good idea to consider re-organizing a few of the
Bacula tables to improve the efficiency of the queries (and also perhaps
backups and restores).
Please see more below...

On Friday 24 August 2007 19:47, Marc Cousin wrote:
...
> > Thank you very much for your willingness to take a look at the indexes I
> > am proposing.
>
> I still have the same issues I discussed with you a few days ago : even
> with your new index, everything will look good as long as the SELECT
> DISTINCT returns only a very small amount of data (a few hundred to a few
> thousand records). It will be unbearable from a GUI point of view if you
> have to retrieve hundreds of thousands of directories at once (or even
> millions of directories, as the joblist takes the last 20 jobs, even if
> they are full jobs).
> So here we are talking hundreds of thousands of index scans/seeks for the
> database (because the dir entries are completely mixed with file entries in
> the File table, you probably won't have more than one/a few dirs per
> database page).
> My point is, if you really want to make it blazing fast (or at least
> reasonably fast), there are, I think, very few methods :
> - The dir entries should not be drowned in the bulk of file entries (ie you
> need a new table to store them and only them, or sort physically the table
> on disk so that directories are all stored together on a few contiguous
> pages...)

I haven't been following this as carefully as I would normally, because of
my "mini-vacation", but I am interested in the basis of the problem as far
as the Bacula table structure is concerned. I ask that because the
Filenames and the Paths are already separated into two tables, and that
seems to be what you are proposing.

Can you be a bit more explicit? Is it that the File entries should be
split into two separate tables -- one containing a link to Paths only and
one containing a link to Path/Filenames? If so, is that something that you
would propose for Bacula in general?

> - Even better, if you could only retrieve as few entries as possible each
> time, it would be even better.
> It means being able to retrieve only the
> subdirectories of the current directory. This isn't feasible with the
> current schema either, as you need a way to link directories with
> subdirectories (bidirectionally if possible).
>

I am also interested in understanding the tables that you and Eric use in
brestore to speed up retrieving multiple versions of a file.

Assuming it is a good idea to split the current "File" table into "Dirs"
and "Files", do you think it would be a good idea to have the Bacula core
build and maintain any other tables for fast lookups?

Another possible table would be a list of the "top" level Directories for
each Job. That might significantly speed up retrieving the base directory
structure.

Best regards,

Kern

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
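
To make the "link directories with subdirectories" idea from the thread a bit
more concrete, here is a minimal sketch using Python's sqlite3 module. It is
not the actual Bacula or brestore schema: the PathHierarchy table, its
ParentPathId column, and the helper functions are hypothetical names chosen
for illustration, and paths are assumed to be stored with a trailing slash.

```python
import sqlite3


def parent_of(path):
    """Return the parent of a trailing-slash path, or None for the root."""
    if path == "/":
        return None
    return path.rstrip("/").rsplit("/", 1)[0] + "/"


def build_demo_db(paths):
    """Create an in-memory database with a Path table plus a hypothetical
    PathHierarchy table that links each directory to its parent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (
            PathId INTEGER PRIMARY KEY,
            Path   TEXT UNIQUE NOT NULL
        );
        -- Hypothetical table: one row per directory whose parent is known.
        CREATE TABLE PathHierarchy (
            PathId       INTEGER PRIMARY KEY REFERENCES Path(PathId),
            ParentPathId INTEGER NOT NULL REFERENCES Path(PathId)
        );
        -- Index for the parent -> children direction; the primary key
        -- already covers the child -> parent direction.
        CREATE INDEX PathHierarchy_parent ON PathHierarchy (ParentPathId);
    """)
    conn.executemany("INSERT INTO Path (Path) VALUES (?)",
                     [(p,) for p in paths])
    ids = dict(conn.execute("SELECT Path, PathId FROM Path"))
    for path, path_id in ids.items():
        parent = parent_of(path)
        if parent in ids:
            conn.execute("INSERT INTO PathHierarchy VALUES (?, ?)",
                         (path_id, ids[parent]))
    return conn


def subdirectories(conn, path):
    """Return only the direct subdirectories of the given directory."""
    rows = conn.execute("""
        SELECT child.Path
        FROM Path AS parent
        JOIN PathHierarchy AS h ON h.ParentPathId = parent.PathId
        JOIN Path AS child ON child.PathId = h.PathId
        WHERE parent.Path = ?
        ORDER BY child.Path
    """, (path,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db(["/", "/etc/", "/etc/ssh/", "/home/", "/home/kern/"])
    print(subdirectories(demo, "/"))
```

With a table like this, a browser expanding one directory would fetch only
that directory's children through an indexed lookup, rather than running a
SELECT DISTINCT over every directory entry of the selected jobs.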
