> > If we separate the current File data into Dirs and Files, first, we have
> > reduced the amount of data that we need to look through to find the
> > directory tree for a job by about a factor of 100 on most typical Unix
> > systems. That is already good. Then, in the Dirs table we can have the
> > following columns:
> >
> >    JobId
> >    ParentId
> >    PathId
> >
> > If ParentId is NULL (or perhaps zero), we know it is a top-level
> > directory, that is, one directly mentioned in a FileSet. Otherwise, it
> > is nested down. So for a given JobId we can quickly find all the top
> > level directories and parse them any way we want.

> So it means that the client has to find all the root directories of the
> backup, then calculate their parent directories if required, to display
> them. Is the root directory the one set up in the FileSet?
> Is there no risk of missing some 'root directories' in incremental
> backups, where the real root directory has not been modified? (I honestly
> don't know, I'm asking...)
> BTW, avoid NULL at all costs if you want to be able to use the index to
> retrieve your records: NULL values aren't indexed.
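For illustration, here is a minimal sketch (Python + SQLite) of the proposed Dirs table. The column names are the ones from the proposal; the DirId key, the index, and the sample PathIds are my own assumptions. It uses zero rather than NULL for ParentId, so the top-level lookup can go through the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Proposed Dirs table: one row per directory per job.
cur.execute("""
    CREATE TABLE Dirs (
        DirId    INTEGER PRIMARY KEY,
        JobId    INTEGER NOT NULL,
        ParentId INTEGER NOT NULL,  -- 0 = top-level dir; avoids NULL so the index is usable
        PathId   INTEGER NOT NULL
    )
""")
cur.execute("CREATE INDEX dirs_job_parent ON Dirs (JobId, ParentId)")

# Hypothetical sample data for JobId 1: one top-level dir and one child.
cur.executemany("INSERT INTO Dirs VALUES (?, ?, ?, ?)",
                [(1, 1, 0, 100),   # top-level dir (mentioned in the FileSet)
                 (2, 1, 1, 101)])  # nested dir, child of DirId 1

# All top-level directories of a job in one indexed lookup.
top = cur.execute("SELECT DirId, PathId FROM Dirs "
                  "WHERE JobId = 1 AND ParentId = 0").fetchall()
print(top)  # [(1, 100)]
```

With ParentId = 0 as the sentinel, the (JobId, ParentId) index serves both the "top-level dirs of a job" query and the "children of a dir" query.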
I've been thinking about it again. What I don't like about chaining Ids in
the Dirs table is that it links some Dir records to dirs of another JobId
(in the case of incremental backups). And when we do the incremental, we may
not even know which ParentId to link to, as there will be several of them
available.

Maybe it would be easier to add a ParentId in the Path table. Of course it
means we don't restrict links to the ones that should be displayed for a
given server... but that is then easily filtered by matching data from the
Dir or File table, and it saves a lot of space.

Of course, it defeats the purpose of having an easy way to recognise 'root'
directories, as that info isn't there anymore... Maybe this info should then
be stored somewhere else? Something like more metadata in the Job table (or
another table describing all the root directories associated with a
particular job, or anything of this sort, I really don't know).

Having a table (or set of tables) describing precisely how a backup was done
may be very interesting compared to storing a big amount of useless data in
these tables: it seems better to pay a fixed cost for saving the full
metadata of each backup than to waste 4 bytes per dir to save the ParentId
of every dir we back up.

> > In the Files table, in *addition* to the existing columns (Path and
> > Filename), if we need it we can have a DirId, which points to the Dirs
> > record for the given Path and Filename.

> If we have the DirId, we don't need the PathId anymore, I guess, as it
> would be in the Dirs table.

Then comes another doubt :) What happens if the DirId isn't there anymore?
(We have made an incremental backup of a file, and the full backup it
refers to doesn't exist anymore.)

> > To do the above, we
> >
> > 1. Split the File table into Dirs and Files
> > 2. Add one new column to Files, which is DirId (if necessary)
> > 3. Delete the FilenameId from the Dirs record (i.e. it is identical to
> >    the current File record less the FilenameId column).
> > 4. Add one new ParentId column to the Dirs table.

> Here I've got a question: will you calculate the ParentId at insert time?
> (We must avoid updates; they have a big performance impact on all
> transactional DBMSs.) I really don't know how much it will cost, but it
> may slow down database insertions by a big amount... The parent dir may
> not even be in the same job...
>
> The point of brestore's method is to calculate as few of these links as
> possible, thanks to the hierarchy table: the links between dirs and their
> parents are not correlated with jobs in our case, so we do the links once
> for all the jobs, except in the case of a new directory.

If we store a ParentPathId in the Path table, it becomes much less
costly... but then again, we don't have the root directories information at
hand anymore.

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
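To make that trade-off concrete, here is a minimal sketch (Python + SQLite) of the ParentPathId-in-Path variant discussed above. Only Path and PathId come from the existing schema; the ParentPathId column and the path_id helper are hypothetical. The parent link is computed once, at insert time, only when a path is first seen, and existing rows are never updated, so incremental backups pay nothing for paths already in the catalog:

```python
import os
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Path (PathId INTEGER PRIMARY KEY, "
            "Path TEXT UNIQUE NOT NULL, ParentPathId INTEGER NOT NULL)")

def path_id(path):
    """Return the PathId for `path`, inserting it (and, recursively, its
    parents) on first sight. Rows are only ever inserted, never updated."""
    row = cur.execute("SELECT PathId FROM Path WHERE Path = ?",
                      (path,)).fetchone()
    if row:
        return row[0]
    parent = os.path.dirname(path.rstrip("/"))
    # ParentPathId = 0 marks the filesystem root (no NULLs in the column).
    parent_id = path_id(parent + "/") if parent not in ("", "/") else 0
    cur.execute("INSERT INTO Path (Path, ParentPathId) VALUES (?, ?)",
                (path, parent_id))
    return cur.lastrowid

# A full backup inserts the chain once; the incremental re-uses it for free.
first = path_id("/home/user/docs/")
again = path_id("/home/user/docs/")
print(first, again, cur.execute("SELECT COUNT(*) FROM Path").fetchone()[0])
# -> 3 3 3 : three rows inserted by the first call, the second call re-uses them
```

As noted above, the hierarchy built this way is global to all jobs, so recognising which directories are the 'roots' of a given job would still need extra per-job metadata.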
