On Thursday 30 August 2007 20:44, Marc Cousin wrote:
> > > If we separate the current File data into Dirs and Files, first, we
> > > have reduced the amount of data that we need to look through to find
> > > the directory tree for a job by about a factor of 100 on most typical
> > > Unix systems. That is already good. Then, in the Dirs table we can
> > > have the following columns:
> > >
> > > JobId
> > > ParentId
> > > PathId
> > >
> > > If ParentId is NULL (or perhaps zero), we know it is a top level
> > > directory. That is one directly mentioned in a FileSet. Otherwise, it
> > > is nested down. So for a given JobId we can quickly find all the top
> > > level directories and parse them any way we want.
> >
> > So it means that the client has to find all the root directories of the
> > backup, then calculate its parent directories if required, to display
> > them. Is the root directory the one set up in the fileset ?
> > Is there no risk of missing some 'root directories' from incremental
> > backups, where the real root directory has not been modified ? (I
> > honestly don't know, I'm asking...)
> > BTW, avoid NULL at all costs if you want to be able to use the index to
> > retrieve your records : null values aren't indexed.
>
> I've been thinking about it again. What I don't like about chaining Ids in
> the Dirs table, is that it makes some Dir records linked to dirs of another
> jobid (in case of incremental backups). And when we do the incremental, we
> may not even know which parentid to link to, as there will be several of
> them available.
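
To make the quoted Dirs proposal concrete, here is a minimal sketch of it, not actual Bacula catalog code. It assumes SQLite through Python's sqlite3 module purely for illustration. The DirId key, the index name, and the helper names make_dirs_db and top_level_dirs are hypothetical. ParentId uses 0 rather than NULL for a top-level directory, following the "or perhaps zero" option and the advice above about NULL, and parent links are meant to stay within a single JobId.

```python
import sqlite3


def make_dirs_db():
    """Create an in-memory catalog with a Path table and the proposed Dirs table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (
            PathId INTEGER PRIMARY KEY,
            Path   TEXT NOT NULL
        );
        -- Proposed Dirs table: JobId, ParentId, PathId.
        -- DirId is an assumed surrogate key so that ParentId has a target.
        CREATE TABLE Dirs (
            DirId    INTEGER PRIMARY KEY,
            JobId    INTEGER NOT NULL,
            ParentId INTEGER NOT NULL DEFAULT 0,  -- 0 = top-level directory
            PathId   INTEGER NOT NULL
        );
        -- Lets "all top-level directories of a job" be an index lookup.
        CREATE INDEX dirs_job_parent ON Dirs (JobId, ParentId);
    """)
    return conn


def top_level_dirs(conn, job_id):
    """Return the paths of the top-level directories recorded for one job."""
    rows = conn.execute(
        "SELECT Path.Path FROM Dirs "
        "JOIN Path ON Path.PathId = Dirs.PathId "
        "WHERE Dirs.JobId = ? AND Dirs.ParentId = 0 "
        "ORDER BY Path.Path",
        (job_id,),
    )
    return [row[0] for row in rows]
```

Because every Dirs row carries its own JobId and a ParentId pointing at a row of the same job, an incremental job stays self-contained, at the cost of storing one extra integer per directory per job.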

I had never intended to link records from one JobId to another, which IMO
is a bad idea, so you can just dismiss what I previously wrote as being
incomplete.

>
> Maybe it would be easier to add a parentid in the Path table. Of course it
> means we don't restrain links to the ones that should be displayed for a
> given server... But this is then easily filtered matching data from the Dir
> or File table and saves a lot of space. Of course, it defeats the purpose
> of having an easy way to recognise 'root' directories, as the info isn't
> there anymore... Maybe then this info should be stored in another place ?
> Something like having more metadata in the job table (or another table
> describing all the root directories associated with a particular job, or
> anything of this sort, I really don't know)
>
> Having a table/set of tables describing precisely how a backup was done may
> be very interesting compared to storing a big amount of useless data in
> these tables : it seems better paying a fixed amount on saving full
> metadata of each backup than waste 4 bytes per dir to save the parentid of
> every dir we back up ?
>
> > > In the Files table, in *addition* to the existing columns (Path and
> > > Filename), if we need it we can have a DirId, which points to the Dirs
> > > record for the given Path and Filename.
> >
> > If we have the dirid, we don't need the pathid anymore, I guess, as it
> > would be in the dir table.
>
> Then comes another doubt :)
> What happens if the dirid isn't there anymore ? (we have made an
> incremental backup of a file, and the full it refers to doesn't exist
> anymore)

Well, each level needs to be independent as it is today. It is impossible
to do a complete current backup if the Full is not there, but you can
always restore any of the files in an Inc backup without the Full. This
must continue to be the case after any changes we make.

>
> > > To do the above, we
> > > 1. Split the File table into Dirs and Files
> > > 2. Add one new column to Files, which is DirId (if necessary)
> > > 3. Delete the FilenameId from the Dirs record (i.e. it is identical to
> > > the current File record less the FilenameId column).
> > > 4. Add one new ParentId column to the Dirs table.
> >
> > Here I've got a question : will you calculate the ParentId at insert time
> > ? (we must avoid updates, it has a big performance impact on all
> > transactional DBMSs). I really don't know how much it will cost, but it
> > may slow down database insertions by a big amount... The parentid dir may
> > not even be in the same job...
> >
> > The point of brestore's method is to calculate as few of these links
> > as possible, thanks to the hierarchy table : the links between dirs and
> > their parents are not correlated with jobs in our case, so we do the links
> > once for all the jobs, except in the case of a new directory.
>
> If we store parentpathid in the path table, it becomes much less costly...
> but then again, we don't have the root directories information at hand
> anymore.
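
For comparison, the alternative raised in the quoted text (a parent link stored once in the Path table and filtered per job through the Dirs table) could look roughly like the sketch below. It is only an illustration under stated assumptions: SQLite via Python's sqlite3, absolute Unix paths stored with a trailing slash, and a parent column named ParentPathId after the "parentpathid" mentioned above. The helper names open_db, path_id, record_dir and children are hypothetical and not part of Bacula or brestore.

```python
import posixpath
import sqlite3


def open_db():
    """Create an in-memory catalog where the parent link lives in Path."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (
            PathId       INTEGER PRIMARY KEY,
            Path         TEXT NOT NULL UNIQUE,
            ParentPathId INTEGER NOT NULL      -- 0 for "/", never NULL
        );
        CREATE INDEX path_parent ON Path (ParentPathId);
        -- Dirs only says which paths a job contains; no ParentId here.
        CREATE TABLE Dirs (
            JobId  INTEGER NOT NULL,
            PathId INTEGER NOT NULL
        );
        CREATE INDEX dirs_job ON Dirs (JobId, PathId);
    """)
    return conn


def path_id(conn, path):
    """Return the PathId of an absolute path, inserting it and any missing
    ancestors. The parent is computed at insert time, so no UPDATE is needed."""
    path = path.rstrip("/") + "/"          # normalise to a trailing slash
    row = conn.execute(
        "SELECT PathId FROM Path WHERE Path = ?", (path,)
    ).fetchone()
    if row:
        return row[0]
    if path == "/":
        parent_id = 0
    else:
        parent_id = path_id(conn, posixpath.dirname(path.rstrip("/")))
    cur = conn.execute(
        "INSERT INTO Path (Path, ParentPathId) VALUES (?, ?)",
        (path, parent_id),
    )
    return cur.lastrowid


def record_dir(conn, job_id, path):
    """Record that a job backed up the given directory."""
    conn.execute(
        "INSERT INTO Dirs (JobId, PathId) VALUES (?, ?)",
        (job_id, path_id(conn, path)),
    )


def children(conn, job_id, path):
    """List the directories directly under path that a given job contains."""
    rows = conn.execute(
        "SELECT Path.Path FROM Path "
        "JOIN Dirs ON Dirs.PathId = Path.PathId "
        "WHERE Dirs.JobId = ? AND Path.ParentPathId = ? "
        "ORDER BY Path.Path",
        (job_id, path_id(conn, path)),
    )
    return [row[0] for row in rows]
```

With this layout a link is computed once per new path rather than once per job, and an incremental job that saved /etc/cron.d/ without /etc/ can still be browsed under /etc/. As the quoted text notes, nothing in it records which directories were the roots named in the FileSet, so that information would have to live elsewhere.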
