On Tuesday 28 August 2007 16:12:56, Kern Sibbald wrote:
> On Tuesday 28 August 2007 13:30, Cousin Marc wrote:
> > On Tuesday 28 August 2007 11:57:28, Kern Sibbald wrote:
> > > On Tuesday 28 August 2007 10:41, BOLLENGIER Eric wrote:
> > > > > > For the visibility flag, it may not be that easy: a directory
> > > > > > may be visible even if it's not in a backup. For instance, /home
> > > > > > should be displayed if /home/marc is backed up, so we add an
> > > > > > entry for /home in pathvisibility for the job where /home/marc
> > > > > > is backed up.
> > > > >
> > > > > Isn't the visibility rather easily deduced from the first path in
> > > > > the backup?
> > > >
> > > > For example, this will not work with c:/ and d:/ under Windows...
> > > > or if your fileset contains /home/marc and /var/backups, you will
> > > > display only the first, not /home and /var.
> > >
> > > Perhaps, but your FileSet *really* should be c:/home/marc ... If you
> > > have left out the drive, it may work, but it is not the correct way.
> > > Also, I think we can come up with some way to know that it is a Windows
> > > directory tree, in which case, IMO /home/marc automatically means
> > > c:/home/marc. Unless I am missing something, this is just an issue of
> > > documentation, and perhaps a bit more rigor within Bacula to "force"
> > > users to specify the drive.
> > >
> > > If your fileset contains /home/marc, it is rather obvious that the root
> > > is / and that under that is home, then marc. That can be very quickly
> > > figured out in the console program.
> >
> > It was just an example, and I think Eric mixed two cases up. He started
> > with a Windows server, and ended up talking about a Unix server (my
> > previous example).
> >
> > Let's say you have a fileset with /home/marc, /home/eric/data
> > and /var/tmp/marc. I agree that's a bit silly, but I think it would be a
> > good example to discuss.
> >
> > You can 'guess' that / is the main directory. But you still end up with
> > the real problem, which is to determine as fast as possible what should
> > be displayed, and I don't really see how you mean to do it.
> >
> > I think that we should start with the example and discuss it...
> > Can you explain how you know that you should display / as a root for this
> > server, then that / contains home and var, then if you go in /home that
> > you should display 'marc' and 'eric'?
>
> Well, I'm not 100% sure I understand what you mean by visible -- probably
> because I sort of intuitively understand what it is.

By visible, I just mean: 'should be displayed when JobId xxx is asked for'.
Knowing that most of the time we have to display several JobIds at the same
time, of course... (the Full, the latest Differential and the Incrementals
since then).

> So to get to the real question: "how do I know what to display as the
> root?"
>
> If we separate the current File data into Dirs and Files, first, we have
> reduced the amount of data that we need to look through to find the
> directory tree for a job by about a factor of 100 on most typical Unix
> systems. That is already good. Then, in the Dirs table we can have the
> following columns:
>
> JobId
> ParentId
> PathId
>
> If ParentId is NULL (or perhaps zero), we know it is a top level directory.
> That is one directly mentioned in a FileSet. Otherwise, it is nested down.
> So for a given JobId we can quickly find all the top level directories and
> parse them any way we want.

So it means that the client has to find all the root directories of the
backup, then calculate their parent directories if required, in order to
display them. Is the root directory the one set up in the fileset? Is there
no risk of missing some 'root directories' from incremental backups, where
the real root directory has not been modified? (I honestly don't know, I'm
asking...)

BTW, avoid NULL at all costs if you want to be able to use the index to
retrieve your records: NULL values aren't always indexed.

> In the Files table, in *addition* to the existing columns (Path and
> Filename), if we need it we can have a DirId, which points to the Dirs
> record for the given Path and Filename.

If we have the DirId, we don't need the PathId anymore, I guess, as it would
be in the Dirs table.

> To do the above, we
> 1. Split the File table into Dirs and Files
> 2. Add one new column to Files, which is DirId (if necessary)
> 3. Delete the FilenameId from the Dirs record (i.e. it is identical to the
>    current File record less the FilenameId column).
> 4. Add one new ParentId column to the Dirs table.

Here I've got a question: will you calculate the ParentId at insert time?
(We must avoid updates, they have a big performance impact on all
transactional DBMSs.) I really don't know how much it will cost, but it may
slow down database insertions by a large amount... The parent directory may
not even be in the same job...

The point of brestore's method is to calculate as few of these links as
possible, thanks to the hierarchy table: the links between directories and
their parents are not tied to jobs in our case, so we build the links once
for all the jobs, except in the case of a new directory.

> I think this gives us everything you currently have. Now, not having
> really thought this through, I'm not sure how hard it will be to split the
> File table and add the new fields. It *should* not be too hard. I would
> imagine that we allow the normal insert process to generate any indexes
> that we want.
>
> Taking your example with FileSet /home/marc, /home/eric/data,
> and /var/tmp/marc, the entries in the Dirs table for those directories
> would all have ParentId = NULL for that JobId. So we immediately know that
> we have
>
> / as the root
> then
> /home
> and
> /var
> ...
>
> Am I missing something?

No, your way should work too, if we find a way to insert data as fast as
possible (I fear it may have a significant impact, so we'd better think
about a way of doing it efficiently before breaking everything :) ). I don't
know the impact of splitting the main table of the application from a coding
point of view.

Just another thing that comes to mind: in your examples you always talk
about one JobId. Most of the time, there will be several JobIds selected at
the same time. I think only the Full JobIds would be used to get the root
directories, but I would like to be sure of it...
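
To make the display question concrete, here is a minimal Python sketch of the
client-side step discussed above: starting from the top-level directories of
the selected JobIds, walk up to / and collect every directory that should be
shown. It is only an illustration under stated assumptions, not Bacula or
brestore code. The names visible_dirs and children are hypothetical, the
dictionary stands in for a query returning the top-level Dirs rows per JobId,
and paths are assumed to be Unix-style without a trailing slash.

```python
from posixpath import dirname


def visible_dirs(top_level_by_job, jobids):
    """Collect every directory to display for the selected jobs.

    top_level_by_job is a hypothetical stand-in for the database lookup:
    it maps a JobId to the directories named directly in its FileSet.
    """
    visible = set()
    for jobid in jobids:
        for path in top_level_by_job.get(jobid, ()):
            # Walk up towards "/" and stop as soon as we meet a directory
            # that is already known, so shared parents are handled once.
            while path not in visible:
                visible.add(path)
                parent = dirname(path)
                if parent == path:  # dirname("/") is "/", so we are at the root
                    break
                path = parent
    return visible


def children(visible, parent):
    """List the visible directories located directly under parent."""
    return sorted(p for p in visible if p != parent and dirname(p) == parent)


# The example FileSet from this thread, split over two jobs for illustration.
top_level = {
    1: ["/home/marc", "/home/eric/data"],
    2: ["/var/tmp/marc"],
}
tree = visible_dirs(top_level, [1, 2])
print(children(tree, "/"))
print(children(tree, "/home"))
```

Passing several JobIds at once covers the Full plus Differential plus
Incremental case. Whether the top-level directories of the Full alone are
enough is exactly the open question above, and the sketch does not settle it.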
_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
