Hello,

Here are a couple of notes about this feature, then some related ideas ...
- There is a clear need for a feature like this. If you have a Job whose File records have been pruned, and it was a backup of 1TB but you only want a tiny portion of that, the only alternative to a solution like this is to scan the Volume, which is terribly slow.

- As Martin points out, this code gives the SD a bit more knowledge of the records it has stored, but unless someone has a better idea, I see no alternative.

- One aspect of this code I haven't looked at yet is whether it is really required to add it in read_record.c rather than match_bsr.c, where all the other bsr filtering code is located. To be investigated ...

========

On a similar but slightly different subject: one user brought up a problem that we are likely to see quite a lot in the near future. He has 600 million File records in his Bacula catalog, and he is required to have at least a 7 year retention period, which means the database is growing (I think it is currently at 100GB), and it will continue to grow.

To improve performance, he has proposed having a separate File table for each client. This would very likely improve the performance quite a lot because if you have, say, 60 clients, instead of having one gigantic File table it would be split into 60 smaller tables. For example, instead of referencing File, Bacula would, for clients named FD1 and FD2, reference FD1Files and FD2Files, and so on, each of which would be an identical table containing only the data for a single client.

The problem I have with the suggestion is that it would require rather massive changes to the current SQL code, and it would break all external programs that reference the File table of the database.

The first important piece of information is that in version 3.0.0 we are planning to switch to using a 64 bit Id for the File table by default -- this will remove the current restriction of 4G files (it can be enabled manually in the current version, so the main change is to make it automatic).

The second thing that could help a lot is the "Selective restore" patch submitted by Kjetil, because although a user may have a requirement for long retention periods, that does not necessarily mean that all the File records must be kept -- what is probably most important is retaining the data and being able to extract it in a reasonable amount of time. Implementation of this patch will allow some users to prune the File records even though the Volumes must be kept a long time. Obviously this will not satisfy all requirements.

Another suggestion that I have for the problem of growing File tables is a sort of compromise. Suppose that we implement two File retention periods: one, as currently exists, that defines when the records are deleted, and a new period that defines when the records are moved out of the File table and placed in a secondary table, perhaps called OldFiles. This would allow users to keep the efficiency for active files high but at the same time allow the delete retention period to be quite long. The database would still grow, but there would be a lot less overhead.

Actually, the name of the table for these "expired" File records could even be defined on a client by client or Job by Job basis, which would allow for having multiple "OldFiles" tables.

Another advantage of my suggestion would be that within Bacula itself, switching from using the File table to using the OldFiles table could be made totally automatic (it will require a bit of code, but no massive changes). External programs would still function normally in most cases, but if they wanted to access older data, they would need some modification.

We could also envision moving the "expired" File records to a different database, which would in the end be much more efficient, but would require considerably more work to implement.

Whatever is finally decided, it is clear to me that it is unlikely to be implemented in time for the next major release (planned for the end of the year).
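
To make the two-retention-period idea above a little more concrete, here is a rough sketch in Python using SQLite. Everything in it is an assumption for illustration only: the toy Job/File schema is far simpler than the real catalog, and the function names move_expired_files and files_for_job are hypothetical, not existing Bacula code. It shows records of old Jobs being moved to an OldFiles table, and a lookup that falls back to that table automatically.

```python
import sqlite3


def move_expired_files(conn, cutoff_jobtdate, old_table="OldFiles"):
    """Move File rows of Jobs older than the cutoff into old_table.

    Returns the number of rows removed from File. The table name is
    configurable to mirror the per-client / per-Job idea.
    """
    if not old_table.isidentifier():
        raise ValueError("unsafe table name: %r" % old_table)
    expired = "JobId IN (SELECT JobId FROM Job WHERE JobTDate < ?)"
    with conn:  # copy and delete are committed together
        # Create an empty table with the same columns as File if needed.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {old_table} AS SELECT * FROM File WHERE 0"
        )
        conn.execute(
            f"INSERT INTO {old_table} SELECT * FROM File WHERE {expired}",
            (cutoff_jobtdate,),
        )
        cur = conn.execute(f"DELETE FROM File WHERE {expired}", (cutoff_jobtdate,))
    return cur.rowcount


def files_for_job(conn, job_id, old_table="OldFiles"):
    """Look in File first, then fall back to old_table (must already exist)."""
    query = "SELECT Name FROM {} WHERE JobId = ? ORDER BY FileId"
    rows = conn.execute(query.format("File"), (job_id,)).fetchall()
    if not rows:
        rows = conn.execute(query.format(old_table), (job_id,)).fetchall()
    return [r[0] for r in rows]


# Toy catalog (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Job (JobId INTEGER PRIMARY KEY, JobTDate INTEGER);
CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER, Name TEXT);
INSERT INTO Job VALUES (1, 1000), (2, 2000);
INSERT INTO File VALUES (1, 1, '/etc/passwd'),
                        (2, 1, '/var/log/messages'),
                        (3, 2, '/etc/hosts');
""")

moved = move_expired_files(conn, cutoff_jobtdate=1500)
print(moved, files_for_job(conn, 1), files_for_job(conn, 2))
```

In this sketch the caller of files_for_job does not need to know where the records live, which is the "totally automatic" switching mentioned above; an external program that reads File directly would simply no longer see the records of Job 1.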

I would appreciate your comments on the "Selective restore" feature and/or the "multiple File table" feature.

Best regards,

Kern

On Friday 15 August 2008 14:00:12 Kjetil Torgrim Homme wrote:
> I needed to restore a subset of some old backups. Restoring the full
> backups would need a terabyte of temporary storage, which seemed a bit
> wasteful (and inconvenient to get hold of) since the data I was
> interested in took less than a gigabyte.
>
> Anyway -- I implemented a simple regex to filter the files to restore.
> It works like this:
>
>   Building directory tree for JobId(s) 28644 ...
>   There were no files inserted into the tree, so file selection
>   is not possible.Most likely your retention policy pruned the files
>
>   Do you want to restore all the files? (yes|no): no
>
>   Regexp matching files to restore? (empty to abort): ^/var/log
>
> The patch adds a new keyword to the bootstrap file, FilePattern, which
> the storage daemon will apply to all files before deciding whether to
> send the file over to the fd. The fd doesn't need any changes, btw.
>
> This is just a quick hack, and there is some polishing left to do:
>
>  * Only available interactively in the specific case above, but
>    could be useful as an alternative/supplement to marking files and
>    directories manually.
>
>  * Can not be modified like the other job parameters.
>
>  * Bacula will complain that the number of restored files is
>    different from what it expected in the final report.
>
>  * Documentation is not updated.
>
> The patch is against revision 7469, which we are now running in
> production (don't tell my boss ;-). I hope others will find it
> useful.
>
> PS. Kern, the GPL paperwork is on its way to Switzerland.
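
As a small illustration of the FilePattern filtering described in the quoted message, the following Python sketch applies a regex to a list of file names and keeps only the matches. It is not the code from the patch (which lives in the storage daemon and is written in C); the function name select_files and the sample file names are made up for the example, and Python's regex syntax may differ in details from the regex library Bacula uses.

```python
import re


def select_files(names, file_pattern):
    """Return the names matched by file_pattern.

    The match is unanchored unless the pattern anchors itself, so
    "^/var/log" selects only names that start with /var/log.
    An empty pattern is treated as "abort" and selects nothing.
    """
    if not file_pattern:
        return []
    regex = re.compile(file_pattern)
    return [name for name in names if regex.search(name)]


# Hypothetical file names as they might be read from a Volume.
names = [
    "/var/log/messages",
    "/var/log/apache2/access.log",
    "/etc/passwd",
    "/home/var/log/notes.txt",
]

for name in select_files(names, "^/var/log"):
    print(name)
```

Because the filter only sees file names as they stream past, nothing in the catalog is consulted, which is why it still works after the File records have been pruned.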