Holger Parplies wrote at about 20:10:20 +0200 on Sunday, July 3, 2011:
> Hi,
>
> Kelly Sauke wrote on 2011-07-01 09:21:28 -0500 [[BackupPC-users]
> Recompressing individual files in pool]:
> > I have a need to modify certain files from backups that I have in
> > BackupPC. My pool is compressed and I've found I can decompress single
> > files using BackupPC_zcat. I can then modify those files as needed,
> > however I cannot figure out how to re-compress those modified files to
> > be put back into the pool. Is there a tool available that can do that?
>
> no. It's not a common requirement to be able to modify files in backups.
> Normally, a backup is intended to reflect the state a file system was in at
> the time the backup was taken, not the state the file system *should have*
> been in or the state *I'd like it* to have been in. I sure hope you have
> legitimate reasons for doing this.
>
> If you are modifying files, you'll need to think about several things.
>
> * Do you want to modify every occurrence of a specific content (i.e. all
>   files in all backups linked to one pool file) or only specific files,
>   while other files continue to contain the unmodified content?
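To make the "every occurrence" case concrete: because BackupPC pools by hard-linking, every backup copy of a given content shares one inode with the pool file. A rough sketch of enumerating those occurrences (the function and path names here are hypothetical, not part of BackupPC; a real pool walk of this kind is slow, which is exactly the "hard, with a reasonably sized pool" caveat below):

```python
import os

def find_linked_files(pool_file, pc_root):
    """Walk a BackupPC-style pc/ tree and return every path that is
    hard-linked to pool_file, i.e. shares its device and inode numbers."""
    ref = os.stat(pool_file)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(pc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Same inode on the same device means same underlying file.
            if st.st_ino == ref.st_ino and st.st_dev == ref.st_dev:
                matches.append(path)
    return matches
```

Note that this visits every file under pc/, so on a real pool it is an expensive full scan, not a lookup.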
And this may be subtle. You may have other occurrences that you forgot about
or are not aware of (say, another machine with the same file, or an earlier
backup you had saved). Destructively editing the pool is not something to do
without thinking...

> * If you are modifying every occurrence of a specific content, you'll either
>   have to find out which files link to the pool file (hard, with a
>   reasonably sized pool) or ensure you're updating the content without
>   changing the inode (i.e. open the file for write, not delete and
>   re-create it). If you do that, there is not much you can do for failure
>   recovery. Your update had better succeed.
>
> * Does your update change the partial file md5sum? If so, you'll need to
>   move the pool file to its new name and location. Presuming the new
>   content already exists, you should probably create a hash collision. That
>   may be less efficient than linking to the target pool file, but it should
>   be legal (when the maximum link count is exceeded, a second pool file
>   with identical content is created; later on the link count on the first
>   file may drop due to expiring backups), and it's certainly simpler than
>   finding all the files linked to your modified pool file and re-linking
>   them to the pre-existing pool file.

Yes - unless you are just changing content between the first and last chunks
(keeping the file size the same), the partial file md5sum will change.

That being said, while it is technically correct and advisable to rename the
file with the correct partial file md5sum (including adjusting the suffix for
potential collisions), it is not strictly necessary. Indeed, I have had the
pleasure of finding several bugs within BackupPC or its libraries that result
in wrong md5sum names even under normal conditions.
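The "update without changing the inode" point deserves emphasis, since the natural edit workflow (write a new file, then rename it over the old one) silently breaks every other hard link. A minimal sketch of the safe pattern (function name is mine, not BackupPC's):

```python
import os

def rewrite_in_place(path, new_data):
    """Overwrite the file's contents while keeping its inode, so every
    hard link to it sees the new content."""
    with open(path, "r+b") as f:  # update mode: no unlink, no new inode
        f.write(new_data)
        f.truncate()              # drop any leftover tail if data shrank
```

By contrast, writing to a temporary file and calling os.replace() (or shell-level `mv tmp file`) allocates a new inode and detaches all other links, which in a BackupPC pool means the backups silently stop sharing the pool file. As the quoted text warns, an in-place overwrite also has no rollback: if it fails midway, the old content is gone.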
The only real downside of not changing the name is that new versions of the
file will not be pooled against it and will instead be stored under the
correct md5sum name. (Note: I am not advising that you skip the rename, just
saying it is not strictly necessary.)

Another, perhaps more important, issue is that you really need to change the
attrib file. While changing the access/modification times may not matter,
adjusting the uncompressed file size (if it changes) is important, since some
routines use that file size rather than decompressing the entire file to
calculate its size. In any case, even if not critical, having an
inconsistency between the actual file size and the size noted in the attrib
file is not a good idea, and it might suggest to anybody (or any routine) not
aware of your monkeying with the file that there has been some serious data
corruption.

The bottom line is that editing existing files is possible (and indeed I do a
lot more 'messy' things in my BackupPC_deleteFile routine) *but* you need to
think through all the side effects and edge cases to make sure you won't be
messing anything else up.

> * If you're only changing individual files in a pc/ directory, the matter
>   is far simpler. You'll need to take some code from the BackupPC sources
>   for compressing anyway, so you might as well take the part that handles
>   pooling as well (see BackupPC::PoolWrite and note that you'll be coding
>   in Perl ;-).
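On the original re-compression question: the safest route really is to reuse BackupPC's own Perl code (BackupPC::FileZIO / BackupPC::PoolWrite), because the on-disk format can carry extra framing beyond the bare compressed stream (for instance, cached rsync checksums appended by some transfer setups). As far as I recall, the core of the format is a zlib deflate stream, so a rough stand-in looks like this - but treat the format assumption as unverified and always check the result with BackupPC_zcat before trusting it:

```python
import zlib

def compress_candidate(data, level=3):
    """Rough stand-in for BackupPC's compression step: produce a plain
    zlib deflate stream from the modified file content.  This ignores
    any extra BackupPC framing (e.g. appended checksum caches), so the
    output must be verified with BackupPC_zcat before going into the
    pool."""
    co = zlib.compressobj(level)
    return co.compress(data) + co.flush()
```

A quick sanity check is that `zlib.decompress()` round-trips the output; if BackupPC_zcat cannot read the file afterwards, fall back to doing the write through BackupPC::PoolWrite in Perl as suggested above.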