I have vastly improved and completely rewritten my program BackupPC_deleteFiles.pl. Many bugs were also fixed ;)

The routine now allows you to delete arbitrary files and directories (or lists or globs thereof) across multiple hosts and shares, and across arbitrary (contiguous) backup ranges. Specifically, you can now delete files from either a single backup or a range of backups. The program then deletes and/or moves files and attributes as appropriate, and adds or removes type=10 delete attributes accordingly, so that the files show as fully deleted from the backup range while the files visible from subsequent backups that were not deleted are unaffected.

The only thing it can't do (and refuses to do) is delete files that are hard links, or directories that contain hard links, since I couldn't find any easy way to find and keep track of hard links.

The program provides lots of (optional) verbosity and debugging levels so you can be sure you are deleting what you want to (and, from a debugging perspective, that the appropriate visibility and inheritance rules are being faithfully applied).

Since the program is now 1000+ lines long, I won't post it, but I will be happy to email it to anyone interested or post it if there is enough demand. Instead I will just copy over the logic so people can check it if they are so inclined (note that it took me multiple attempts before I truly understood the topology of the backup chains and how to encode it efficiently and accurately). I will also include a copy of the usage message and options:

--------------------------------------------------------------------
usage: $0 [options] files/directories...

NOTE: if -s option not set, then file/directory names include the share name

Required options:
  -h host        Host (or - for all) from which path is offset
  -n backRange   Range of successive backup numbers to delete.
                   N    delete files from backup N (only)
                   M-N  delete files from backups M-N (inclusive)
                   -M   delete files from all backups up to M (inclusive)
                   M-   delete files from all backups up from M (inclusive)
                   -    delete files from ALL backups

Optional options:
  -s shareName   Share name (or - for all) from which path is offset
                 (don't include the 'f' mangle)
  -l             Just list backups by host (with level noted in parentheses)
  -r             Allow directories to be removed too
  -H             Skip over hard links (otherwise exits without deletions if
                 hard links found)
  -m             Paths are unmangled (i.e. apply mangle to paths)
  -q             Don't show deletions
  -t             Trial run -- do everything but deletions
  -c             Clean up pool - schedule BackupPC_nightly to run (requires
                 server running). Only runs if files were deleted
  -d level       Turn on debug level
--------------------------------------------------------------------

Program logic is as follows:

1. First construct a hash of hashes of 3 arrays and 2 hashes that encapsulates the structure of the full and incremental backups for each host. This hash is called %backupsHoHA{<hostname>}{<key>}, where the keys are "ante", "post", "baks", "level", and "vislvl"; the first 3 keys have arrays as values and the final 2 keys have hashes as values. This pre-step is done because the same structure can be re-used when deleting multiple files and dirs (with potential wildcards) across multiple shares, backups, and hosts. The component arrays and hashes are constructed as follows:
   - Start by constructing the simple hash %LevelH, whose keys map backup numbers to incremental backup levels based on the information in the corresponding backupInfo file.
   - Then, for each host selected, determine the list (@Baks) of individual backups from which files are to be deleted, based on bakRange and the backups that actually exist.
   - Based on this list, determine the list of direct antecedent backups (@Ante) that have strictly increasing backup levels starting with the previous level 0 backup.
     This list thus begins with the previous level 0 backup and ends with the last backup before @Baks that has a lower incremental level. Note: this list may be empty if @Baks starts with a full (level 0) backup. Note: there is at most one (and in general there should be exactly one) incremental backup per level in this list, starting with level 0.
   - Similarly, construct the list of direct descendants (@Post) of the elements of @Baks that have strictly decreasing backup levels, starting with the first incremental backup after @Baks and continuing until we reach a backup whose level is less than or equal to the level of the lowest incremental backup in @Baks (which may or may not be a level 0 backup). Again, this list may be empty if the first backup after @Baks has a level lower than that of every backup in @Baks. Also, again, there is at most one backup per level.
   - Note that by construction, @Ante is stored in ascending order and, furthermore, each successive backup has a strictly ascending incremental level. Similarly, @Post is stored in strictly ascending order, but its successive elements have monotonically non-increasing incremental levels. Also, the last element of @Ante has an incremental level lower than the first element of @Baks, and the last element of @Post has an incremental level higher than the lowest level of @Baks. This is all because anything else neither affects nor is affected by deletions in @Baks. In contrast, note that @Baks can have any pattern of increasing, decreasing, or repeated incremental levels.
   - Finally, create the second hash (%VislvlH), which has keys equal to levels and values equal to the last backup with that level that could potentially be visible from @Post (note that we will use this to determine which files need to be copied to @Post from @Ante or @Baks after we delete the file entries in @Baks).

2. Second, for each host, combine the share regex and the list of files (and/or file shell regexes) with the backup ranges @Ante and @Baks to glob for all files that need either to be deleted from @Baks or blocked from view by setting a type=10 delete attribute. If a directory is on the list and the remove directory flag (-r) is not set, then signal an error. If any of these files (or dirs) are hard links (either of type hard link or a hard link "target"), then signal an error (or, if the -H flag is set, warn and skip them), since hard links cannot easily be deleted/copied/moved (the other links would be affected). Duplicate entries and entries that are a subtree of another entry are rationalized and combined.

3. Third, for each host and for each relevant file, go successively through the @Ante, @Baks, and @Post chains to determine which files and attributes need to be deleted, cleared, or copied/linked to @Post.
   - Start by going through @Ante in ascending order to construct two visibility hashes. The first hash, %VisibleAnte, is used to mark whether or not a file may be visible from @Baks from a higher incremental level. The presence of a file sets the value of the hash, while an intervening type=10 delete resets the value to invisible (-1). The second hash, %VisibleAnteBaks (whose construction continues when we iterate through @Baks), determines whether or not a file from @Ante or @Baks was originally visible from @Post. If a file was visible, then the backup number of that file is stored in the value of the hash. Note that at each level there is at *most* one backup from @Ante that is visible from @Baks, and similarly there is at *most* one backup from @Ante and @Baks combined that is visible from @Post.
   - Next, go through @Baks to mark for deletion any instances of the file that are present. Then set the attrib type to type=10 (delete) if %VisibleAnte indicates that a file from @Ante would otherwise be visible at that level.
     Otherwise, clear the attrib and mark it for deletion. Similarly, once the type=10 type has been set, all higher-level elements of @Baks can have their file attribs cleared, whether they originally indicated a file type or a delete type.
   - Finally, go through the list of @Post in ascending order. If there is no file and no delete flag present, then use the information coded in %VisibleAnteBaks to determine whether we need to link/copy over a version of the file previously stored in @Ante and/or @Baks (along with the corresponding file attrib entry) or whether we need to set a type=10 delete attribute. Conversely, if there was originally a type=10 delete attribute, then by construction of @Post the delete type is no longer needed, since the deletion will now occur in one of its antecedents in @Baks, so we need to clear the delete type from the attrib entry.

4. Finally, after all the files for a given host have been marked for deletion, moving/copying, or attribute changes, loop through and execute the changes. Files are either unlinked to delete them or hard linked (or copied if zero size) if we need to place a new copy in @Post. Attributes are either cleared (deleted), set to type=10 delete, or copied over to @Post. If all the files for a given attrib file are deleted, then we delete the attrib file too.

5. As a last step, BackupPC_nightly is optionally called to clean up the pool, provided you set the -c flag and the BackupPC daemon is running. Note that this routine itself does NOT touch the pool.
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/