Here is an idea for adding md5sum consistency checks, along with a rough ability to reverse-lookup the pc tree entries that are hard linked to any given pool entry. (It is probably not worth coding if version 4.0 is near, since all this functionality reportedly will be built into the new version.)

1. Create new trees called, say, 'md5sum' and 'cmd5sum', parallel to the pool and cpool directory trees.
2. Whenever a new file is *added* to the pool/cpool, calculate the full file md5sum and create a new file with the same partial md5sum name (and chain numbering) in the corresponding md5sum/cmd5sum tree, with the first line containing the full file md5sum.
3. Whenever a (new) pc file is linked/copied to the pool, append the pc file path (starting from TopDir) to the corresponding file in the md5sum/cmd5sum tree.
4. Whenever a pool file is deleted or a chain is renumbered, which I believe only happens during BackupPC_nightly (and of course also in my BackupPC_fixLinks script), do the corresponding renumbering on the parallel md5sum/cmd5sum entry.

Now you can go:

A. pc tree entry -> pool entry (this is in general a many-to-one mapping)
   Calculate the partial file md5sum and find the entry in the corresponding pool/cpool (if there is a chain, choose the chain element with the same inode). This is relatively fast since you only need to read approximately the first MB (and match the inode number if there is a chain).

B. pool entry -> pc entries (this is in general a one-to-many mapping)
   Look up the corresponding entry in the md5sum/cmd5sum tree and look at the lines starting after the first md5sum line.

C. Check pool (and thus indirectly pc chain) validity
   Compare the md5sums of pool/cpool entries with the first line of the corresponding entry in the md5sum/cmd5sum tree. This is fast since the number of lines in each entry is O(#backups).

Note that when backups are deleted, one could theoretically go through the md5sum/cmd5sum trees and delete the corresponding entries for each deleted pc tree file. However, this is quite expensive: not only would you need to traverse the entire deleted backup tree, but you would also have to calculate the partial file md5sums to figure out where each file lies in the md5sum/cmd5sum tree.
But there is really very little downside to not deleting the entries from the md5sum/cmd5sum tree, since at worst we have some entries that no longer have corresponding pc tree entries. And even if you delete the last backup and a new backup with the same number then gets created, you know that the last matching entry is the valid one.

_______________________________________________
BackupPC-users mailing list
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
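
To make the bookkeeping in steps 2 and 3 and the lookups B and C a bit more concrete, here is a rough Python sketch. It is not BackupPC code: the function names and the pool_root/md5_root arguments are made up for illustration, it hashes the raw bytes on disk (for cpool you would probably want the md5sum of the uncompressed contents instead), and it ignores chain renumbering (step 4) entirely.

```python
import hashlib
from pathlib import Path


def full_md5(path, chunk_size=1 << 20):
    """Return the hex md5sum of the whole file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_path(pool_file, pool_root, md5_root):
    """Map a pool file to the same relative name in the parallel md5sum tree."""
    return Path(md5_root) / Path(pool_file).relative_to(pool_root)


def register_pool_file(pool_file, pool_root, md5_root):
    """Step 2: when a file is added to the pool, store its full md5sum as line 1."""
    side = sidecar_path(pool_file, pool_root, md5_root)
    side.parent.mkdir(parents=True, exist_ok=True)
    side.write_text(full_md5(pool_file) + "\n")
    return side


def record_link(pool_file, pool_root, md5_root, pc_path):
    """Step 3: append the pc file path (relative to TopDir) to the sidecar file."""
    with open(sidecar_path(pool_file, pool_root, md5_root), "a") as fh:
        fh.write(str(pc_path) + "\n")


def pc_entries(pool_file, pool_root, md5_root):
    """Lookup B: every line after the first md5sum line is a recorded pc path."""
    lines = sidecar_path(pool_file, pool_root, md5_root).read_text().splitlines()
    return lines[1:]


def pool_file_is_intact(pool_file, pool_root, md5_root):
    """Check C: compare the stored md5sum (line 1) with a freshly computed one."""
    lines = sidecar_path(pool_file, pool_root, md5_root).read_text().splitlines()
    return bool(lines) and lines[0] == full_md5(pool_file)
```

Since entries are only ever appended, a stale pc path left over from a deleted backup simply stays in the list returned by pc_entries(); a caller that cares could check whether each path still exists before trusting it.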
