Alexander Kobel wrote at about 14:38:22 +0200 on Wednesday, August 21, 2024:
 > On 2024-08-21 12:40, Guillermo Rozas wrote:
 > > If you change it back to --ignore-times it will re-test the server files 
 > > by comparing block-by-block checksums with the one on the client. If a 
 > > file is corrupted in the server this will detect the difference and update 
 > > it as a "new version" of the file from then on. However, --ignore-times is 
 > > MUCH slower than --checksum, so you may want to run it only in response to 
 > > a corruption suspicion and not regularly.
 > 
 > 
 > That's a quite reasonable explanation, yes. Nevertheless, I wonder if the 
 > proper reaction for BackupPC_fsck should be to move an obviously invalid 
 > file out of the way (i.e. to a "quarantine" location or so), to enforce that 
 > check - after all, the corruption has already been detected and confirmed.
 > 

I don't think moving it to a "quarantine" is a good idea, since the
corruption may be only minor (even a single-character change means the
md5sums won't match). And if the file is moved, historical backups won't
be able to find it, and there is no good (i.e., quick) way to find
all the attrib files that point to the damaged file beyond manually
opening and searching through potentially hundreds of thousands (if not
more) attrib files in the pc tree.

A potentially better solution would be to keep a list of bad files and,
on subsequent backups, create a new file with suffix _1 (or higher as
appropriate), leveraging the hash-collision mechanism within BackupPC.
But that would add overhead to every new file being backed up, since
each one would have to be checked against the list.
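
Just to make that overhead concrete, here is a rough sketch of the
check (in Python for illustration only; BackupPC itself is Perl, and
the list format and function names below are my own assumptions, not
BackupPC internals):

    # Hypothetical "bad file list" check, assuming the list is a plain
    # text file with one corrupt pool digest per line.
    def load_bad_digests(path):
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    def pool_name_for(digest, bad_digests):
        # If the digest is on the bad list, store the new copy under an
        # incremented collision suffix so the corrupt copy is never reused.
        # This membership test is the per-file overhead mentioned above.
        return digest + "_1" if digest in bad_digests else digest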

Even better might be to do the following: rename corrupted files in
place with a terminal '_corrupt' suffix, and change the logic in
BackupPC as follows:
- If fsck finds a file to be corrupt, or any pool access to the file
  fails to complete properly, rename the corresponding pool file in
  place with a '_corrupt' suffix.
  Optionally, in addition to logging such corruption, it would be good
  to maintain a file containing a list of all such corruptions.

- If the same file needs to be backed up again, treat the '_corrupt'
  copy as a hash collision and back it up with an incremented '_N'
  suffix as for any hash collision (this effectively creates a new,
  clean copy)

- If the original, corrupt file needs to be read:
    * Change the lookup algorithm so that if the file can't be found in
      the pool at the original attrib lookup location, say
      '<original-hash>', it looks at '<original-hash>_corrupt' instead
    * Log a warning that a corrupted file has been referenced

The above would require only trivial changes to the code and would not
impose any performance penalty except when referencing a corrupted
file; a rough sketch of the rename and fallback lookup is below.
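
For illustration, a minimal sketch of the two pieces (again Python
rather than BackupPC's Perl; the function names and path handling are
assumptions, not the real pool API):

    import logging
    import os

    CORRUPT_SUFFIX = "_corrupt"

    def mark_corrupt(pool_path):
        # Called by fsck, or when a pool read fails to complete properly:
        # rename the damaged pool file in place and record the event.
        corrupt_path = pool_path + CORRUPT_SUFFIX
        os.rename(pool_path, corrupt_path)
        logging.warning("pool file %s is corrupt; renamed to %s",
                        pool_path, corrupt_path)
        return corrupt_path

    def open_pool_file(pool_path):
        # Lookup with fallback: use the original digest path if present,
        # otherwise fall back to the '_corrupt' copy and log a warning.
        if os.path.exists(pool_path):
            return open(pool_path, "rb")
        corrupt_path = pool_path + CORRUPT_SUFFIX
        if os.path.exists(corrupt_path):
            logging.warning("referenced corrupted pool file %s", corrupt_path)
            return open(corrupt_path, "rb")
        raise FileNotFoundError(pool_path)

A subsequent backup of the same content would then be stored under an
incremented '_N' suffix via the existing hash-collision path, leaving
the '_corrupt' copy untouched.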

I also think it would be helpful to show the number of corrupt files
on one of the GUI status pages, together with a link to a browsable
list of the corrupted files to date.


_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/
