Hi,

Jeffrey J. Kosowsky wrote on 2008-10-27 01:18:24 -0400 [Re: [BackupPC-users]
Incremental dumps hanging with 'Can't get rsync digests' & 'Can't call method
"isCached"']:
> I have been playing with your script and it seems to spit out about
> 80% of the files/directories (24908/28834) that are backed up in this
> incremental backup.
>
> Most of the 24000 files seem to have 5 or 6 digit perms like 40755 or
> 100660. Not sure if this is a problem and if so what to do about it.

sorry about that. I experimented on tar backups, which apparently don't
(redundantly) store the file type bits in the "mode" entry. rsync obviously
does. That's ok (i.e. expected, once you read the code). It's due to the way
file type information is transferred in the respective protocols/data
streams. I should check the validity of the file type bits in the mode entry
separately. I'll add that soon. For the moment, I suggest you change line 134
to

	or not exists $permmap {$list [$i + 2] & 07777}

(just adding the " & 07777").
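In case the masking looks cryptic: the low twelve bits of a mode are the
permission (plus setuid/setgid/sticky) bits, everything above them is the
file type. A standalone sketch (the mode value is made up):

	# split a mode into file type and permission bits
	my $mode  = 0100660;          # regular file, rw-rw----
	my $perms = $mode &  07777;   # permission/setuid/setgid/sticky bits
	my $type  = $mode & ~07777;   # file type bits (0100000 = regular file)
	printf "type=%o perms=%o\n", $type, $perms;  # type=100000 perms=660

So the extra " & 07777" simply throws away the file type bits rsync adds
before the lookup in %permmap.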
> I seem to get this on the good backups too. In fact, at first glance
> it's hard to tell what if anything is the difference between a 'good'
> and 'bad' backup. For most directories, every file and subdirectory are
> listed even though it all seems fine.

Yes, with that much bogus output it's very much useless. Sorry again.

> I tried doing -X "40755,100644" etc. to exclude these perms but it
> didn't seem to have any effect

You probably mean "040755,0100644", but I'll change my eval() to oct(),
because I doubt anyone uses non-octal numeric modes anyway. eval() was not a
good idea to begin with. The patch should fix these two, but the remark
applies to any other values you might want to exclude until I change it.
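The reason your -X had no effect, by the way: eval() reads "40755" as the
decimal number 40755, while oct() treats "40755" and "040755" alike as
octal. A standalone sketch of the new parsing (simplified, not the actual
patch):

	# parse comma-separated octal modes, as for a -X argument
	my $exclude = "040755,0100644";
	my %permmap = map { oct($_) => 1 } split /,/, $exclude;
	printf "excluding %o\n", $_ for sort { $a <=> $b } keys %permmap;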
> Also, the debug output for the Users list looks good except for the
> last 2 elements:
> 0755 (which looks like a perm)

Yes, that is strange, especially the leading zero. You didn't specify a
"-u 0755" option, did you? :)

> 65534 (which seems like the max number for a uid)
> Similarly the last element of the group list is: 65534.

That's nobody and nogroup - don't they appear in your /etc/passwd and
/etc/group?

> Finally, you may want to add the following debug line to your script for
> completeness:
>
> print "Perms: ", (join ',', sort {$a <=> $b} keys %permmap), "\n"
>     if $opts {D};

Right. Noted. Though I'll sort $b <=> $a ;-).

Jeffrey J. Kosowsky wrote on 2008-10-27 04:00:27 -0400 [Re: [BackupPC-users]
Incremental dumps hanging with 'Can't get rsync digests' & 'Can't call method
"isCached"']:
> Interesting -- I ended up having to reboot (which of course required a
> restart of the backuppc service) and the problem went away.
>
> This is the second time this has happened to me.
> I suspect (in a fuzzy type of way) that somehow this may have been
> caused by my rebooting the nfs server (which is mounted on
> /var/lib/BackupPC) without doing something like restarting the
> backuppc service - the result was that for some time there may have
> been a stale nfs link hanging around and it is possible that this
> occurred during the middle of a backup.

Normally, rebooting the NFS server should *not* lead to stale NFS mounts. In
my experience that happens when device numbers (on the NFS server) change
(though I vaguely remember seeing an unexpected instance of that myself
lately). Try to fix it and you will save yourself a lot of headaches
(workarounds like adding a hook to remount the FS are another thread). This
probably means you shouldn't back up to an NFS mounted pool (which you
probably shouldn't do for performance reasons anyway). What mount options
are you using (esp. hard/soft, intr/nointr, tcp/udp)?

> I also may have killed the BackupPC_dump process using 'kill -9' when I was
> unable to kill it from the web interface.

SIGKILL is a bad habit to get into. You should try SIGINT first (though it
probably won't work in the "stale NFS file handle" case). If you can't
access the pool, killing BackupPC_dump is unlikely to do any additional
harm :).

> Still... it would be nice to get some type of email or other warning
> when a backup freezes up because conceivably one could be unaware of
> this issue for days...

The BackupPC daemon could report backups running for an "unusually long"
time (for a configurable value of "unusually long") by e-mail. I would
strongly argue against aborting them (like $Conf{ClientTimeout} does),
because the daemon has even less control over what is actually happening on
the network level than BackupPC_dump, but optionally informing the admin
seems reasonable. It should be possible to turn these warnings off, though.

> I will keep your troubleshooting patch in mind and will use it next
> time I see this problem.

You will need to apply it before the fact ... but if it's just "stale NFS
file handle" [how about including $! in the log message - " (err=$err [$!],
..."?], it's a local configuration problem rather than something BackupPC
could sensibly handle. Aborting on (detectable!) fatal errors is one thing,
but providing logic to call a hook and retry the failing operation on every
disk access is clearly not a good idea. Neither is aborting an otherwise
good backup because one attrib file happens to have gone missing.

To sum it up, your problem appears to be NFS server related ("stale NFS
file handle"), not due to corrupted attrib files (though a crashing NFS
server could lead to corruption of an attrib file, I guess).

Thank you for the feedback on my script anyway.

Regards,
Holger
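P.S.: Regarding the mount options question, this is roughly the fstab entry
I would expect to see for a pool on NFS (server name and export path made
up, of course):

	nfsserver:/export/backuppc  /var/lib/BackupPC  nfs  rw,hard,intr,tcp  0  0

A soft mount returns I/O errors after a timeout, which is a good way to
corrupt a pool unnoticed; hard (with intr, so hanging processes can still
be killed) is the sane choice if the pool has to stay on NFS at all.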
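P.P.S.: To illustrate the $! remark above, something along these lines
(path and message wording made up, not the actual patch):

	my $attribPath = "/var/lib/BackupPC/pc/somehost/0/attrib";  # example
	if ( !open(my $fh, '<', $attribPath) ) {
		# $! carries the system error text, e.g. "Stale NFS file handle"
		print STDERR "Can't read $attribPath [$!]\n";
	}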