Hello,

I am running BackupPC 3.1.0 on Ubuntu 9.04 now, and since the upgrade I've developed a problem with BackupPC. During full backups, the backup stalls and leaves a zombie. The backup just sits (as shown below) until I kill the ssh process or restart BackupPC. I have not seen this problem with smb clients, and have not seen it with incremental backups (although the incrementals take a lot less time, so maybe it will happen sometime...), just the fulls with rsync.

-------------------
32167 backuppc 25 5 0 0 0 Z 0 0.0 5:21.56 BackupPC_dump <defunct>
32321 backuppc 25 5 7168 3600 1164 S 0 0.2 0:01.16 ssh
32587 backuppc 25 5 79768 73m 1220 S 0 3.7 2:31.20 BackupPC_dump
-------------------

The status page says:

-------------------
localhost full backuppc 7/23 12:00 BackupPC_dump localhost 32167 32321, 32587
-------------------

The computer doesn't crash or appear to have any problems, and if I watch the whole time, memory usage never goes above 10-20% total. When I kill the ssh process, the status page goes back to showing nothing, and a partial backup is left, as shown below on the home page for the backup:

-------------------
848 partial yes 0 7/23 00:12 112.8 0.5 /home/backup/pc/localhost/848
-------------------

If I keep trying, eventually it makes it through the whole backup, as shown here:

-------------------
848 full yes 0 7/23 15:04 152.4 0.1 /home/backup/pc/localhost/848
-------------------

Since the backup eventually completes, and sometimes only has to be restarted once or twice while other times 6 or 7 times, I can't figure out where to start debugging. It is reproducible in that I haven't had a successful full backup without at least one restart since the upgrade. If some of you have some theories, I will be happy to capture additional info about what is going on when this happens... I am not sure offhand what info/logs/files would be important. It does not seem like a hardware problem to me.

The log itself ends like this:

-------------------
same 764 506/500 49664 home/common/LabView/Gas_Swirl_Process_Control/flame_frf/dyn_data/WriteData2/WriteData2.opt
Parent read EOF from child: fatal error!
Done: 0 files, 0 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
Not saving this as a partial backup since it has fewer files than the prior one (got 2 and 0 files versus 5)
-------------------

However, I suspect that error comes from me killing the ssh process, not from whatever caused things to stall. I can't find any errors in the log while it is stalled but before I kill any processes, as BackupPC seems to think things are running fine even though they have stalled out...
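
For capturing more info the next time it stalls, something like the following could record which processes are defunct and who their parent is. This is only a minimal sketch, assuming a Linux `ps` that accepts `-eo pid,ppid,stat,args`; the function names `find_zombies` and `snapshot` are made up for illustration and are not part of BackupPC.

```python
import subprocess


def find_zombies(ps_output):
    """Parse `ps -eo pid,ppid,stat,args` output.

    Returns a list of (pid, ppid, command) tuples for every process
    whose state column starts with 'Z' (zombie / defunct).
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line and anything that is not a full process row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, ppid, stat, command = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), command))
    return zombies


def snapshot():
    """Run ps once and return the zombies it reports (assumes procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_zombies(result.stdout)


if __name__ == "__main__":
    for pid, ppid, command in snapshot():
        print(f"zombie pid={pid} parent={ppid} cmd={command}")
```

Running this while the backup is stalled would show whether the defunct BackupPC_dump is still waiting on a living parent, which might narrow down which process is failing to reap it.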

I am wondering if some timeout, permission, or other default has changed in ssh or rsync with the upgrade and is causing this... However, since it works for incrementals and the transport is the same for those, I don't see how that could be the case...

Also, is there a difference between a backup started from the CGI interface and one that starts itself on schedule? When it succeeds, it has always been after one or more restarts from the CGI...

Anyway, thanks. Looking forward to hearing ideas from you guys on where to look.

Steve

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/