Hi, I'm having trouble with the backup of one of my clients since last week.
29-Jan 17:09 bacula-dir: Start Backup JobId 217, Job=SMTCZB0003.2007-01-29_17.09.43 29-Jan 17:09 bacula-dir: Recycled volume "MKS3-2006-02-08" 29-Jan 17:09 bacula-sd: 3301 Issuing autochanger "loaded drive 0" command. 29-Jan 17:09 bacula-sd: 3302 Autochanger "loaded drive 0", result is Slot 12. 29-Jan 17:09 bacula-sd: 3301 Issuing autochanger "loaded drive 0" command. 29-Jan 17:09 bacula-sd: 3302 Autochanger "loaded drive 0", result is Slot 12. 29-Jan 17:10 bacula-sd: Recycled volume "MKS3-2006-02-08" on device "AIT-2" (/dev/nst0), all previous data lost. 29-Jan 17:10 bacula-sd: Spooling data ... SMTCZB0003-fd: Filesystem change prohibited. Will not descend into /var/lib/nfs/rpc_pipefs SMTCZB0003-fd: Filesystem change prohibited. Will not descend into /dev 29-Jan 18:21 bacula-sd: User specified spool size reached. 29-Jan 18:21 bacula-sd: Writing spooled data to Volume. Despooling 37,580,970,446 bytes ... 29-Jan 19:32 bacula-sd: Spooling data again ... SMTCZB0003-fd: Filesystem change prohibited. Will not descend into /sys SMTCZB0003-fd: Filesystem change prohibited. Will not descend into /boot 29-Jan 20:37 bacula-sd: Committing spooled data to Volume "MKS3-2006-02-08". Despooling 34,969,384,736 bytes ... 29-Jan 21:48 bacula-sd: Sending spooled attrs to the Director. Despooling 120,303,549 bytes ... So far, so good. But 3 hours later the backup failed. 30-Jan 00:37 SMTCZB0003-fd: SMTCZB0003.2007-01-29_17.09.43 Fatal error: job.c:1599 Comm error with SD. bad response to Append Data. ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt 30-Jan 00:37 bacula-dir: SMTCZB0003.2007-01-29_17.09.43 Error: Bacula 1.38.5 (18Jan06): 30-Jan-2007 00:37:28 JobId: 217 Job: SMTCZB0003.2007-01-29_17.09.43 Backup Level: Full Client: "SMTCZB0003-fd" i686-pc-linux-gnu,redhat,Enterprise release FileSet: "SMTCZB0003" 2006-06-25 08:00:02 Pool: "MKS" Storage: "Autochanger" Scheduled time: 29-Jan-2007 17:09:37 Start time: 29-Jan-2007 17:09:45 End time: 30-Jan-2007 00:37:28 Priority: 10 FD Files Written: 368,570 SD Files Written: 368,570 FD Bytes Written: 72,404,687,954 SD Bytes Written: 72,472,374,146 Rate: 2695.3 KB/s Software Compression: None Volume name(s): MKS3-2006-02-08 Volume Session Id: 1 Volume Session Time: 1170086393 Last Volume Bytes: 72,536,797,943 Non-fatal FD errors: 0 SD Errors: 0 FD termination status: Error SD termination status: OK Termination: *** Backup Error *** I can see that cpu load of the bacula server was high from beginning of the backup until 22:45. Then it dropped nearly to zero. The last successful backup took 50-60 min. for spooling the attributs to the director. Thus the time from 21:48 to 22:45 would fit that window. I found nothing interesting in the postgres log and it seems that bacula finished at 22:46. 2007-01-29 22:38:29 [20740] LOG: connection authorized: user=postgres database=bacula 2007-01-29 22:41:34 [20742] LOG: recycled transaction log file "0000001500000093" 2007-01-29 22:41:34 [20742] LOG: recycled transaction log file "0000001500000094" 2007-01-29 22:43:32 [20745] LOG: connection received: host=[local] port= 2007-01-29 22:43:32 [20745] LOG: connection authorized: user=postgres database=template1 2007-01-29 22:43:32 [20746] LOG: connection received: host=[local] port= 2007-01-29 22:43:32 [20746] LOG: connection authorized: user=postgres database=bacula 2007-01-29 22:46:36 [20747] LOG: recycled transaction log file "0000001500000095" 2007-01-29 22:46:36 [20747] LOG: recycled transaction log file "0000001500000096" 2007-01-29 22:48:32 [20748] LOG: connection received: host=[local] port= 2007-01-29 22:48:32 [20748] LOG: connection authorized: user=postgres database=template1 2007-01-29 22:48:32 [20749] LOG: connection received: host=[local] port= 2007-01-29 22:48:32 [20749] LOG: connection authorized: user=postgres database=bacula 2007-01-29 22:53:32 [20753] LOG: connection received: host=[local] port= 2007-01-29 22:53:32 [20753] LOG: connection authorized: user=postgres database=template1 2007-01-29 22:53:32 [20754] LOG: connection received: host=[local] port= 2007-01-29 22:53:32 [20754] LOG: connection authorized: user=postgres database=bacula 2007-01-29 22:58:32 [20756] LOG: connection received: host=[local] port= I already tried an other tape from the pool, set the heartbeat interval to 60 and restarted the backup several times - same result. The host is behind a firewall, but I'm not sure if the error message is an evidence of a fw problem. 30-Jan 00:37 SMTCZB0003-fd: SMTCZB0003.2007-01-29_17.09.43 Fatal error: job.c:1599 Comm error with SD. bad response to Append Data. ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt (translation ;) ERR=The connection was reset by the communication partner) Backups of other hosts succeeded. Backups of this host were successful for the last 8 months. Any idea what to look for? FD/SD Connect Timeout? Ralf ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users