Hi,

I'm having trouble with the backup of one of my clients since last week.


29-Jan 17:09 bacula-dir: Start Backup JobId 217,
Job=SMTCZB0003.2007-01-29_17.09.43

29-Jan 17:09 bacula-dir: Recycled volume "MKS3-2006-02-08"

29-Jan 17:09 bacula-sd: 3301 Issuing autochanger "loaded drive 0" command.

29-Jan 17:09 bacula-sd: 3302 Autochanger "loaded drive 0", result is Slot 12.

29-Jan 17:09 bacula-sd: 3301 Issuing autochanger "loaded drive 0" command.

29-Jan 17:09 bacula-sd: 3302 Autochanger "loaded drive 0", result is Slot 12.

29-Jan 17:10 bacula-sd: Recycled volume "MKS3-2006-02-08" on device
"AIT-2" (/dev/nst0), all previous data lost.

29-Jan 17:10 bacula-sd: Spooling data ...

SMTCZB0003-fd:      Filesystem change prohibited. Will not descend into
/var/lib/nfs/rpc_pipefs

SMTCZB0003-fd:      Filesystem change prohibited. Will not descend into /dev

29-Jan 18:21 bacula-sd: User specified spool size reached.

29-Jan 18:21 bacula-sd: Writing spooled data to Volume. Despooling
37,580,970,446 bytes ...

29-Jan 19:32 bacula-sd: Spooling data again ...

SMTCZB0003-fd:      Filesystem change prohibited. Will not descend into /sys

SMTCZB0003-fd:      Filesystem change prohibited. Will not descend into /boot

29-Jan 20:37 bacula-sd: Committing spooled data to Volume
"MKS3-2006-02-08". Despooling 34,969,384,736 bytes ...

29-Jan 21:48 bacula-sd: Sending spooled attrs to the Director. Despooling
120,303,549 bytes ...


So far, so good. But 3 hours later the backup failed.



30-Jan 00:37 SMTCZB0003-fd: SMTCZB0003.2007-01-29_17.09.43 Fatal error:
job.c:1599 Comm error with SD. bad response to Append Data. ERR=Die
Verbindung wurde vom Kommunikationspartner zurückgesetzt

30-Jan 00:37 bacula-dir: SMTCZB0003.2007-01-29_17.09.43 Error: Bacula
1.38.5 (18Jan06): 30-Jan-2007 00:37:28

  JobId:                  217

  Job:                    SMTCZB0003.2007-01-29_17.09.43

  Backup Level:           Full

  Client:                 "SMTCZB0003-fd"
i686-pc-linux-gnu,redhat,Enterprise release

  FileSet:                "SMTCZB0003" 2006-06-25 08:00:02

  Pool:                   "MKS"

  Storage:                "Autochanger"

  Scheduled time:         29-Jan-2007 17:09:37

  Start time:             29-Jan-2007 17:09:45

  End time:               30-Jan-2007 00:37:28

  Priority:               10

  FD Files Written:       368,570

  SD Files Written:       368,570

  FD Bytes Written:       72,404,687,954

  SD Bytes Written:       72,472,374,146

  Rate:                   2695.3 KB/s

  Software Compression:   None

  Volume name(s):         MKS3-2006-02-08

  Volume Session Id:      1

  Volume Session Time:    1170086393

  Last Volume Bytes:      72,536,797,943

  Non-fatal FD errors:    0

  SD Errors:              0

  FD termination status:  Error

  SD termination status:  OK

  Termination:            *** Backup Error ***

I can see that cpu load of the bacula server was high from beginning of
the backup until 22:45. Then it dropped nearly to zero. The last
successful backup took 50-60 min. for spooling the attributs to the
director. Thus the time from 21:48 to 22:45 would fit that window.

I found nothing interesting in the postgres log and it seems that bacula
finished at 22:46.

2007-01-29 22:38:29 [20740] LOG:  connection authorized: user=postgres
database=bacula
2007-01-29 22:41:34 [20742] LOG:  recycled transaction log file
"0000001500000093"
2007-01-29 22:41:34 [20742] LOG:  recycled transaction log file
"0000001500000094"
2007-01-29 22:43:32 [20745] LOG:  connection received: host=[local] port=
2007-01-29 22:43:32 [20745] LOG:  connection authorized: user=postgres
database=template1
2007-01-29 22:43:32 [20746] LOG:  connection received: host=[local] port=
2007-01-29 22:43:32 [20746] LOG:  connection authorized: user=postgres
database=bacula
2007-01-29 22:46:36 [20747] LOG:  recycled transaction log file
"0000001500000095"
2007-01-29 22:46:36 [20747] LOG:  recycled transaction log file
"0000001500000096"
2007-01-29 22:48:32 [20748] LOG:  connection received: host=[local] port=
2007-01-29 22:48:32 [20748] LOG:  connection authorized: user=postgres
database=template1
2007-01-29 22:48:32 [20749] LOG:  connection received: host=[local] port=
2007-01-29 22:48:32 [20749] LOG:  connection authorized: user=postgres
database=bacula
2007-01-29 22:53:32 [20753] LOG:  connection received: host=[local] port=
2007-01-29 22:53:32 [20753] LOG:  connection authorized: user=postgres
database=template1
2007-01-29 22:53:32 [20754] LOG:  connection received: host=[local] port=
2007-01-29 22:53:32 [20754] LOG:  connection authorized: user=postgres
database=bacula
2007-01-29 22:58:32 [20756] LOG:  connection received: host=[local] port=


I already tried an other tape from the pool, set the heartbeat interval to
60 and restarted the backup several times - same result. The host is
behind a firewall, but I'm not sure if the error message is an evidence of
a fw problem.

30-Jan 00:37 SMTCZB0003-fd: SMTCZB0003.2007-01-29_17.09.43 Fatal error:
job.c:1599 Comm error with SD. bad response to Append Data. ERR=Die
Verbindung wurde vom Kommunikationspartner zurückgesetzt

(translation ;) ERR=The connection was reset by the communication partner)


Backups of other hosts succeeded. Backups of this host were successful for
the last 8 months.

Any idea what to look for? FD/SD Connect Timeout?

Ralf





-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to