Hello list!

Sorry to trouble you with what's probably a simple problem, but I'm now
looking at the very real possibility of wiping all our backups clean and
starting from scratch if I can't fix it... :(

I'm having problems with some Full backups, which run for between 1 and
2 hours and appear to "time out" after the data transfer from the FD to
the SD has finished. The job log (shown below) shows that the data
transfer completes, often in about 1 hr 30 min; Bacula then does nothing
until the job has been running for 2 hours, at which point it reports an
FD error.
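To put numbers on that, here is the timeline of the failing job worked
out from the timestamps in the log below, as a small Python sketch. It
assumes the SD's "Job write elapsed time" is measured from the job's
start time, which I haven't confirmed:

```python
from datetime import datetime, timedelta

# Values copied from the JobId 5514 log below.
start = datetime(2009, 8, 12, 14, 18, 9)   # "Start time"
end = datetime(2009, 8, 12, 16, 18, 9)     # "End time"
write_elapsed = timedelta(hours=1, minutes=31, seconds=41)  # SD "Job write elapsed time"

# Assumption: the SD write timer started at the job start time.
write_done = start + write_elapsed
idle = end - write_done

print("data transfer finished at", write_done.time())
print("idle before the FD error:", idle)
print("total job time:", end - start)
```

On that assumption the SD finished writing at about 15:49:50, and the
job then sat idle for roughly 28 minutes before the FD error at exactly
the 2-hour mark.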

Other Full backups (which don't take as long) run correctly, and Inc and
Diff backups also run correctly most of the time. However, a small
percentage of backups fail at random, also with FD errors but at random
points during the job... these I have been putting down to network
fluctuations! The difference is that re-running one of these random
failures succeeds, whilst re-running this particular Full fails again! ;)

I've already tried setting a Heartbeat Interval of 20 minutes in the FD,
SD and Dir conf files (thinking that the FD -> Dir connection was timing
out), but this hasn't changed anything.
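For anyone unfamiliar with the idea: a heartbeat simply sends a small
message at a fixed interval over a connection that would otherwise sit
idle, so that nothing in between (a firewall or NAT device, for example)
treats the connection as dead. A toy Python sketch of the concept using a
local socket pair; the one-byte message and the short interval are made
up for the demo and are not Bacula's actual heartbeat format:

```python
import socket
import threading
import time

# Hypothetical one-byte heartbeat message for this demo only;
# it is not Bacula's real heartbeat format.
HEARTBEAT = b"\x00"


def send_heartbeats(sock, interval, count):
    """Send `count` heartbeat messages on `sock`, one every `interval` seconds."""
    for _ in range(count):
        time.sleep(interval)  # the connection sits idle in between
        sock.sendall(HEARTBEAT)


def run_demo(interval=0.05, count=3):
    """Return all heartbeat bytes seen by the receiving end of a local socket pair."""
    sender, receiver = socket.socketpair()
    worker = threading.Thread(target=send_heartbeats, args=(sender, interval, count))
    worker.start()
    received = b""
    while len(received) < count * len(HEARTBEAT):
        received += receiver.recv(16)  # blocks until a heartbeat arrives
    worker.join()
    sender.close()
    receiver.close()
    return received


if __name__ == "__main__":
    # A real 20-minute interval would be interval=20 * 60 (seconds).
    print(run_demo())
```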

In the time between the data transfer finishing and the timeout,
Postgres has an open connection with a "COPY batch FROM STDIN"
transaction in progress, which at the timeout produces errors in the
Postgres log that I have also shown below.
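Decoding the message type in the first Postgres error below: 0x58 is
ASCII "X", which the PostgreSQL frontend/backend protocol uses for the
Terminate message, which suggests the client ended its session while the
COPY was still open. A small Python sketch of the decoding (the helper
name is just for illustration):

```python
import re

# Message text copied from the Postgres log below.
LOG_LINE = "ERROR:  unexpected message type 0x58 during COPY from stdin"


def message_type(line):
    """Hypothetical helper: return the hex message-type byte as a character, or None."""
    m = re.search(r"message type (0x[0-9a-fA-F]{2})", line)
    if m is None:
        return None
    return chr(int(m.group(1), 16))


print(message_type(LOG_LINE))  # X
```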

I'm happy to post portions of the conf files if needed, but they're huge
and might well lead to tl;dr!

Any suggestions as to how I can troubleshoot this further would be most
appreciated!

Nick Lock.


---------------------------------------------------------------------
12-Aug 14:18 exa-bacula-dir JobId 5514: Start Backup JobId 5514, Job=backup_scavenger.2009-08-12_14.18.06.04
12-Aug 14:18 exa-bacula-dir JobId 5514: There are no more Jobs associated with Volume "scavenger-full-1250". Marking it purged.
12-Aug 14:18 exa-bacula-dir JobId 5514: All records pruned from Volume "scavenger-full-1250"; marking it "Purged"
12-Aug 14:18 exa-bacula-dir JobId 5514: Recycled volume "scavenger-full-1250"
12-Aug 14:18 exa-bacula-dir JobId 5514: Using Device "FileStorageScavenger"
12-Aug 14:18 exa-bacula-sd JobId 5514: Recycled volume "scavenger-full-1250" on device "FileStorageScavenger" (/srv/bacula/volume/web-scavenger), all previous data lost.
12-Aug 14:18 exa-bacula-dir JobId 5514: Max Volume jobs exceeded. Marking Volume "scavenger-full-1250" as Used.
12-Aug 15:49 exa-bacula-sd JobId 5514: Job write elapsed time = 01:31:41, Transfer rate = 401.4 K bytes/second
12-Aug 16:18 exa-bacula-dir JobId 5514: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
12-Aug 16:18 exa-bacula-dir JobId 5514: Fatal error: No Job status returned from FD.
12-Aug 16:18 exa-bacula-dir JobId 5514: Error: Bacula exa-bacula-dir 2.4.4 (28Dec08): 12-Aug-2009 16:18:09
  Build OS:               x86_64-pc-linux-gnu debian lenny/sid
  JobId:                  5514
  Job:                    backup_scavenger.2009-08-12_14.18.06.04
  Backup Level:           Full
  Client:                 "scavenger" 2.4.4 (28Dec08) i486-pc-linux-gnu,debian,5.0
  FileSet:                "full-scavenger" 2009-04-16 15:58:05
  Pool:                   "scavenger-full" (From Job FullPool override)
  Storage:                "FileScavenger" (From Job resource)
  Scheduled time:         12-Aug-2009 14:18:03
  Start time:             12-Aug-2009 14:18:09
  End time:               12-Aug-2009 16:18:09
  Elapsed time:           2 hours 
  Priority:               10
  FD Files Written:       0
  SD Files Written:       81,883
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       2,208,578,175 (2.208 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         scavenger-full-1250
  Volume Session Id:      5
  Volume Session Time:    1250080970
  Last Volume Bytes:      2,212,857,316 (2.212 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***

---------------------------------------------------------------------
Postgres Log:

2009-08-12 16:18:09 BST ERROR:  unexpected message type 0x58 during COPY from stdin
2009-08-12 16:18:09 BST CONTEXT:  COPY batch, line 81884: ""
2009-08-12 16:18:09 BST STATEMENT:  COPY batch FROM STDIN
2009-08-12 16:18:09 BST LOG:  could not send data to client: Broken pipe
2009-08-12 16:18:09 BST LOG:  could not receive data from client: Connection reset by peer
2009-08-12 16:18:09 BST LOG:  unexpected EOF on client connection



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users