Ralf Gross wrote:
> 03-Jan 10:08 VU0EM005-sd JobId 39: Fatal error: read.c:139 Error
> sending to File daemon. ERR=Die Wartezeit für die Verbindung ist
> abgelaufen
> 03-Jan 10:08 VU0EM005-sd JobId 39: Error: bsock.c:306 Write error
> sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit für
> die Verbindung ist abgelaufen
> 
> There is no firewall between the client and server and I have set some
> heartbeat intervals. This is really strange, because until now it
> only happens with the extra psql db I created for the archive backups.
> The regular psql db and the backup/verify jobs which use the other psql db are
> ok.

I started a verify job without the debug option. It does not seem to stop at
the same file as before, but in 3 tries it has now always stopped at job file
90.

03-Jan 13:10 VU0EM005-dir JobId 42: Start Verify JobId=42 Level=VolumeToCatalog 
Job=VerifyVU0EM003-Archiv.2008-01-03_13.10.03
03-Jan 13:10 VU0EM005-dir JobId 42: Using Device "LTO3"
03-Jan 13:10 VU0EM005-sd JobId 42: Ready to read from volume "06D149L3" on 
device "LTO3" (/dev/ULTRIUM-TD3).
03-Jan 13:10 VU0EM005-sd JobId 42: Forward spacing Volume "06D149L3" to 
file:block 0:1.
03-Jan 13:10 VU0EM005-sd JobId 42: End of file 1 on device "LTO3" 
(/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:11 VU0EM005-sd JobId 42: End of file 2 on device "LTO3" 
(/dev/ULTRIUM-TD3), Volume "06D149L3"
[...]
03-Jan 13:41 VU0EM005-sd JobId 42: End of file 88 on device "LTO3" 
(/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:42 VU0EM005-sd JobId 42: End of file 89 on device "LTO3" 
(/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:42 VU0EM005-sd JobId 42: End of file 90 on device "LTO3" 
(/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:57 VU0EM005-sd JobId 42: Fatal error: read.c:139 Error sending to
File daemon. ERR=Die Wartezeit für die Verbindung ist abgelaufen
03-Jan 13:57 VU0EM005-sd JobId 42: Error: bsock.c:306 Write error sending 65536
bytes to client:xx.60.1.252:36643: ERR=Die Wartezeit für die Verbindung ist
abgelaufen


I don't know what's happening there. It's clear that at some point the
connection gets dropped (SDconnect timeout?). But I don't think that this is the
main problem, because the last job file that was checked is file 90 at 13:42.
At that point the connection between the fd and sd was still there.
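
To put a number on that gap, here is a minimal sketch that measures the time between the last "End of file" message and the first "Fatal error" in the job log. The helper names (`stamp`, `idle_minutes`) are made up for illustration, the year is assumed because the log does not print it, and `%b` is assumed to match the month abbreviation under the active locale ("Jan" works in the default C locale, "Dez" would not).

```python
from datetime import datetime

# Two lines taken from the JobId 42 output above (wrapped lines joined).
LOG = [
    '03-Jan 13:42 VU0EM005-sd JobId 42: End of file 90 on device "LTO3"',
    '03-Jan 13:57 VU0EM005-sd JobId 42: Fatal error: read.c:139 Error sending to File daemon.',
]

def stamp(line, year=2008):
    """Parse the leading 'DD-Mon HH:MM' stamp; the year is an assumption."""
    day_month, clock = line.split()[:2]
    return datetime.strptime(f"{day_month}-{year} {clock}", "%d-%b-%Y %H:%M")

def idle_minutes(lines):
    """Minutes between the last 'End of file' and the first 'Fatal error'."""
    last_ok = max(stamp(l) for l in lines if "End of file" in l)
    first_err = min(stamp(l) for l in lines if "Fatal error" in l)
    return (first_err - last_ok).total_seconds() / 60

print(idle_minutes(LOG))
```

For the log above this gives 15 minutes between the last file mark and the write error, so the sd was apparently already stuck well before the timeout was reported.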

bacula-dir and bacula-sd:

Do 3. Jan 13:48:38 CET 2008
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9101            0.0.0.0:*               LISTEN      24746/bacula-dir
tcp        0      0 0.0.0.0:9102            0.0.0.0:*               LISTEN      24717/bacula-fd
tcp        0      0 xx.60.9.241:9103        0.0.0.0:*               LISTEN      24701/bacula-sd
tcp        0      0 xx.60.1.250:9103        0.0.0.0:*               LISTEN      24701/bacula-sd

tcp        0      0 xx.60.9.241:35958       xx.60.9.241:9103        VERBUNDEN   24746/bacula-dir
tcp        0      0 127.0.1.1:37299         127.0.1.1:9101          VERBUNDEN   24797/bconsole
tcp        0      0 xx.60.1.250:56334       xx.60.1.252:9102        VERBUNDEN   24746/bacula-dir
tcp        0      0 xx.60.9.241:9103        xx.60.9.241:35958       VERBUNDEN   24701/bacula-sd
tcp        0      0 127.0.1.1:9101          127.0.1.1:37299         VERBUNDEN   24746/bacula-dir
tcp        0  87260 xx.60.9.241:9103        xx.60.1.252:58822       VERBUNDEN   24701/bacula-sd

bacula-fd :

Do 3. Jan 13:49:03 CET 2008
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 xx.60.1.252:9102        0.0.0.0:*               LISTEN      15842/bacula-fd

tcp        0      0 xx.60.1.252:9102        xx.60.1.250:56334       VERBUNDEN   15842/bacula-fd
tcp        0      0 xx.60.1.252:58822       xx.60.9.241:9103        VERBUNDEN   15842/bacula-fd
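
One row in the bacula-sd listing stands out: the sd's connection to the fd (xx.60.1.252:58822) has 87260 bytes in Send-Q, which netstat reports as data the remote side has not yet acknowledged. The following is only a rough sketch for picking such rows out of `netstat -tnp` style output. The function name `stalled_sockets` and the threshold parameter are hypothetical, and it assumes each connection sits on a single line.

```python
def stalled_sockets(netstat_lines, threshold=0):
    """Return (local, remote, send_q) for TCP rows whose Send-Q exceeds threshold."""
    hits = []
    for line in netstat_lines:
        fields = line.split()
        # Skip headers, blank lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        send_q = int(fields[2])  # third column is Send-Q
        if send_q > threshold:
            hits.append((fields[3], fields[4], send_q))
    return hits

# Two rows from the bacula-sd listing above (wrapped lines joined).
SAMPLE = [
    "tcp        0      0 xx.60.9.241:9103        xx.60.9.241:35958       VERBUNDEN  24701/bacula-sd",
    "tcp        0  87260 xx.60.9.241:9103        xx.60.1.252:58822       VERBUNDEN  24701/bacula-sd",
]
print(stalled_sockets(SAMPLE))
```

Running this repeatedly while the job hangs would show whether the queue drains or stays full. A queue that never shrinks would suggest the fd has stopped reading from that socket, though that is an assumption and not something the output above proves.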


I run verify jobs on a daily basis and never had this effect before. Below is the
job output from the last full backup verify of the same client. But there are
two differences:

1. other db: MyCatalog instead of ArchiveCatalog. ArchiveCatalog is the psql db
that is only used for archive backups.

2. other client (see below): here sd and fd are the same machine. This is not
possible with the other job, because this fd doesn't know the ArchiveCatalog
db. So I use the fd that is used for backup.

This is the output of a successful verify job.

02-Dez 18:44 VU0EM005-dir JobId 947: Bacula VU0EM005-dir 2.2.6 (10Nov07): 
02-Dez-2007 18:44:44
  Build OS:               x86_64-unknown-linux-gnu debian 4.0
  JobId:                  947
  Job:                    VerifyVU0EM003.2007-12-02_12.06.07
  FileSet:                VU0EM003
  Verify Level:           VolumeToCatalog
  Client:                 VU0EM005-fd
  Verify JobId:           943
  Verify Job:             VU0EM003
  Start time:             02-Dez-2007 13:48:16
  End time:               02-Dez-2007 18:44:44
  Files Expected:         1,857,442
  Files Examined:         1,857,442
  Non-fatal FD errors:    0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Verify OK


Ralf
