Ralf Gross wrote:
> 03-Jan 10:08 VU0EM005-sd JobId 39: Fatal error: read.c:139 Error
> sending to File daemon. ERR=Die Wartezeit für die Verbindung ist
> abgelaufen
> 03-Jan 10:08 VU0EM005-sd JobId 39: Error: bsock.c:306 Write error
> sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit für
> die Verbindung ist abgelaufen
>
> There is no firewall between the client and server and I have set some
> heartbeat intervals. This is really strange, because until now it
> only happens with the extra psql db I created for the archive backups.
> The regular psql db and the backup/verify jobs which use the other psql db are
> ok.
I started a verify job without the debug option. It no longer seems to stop at the same file as before, but in 3 tries it has now always stopped at job file 90.

03-Jan 13:10 VU0EM005-dir JobId 42: Start Verify JobId=42 Level=VolumeToCatalog Job=VerifyVU0EM003-Archiv.2008-01-03_13.10.03
03-Jan 13:10 VU0EM005-dir JobId 42: Using Device "LTO3"
03-Jan 13:10 VU0EM005-sd JobId 42: Ready to read from volume "06D149L3" on device "LTO3" (/dev/ULTRIUM-TD3).
03-Jan 13:10 VU0EM005-sd JobId 42: Forward spacing Volume "06D149L3" to file:block 0:1.
03-Jan 13:10 VU0EM005-sd JobId 42: End of file 1 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:11 VU0EM005-sd JobId 42: End of file 2 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D149L3"
[...]
03-Jan 13:41 VU0EM005-sd JobId 42: End of file 88 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:42 VU0EM005-sd JobId 42: End of file 89 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:42 VU0EM005-sd JobId 42: End of file 90 on device "LTO3" (/dev/ULTRIUM-TD3), Volume "06D149L3"
03-Jan 13:57 VU0EM005-sd JobId 42: Fatal error: read.c:139 Error sending to File daemon. ERR=Die Wartezeit für die Verbindung ist abgelaufen
03-Jan 13:57 VU0EM005-sd JobId 42: Error: bsock.c:306 Write error sending 65536 bytes to client:xx.60.1.252:36643: ERR=Die Wartezeit für die Verbindung ist abgelaufen

[The German ERR text means "Connection timed out".]

I don't know what's happening there. It's clear that at some point the connection gets dropped (SD Connect Timeout?). But I don't think that this is the main problem, because the last job file that was checked is file 90 at 13:42. At this point the connection between the fd and sd was still there.

[In the netstat output below, the German locale prints "Do" for Thu and "VERBUNDEN" for ESTABLISHED.]

bacula-dir and bacula-sd:

Do 3. Jan 13:48:38 CET 2008
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9101            0.0.0.0:*               LISTEN      24746/bacula-dir
tcp        0      0 0.0.0.0:9102            0.0.0.0:*               LISTEN      24717/bacula-fd
tcp        0      0 xx.60.9.241:9103        0.0.0.0:*               LISTEN      24701/bacula-sd
tcp        0      0 xx.60.1.250:9103        0.0.0.0:*               LISTEN      24701/bacula-sd
tcp        0      0 xx.60.9.241:35958       xx.60.9.241:9103        VERBUNDEN   24746/bacula-dir
tcp        0      0 127.0.1.1:37299         127.0.1.1:9101          VERBUNDEN   24797/bconsole
tcp        0      0 xx.60.1.250:56334       xx.60.1.252:9102        VERBUNDEN   24746/bacula-dir
tcp        0      0 xx.60.9.241:9103        xx.60.9.241:35958       VERBUNDEN   24701/bacula-sd
tcp        0      0 127.0.1.1:9101          127.0.1.1:37299         VERBUNDEN   24746/bacula-dir
tcp        0  87260 xx.60.9.241:9103        xx.60.1.252:58822       VERBUNDEN   24701/bacula-sd

bacula-fd:

Do 3. Jan 13:49:03 CET 2008
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 xx.60.1.252:9102        0.0.0.0:*               LISTEN      15842/bacula-fd
tcp        0      0 xx.60.1.252:9102        xx.60.1.250:56334       VERBUNDEN   15842/bacula-fd
tcp        0      0 xx.60.1.252:58822       xx.60.9.241:9103        VERBUNDEN   15842/bacula-fd

I run verify jobs on a daily basis and have never seen this effect before. Below is the job output from the last full backup verify of the same client. But there are two differences:

1. Other db: MyCatalog instead of ArchiveCatalog. ArchiveCatalog is the psql db that is only used for archive backups.
2. Other client (see below): here sd and fd are the same machine. This is not possible with the other job, because this fd doesn't know the ArchiveCatalog db. So I use the fd that is used for backup.

This is the output of a successful verify job.

02-Dez 18:44 VU0EM005-dir JobId 947: Bacula VU0EM005-dir 2.2.6 (10Nov07): 02-Dez-2007 18:44:44
  Build OS:               x86_64-unknown-linux-gnu debian 4.0
  JobId:                  947
  Job:                    VerifyVU0EM003.2007-12-02_12.06.07
  FileSet:                VU0EM003
  Verify Level:           VolumeToCatalog
  Client:                 VU0EM005-fd
  Verify JobId:           943
  Verify Job:             VU0EM003
  Start time:             02-Dez-2007 13:48:16
  End time:               02-Dez-2007 18:44:44
  Files Expected:         1,857,442
  Files Examined:         1,857,442
  Non-fatal FD errors:    0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Verify OK

Ralf
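
A side note on the netstat listings above: the only socket with a non-empty queue is the sd's connection to the fd (Send-Q 87260). Below is a minimal Python sketch for picking such rows out of netstat output. The function name queued_sockets and the assumed column order (Proto, Recv-Q, Send-Q, local address, foreign address, state) are illustrative assumptions, not anything provided by Bacula or netstat itself, and the sample only repeats two rows from the listing above.

```python
# Sketch: flag TCP sockets with queued data in netstat-style output.
# Assumed column order: Proto Recv-Q Send-Q Local Foreign State [PID/Program]

def queued_sockets(netstat_text):
    """Return (local, foreign, recv_q, send_q) for rows with a non-empty queue."""
    flagged = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header, blank lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # queue columns are not numeric, ignore the row
        if recv_q or send_q:
            flagged.append((fields[3], fields[4], recv_q, send_q))
    return flagged


# Two rows copied from the bacula-sd listing; the state name is locale dependent.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 xx.60.9.241:9103 xx.60.9.241:35958 VERBUNDEN 24701/bacula-sd
tcp 0 87260 xx.60.9.241:9103 xx.60.1.252:58822 VERBUNDEN 24701/bacula-sd
"""

if __name__ == "__main__":
    for local, foreign, recv_q, send_q in queued_sockets(SAMPLE):
        print(f"{local} -> {foreign}: Recv-Q={recv_q} Send-Q={send_q}")
```

Running something like this periodically on both machines while a verify job is active would show whether the Send-Q on the sd side keeps growing before the timeout. That is only a suggestion for narrowing things down, not a confirmed diagnosis.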