Hi, I have a problem with some verify jobs. My normal backup/verify jobs are running fine. For my archive backups I created an extra PostgreSQL database; I don't know whether this makes a difference.

# Catalog
Catalog {
  Name = MyCatalog
  dbname = bacula; user = bacula; password = verysecret
}

# Archive Catalog
Catalog {
  Name = ArchiveCatalog
  dbname = bacula_archive; user = bacula; password = verysecret
}

At the end of the year I ran some archive backups, which were ok, but the verify jobs finished with errors. Some of my bacula-fd clients were still at 2.0.3 while bacula-dir was at 2.2.6. I have now updated the clients to 2.2.7.

This is what I get during a verify job with debug level 100:

bacula-sd:
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=R set=R
VU0EM005-sd: jcr.c:623-0 OnExit JobStatus=R set=R
VU0EM005-sd: acquire.c:292-0 Dec reserve=0 dev="LTO3" (/dev/ULTRIUM-TD3)
VU0EM005-sd: dev.c:1583-0 reposition from 0:0 to 0:1
VU0EM005-sd: dev.c:1608-0 fsr 1
VU0EM005-sd: dev.c:1459-0 fsr 1
VU0EM005-sd: bnet.c:666-0 who=client host=xx.61.198.248 port=36643
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=
VU0EM005-sd: jcr.c:623-0 OnExit JobStatus=C set=C
VU0EM005-sd: cram-md5.c:73-0 send: auth cram-md5 <[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:133-0 cram-get received: auth cram-md5 <[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:152-0 sending resp to challenge: xxxxxxxx
VU0EM005-sd: dircmd.c:207-0 Message channel init completed.
VU0EM005-sd: pythonlib.c:237-0 No startup module.
VU0EM005-sd: bnet.c:666-0 who=client host=xx.61.198.248 port=36643
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=
VU0EM005-sd: jcr.c:623-0 OnExit JobStatus=C set=C
VU0EM005-sd: cram-md5.c:73-0 send: auth cram-md5 <[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:133-0 cram-get received: auth cram-md5 <[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:152-0 sending resp to challenge: xxxxxxx
VU0EM005-sd: dircmd.c:207-0 Message channel init completed.
VU0EM005-sd: pythonlib.c:237-0 No startup module.
VU0EM005-sd: read.c:137 Error sending to FD.
ERR=Die Wartezeit für die Verbindung ist abgelaufen [German for "Connection timed out"]
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=R set=f
VU0EM005-sd: jcr.c:623-0 OnExit JobStatus=f set=f

bacula-fd on client:
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163457 Stream=3.
VU0EM003: verify_vol.c:115-0 Got stream data, len=16
VU0EM003: verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=163457 3 2SB4mVdI2nVsbY863FMwUg *MD5-163457*
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=1.
VU0EM003: verify_vol.c:115-0 Got stream data, len=241
VU0EM003: verify_vol.c:149-0 Got Attr: FilInx=163458 type=3
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
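
For anyone who wants to condense a long bacula-fd trace like the one above, here is a minimal Python sketch. It only assumes that the trace contains "Got hdr: FilInx=N Stream=M" entries as shown; the function names summarize_fd_trace and last_header are made up for illustration and are not part of Bacula. It counts the headers per (FileIndex, Stream) pair and reports the last header seen, which is the record the FD was working on when the trace stops.

```python
import re

# Matches entries such as "verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2."
# Works on flattened text too, because it does not depend on line breaks.
HDR_RE = re.compile(r"Got hdr: FilInx=(\d+) Stream=(\d+)")


def summarize_fd_trace(text):
    """Return [(file_index, stream, count), ...] in order of first appearance."""
    counts = {}  # plain dicts keep insertion order in Python 3.7+
    for match in HDR_RE.finditer(text):
        key = (int(match.group(1)), int(match.group(2)))
        counts[key] = counts.get(key, 0) + 1
    return [(file_index, stream, n) for (file_index, stream), n in counts.items()]


def last_header(text):
    """Return (file_index, stream) of the last header in the trace, or None."""
    matches = HDR_RE.findall(text)
    if not matches:
        return None
    file_index, stream = matches[-1]
    return int(file_index), int(stream)
```

Calling last_header() on the text of a saved trace file shows where the FD stopped, and summarize_fd_trace() shows how many records of each stream it had received for each file up to that point.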

bacula-dir:
VU0EM005-dir: sql_get.c:127-0 Get_file_record JobId=27 FilenameId=477058 PathId=114554
VU0EM005-dir: sql_get.c:129-0 Query=SELECT FileId, LStat, MD5 FROM File WHERE File.JobId=27 AND File.PathId=114554 AND File.FilenameId=477058
VU0EM005-dir: sql_get.c:133-0 get_file_record num_rows=1
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg 44: 163457 32SB4mVdI2nVsbY863FMwUg *MD5-163457*
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg 248: 163458 1 pinsug5 /pathtofile
VU0EM005-dir: verify.c:570-0 dird<filed: stream=1 /pathtofile
VU0EM005-dir: verify.c:571-0 dird<filed: attr=P0C DACxYP IGw B MZq BOI A MrTc BAA Blg BDorWs BCpG16 BGn0BK A A C
VU0EM005-dir: sql_get.c:73-0 db_get_file_att_record fname=/pathtofile
VU0EM005-dir: sql_get.c:127-0 Get_file_record JobId=27 FilenameId=477059 PathId=114554
VU0EM005-dir: sql_get.c:129-0 Query=SELECT FileId, LStat, MD5 FROM File WHERE File.JobId=27 AND File.PathId=114554 AND File.FilenameId=477059
^^^^^ I think the problem starts with this SQL query
VU0EM005-dir: sql_get.c:133-0 get_file_record num_rows=1
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
[snip]
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
VU0EM005-dir: bnet.c:666-0 who=client host=xx.61.198.248 port=36131
VU0EM005-dir: jcr.c:603-0 OnEntry JobStatus=
VU0EM005-dir: jcr.c:623-0 OnExit JobStatus=C set=C
VU0EM005-dir: job.c:1126-0 wstorage=Neo4100
VU0EM005-dir: job.c:1135-0 wstore=Neo4100 where=Job resource
VU0EM005-dir: jcr.c:603-0 OnEntry JobStatus=C set=R
VU0EM005-dir: jcr.c:623-0 OnExit JobStatus=R set=R
VU0EM005-dir: cram-md5.c:73-0 send: auth cram-md5 <[EMAIL PROTECTED]> ssl=0
VU0EM005-dir: cram-md5.c:133-0 cram-get received: auth cram-md5 <[EMAIL PROTECTED]> ssl=0
VU0EM005-dir: cram-md5.c:152-0 sending resp to challenge: xxxxxx
VU0EM005-dir: ua_dotcmds.c:128-0 Cmd: .status dir current
VU0EM005-dir: ua_status.c:64-0 status:.status dir current
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg 204: Jmsg Job=VerifyVU0EM003-Archiv.2008-01-03_09.09.07 type=3 level=1199351336 VU0EM005-sd JobId 39: Fatal error: read.c:139 Error sending to File daemon. ERR=Die Wartezeit für die Verbindung ist abgelaufen [German for "Connection timed out"]

It seems that the last SQL query is causing the problem. But if I issue this query in bconsole, I get a result:

Enter SQL query: SELECT FileId, LStat, MD5 FROM File WHERE File.JobId=27 AND File.PathId=114554 AND File.FilenameId=477059;
+---------+--------------------------------------------------------------------+------------------------+
| fileid  | lstat                                                              | md5                    |
+---------+--------------------------------------------------------------------+------------------------+
| 627,213 | P0C DACxYP IGw B MZq BOI A MrTc BAA Blg BDorWs BCpG16 BGn0BK A A C | Q2YsnU6biuCxjOzoGP2qrw |
+---------+--------------------------------------------------------------------+------------------------+

I have run two verify jobs with debug level 100, and it always fails at this point. The verify job then fails with this error message:

03-Jan 10:08 VU0EM005-sd JobId 39: Fatal error: read.c:139 Error sending to File daemon. ERR=Die Wartezeit für die Verbindung ist abgelaufen
03-Jan 10:08 VU0EM005-sd JobId 39: Error: bsock.c:306 Write error sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit für die Verbindung ist abgelaufen

There is no firewall between the client and the server, and I have set some heartbeat intervals.

This is really strange, because so far it only happens with the extra PostgreSQL database I created for the archive backups. The regular PostgreSQL database and the backup/verify jobs that use it are ok.

Ralf

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users