Hi,

I have a problem with some verify jobs. My normal backup/verify jobs are
running fine. For my archive backups I created an extra PostgreSQL database;
I don't know if this makes a difference.

# Catalog
Catalog {
  Name = MyCatalog
  dbname = bacula; user = bacula; password = verysecret
}

# Archive Catalog
Catalog {
  Name = ArchiveCatalog
  dbname = bacula_archive; user = bacula; password = verysecret
}


At the end of the year I ran some archive backups, which completed fine,
but the verify jobs finished with errors. Some of my bacula-fd clients were
still at 2.0.3 while bacula-dir was at 2.2.6. I have now updated the clients
to 2.2.7.

This is what I get during a verify job with debug level 100:

bacula-sd:

VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=R set=R
VU0EM005-sd: jcr.c:623-0 OnExit JobStatus=R set=R
VU0EM005-sd: acquire.c:292-0 Dec reserve=0 dev="LTO3"
(/dev/ULTRIUM-TD3)
VU0EM005-sd: dev.c:1583-0 reposition from 0:0 to 0:1
VU0EM005-sd: dev.c:1608-0 fsr 1
VU0EM005-sd: dev.c:1459-0 fsr 1
VU0EM005-sd: bnet.c:666-0 who=client host=xx.61.198.248 port=36643
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=VU0EM005-sd: jcr.c:623-0
OnExit JobStatus=C set=C
VU0EM005-sd: cram-md5.c:73-0 send: auth cram-md5
<[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:133-0 cram-get received: auth cram-md5
<[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:152-0 sending resp to challenge: xxxxxxxx
VU0EM005-sd: dircmd.c:207-0 Message channel init completed.
VU0EM005-sd: pythonlib.c:237-0 No startup module.
VU0EM005-sd: bnet.c:666-0 who=client host=xx.61.198.248 port=36643
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=VU0EM005-sd: jcr.c:623-0
OnExit JobStatus=C set=C
VU0EM005-sd: cram-md5.c:73-0 send: auth cram-md5
<[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:133-0 cram-get received: auth cram-md5
<[EMAIL PROTECTED]> ssl=0
VU0EM005-sd: cram-md5.c:152-0 sending resp to challenge: xxxxxxx
VU0EM005-sd: dircmd.c:207-0 Message channel init completed.
VU0EM005-sd: pythonlib.c:237-0 No startup module.
VU0EM005-sd: read.c:137 Error sending to FD. ERR=Die Wartezeit für die
Verbindung ist abgelaufen
VU0EM005-sd: jcr.c:603-0 OnEntry JobStatus=R set=f
VU0EM005-sd: jcr.c:623-0 OnExit JobStatus=f set=f



bacula-fd on client:

VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163457 Stream=3.
VU0EM003: verify_vol.c:115-0 Got stream data, len=16
VU0EM003: verify_vol.c:219-0 bfiled>bdird: MD5 len=44: msg=163457 3
2SB4mVdI2nVsbY863FMwUg *MD5-163457*
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=1.
VU0EM003: verify_vol.c:115-0 Got stream data, len=241
VU0EM003: verify_vol.c:149-0 Got Attr: FilInx=163458 type=3
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.
VU0EM003: verify_vol.c:115-0 Got stream data, len=65536
VU0EM003: verify_vol.c:102-0 Got hdr: FilInx=163458 Stream=2.

bacula-dir:

VU0EM005-dir: sql_get.c:127-0 Get_file_record JobId=27
FilenameId=477058 PathId=114554
VU0EM005-dir: sql_get.c:129-0 Query=SELECT FileId, LStat, MD5 FROM
File WHERE File.JobId=27 AND File.PathId=114554 AND
File.FilenameId=477058
VU0EM005-dir: sql_get.c:133-0 get_file_record num_rows=1
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg 44: 163457 32SB4mVdI2nVsbY863FMwUg 
*MD5-163457*
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg 248: 163458 1 pinsug5 /pathtofile
VU0EM005-dir: verify.c:570-0 dird<filed: stream=1 /pathtofile
VU0EM005-dir: verify.c:571-0 dird<filed: attr=P0C DACxYP IGw B MZq BOI A MrTc 
BAA Blg BDorWs BCpG16 BGn0BK A A C
VU0EM005-dir: sql_get.c:73-0 db_get_file_att_record fname=/pathtofile
VU0EM005-dir: sql_get.c:127-0 Get_file_record JobId=27 FilenameId=477059 
PathId=114554
VU0EM005-dir: sql_get.c:129-0 Query=SELECT FileId, LStat, MD5 FROM
File WHERE File.JobId=27 AND File.PathId=114554 AND
File.FilenameId=477059

^^^^^ I think the problem starts with this SQL query

VU0EM005-dir: sql_get.c:133-0 get_file_record num_rows=1
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1: VU0EM005-dir:
getmsg.c:110-0 bget_dirmsg -1: VU0EM005-dir: getmsg.c:110-0
bget_dirmsg -1: VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1:
[snip]
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1: VU0EM005-dir:
bnet.c:666-0 who=client host=xx.61.198.248 port=36131
VU0EM005-dir: jcr.c:603-0 OnEntry JobStatus=VU0EM005-dir: jcr.c:623-0
OnExit JobStatus=C set=C
VU0EM005-dir: job.c:1126-0 wstorage=Neo4100
VU0EM005-dir: job.c:1135-0 wstore=Neo4100 where=Job resource
VU0EM005-dir: jcr.c:603-0 OnEntry JobStatus=C set=R
VU0EM005-dir: jcr.c:623-0 OnExit JobStatus=R set=R
VU0EM005-dir: cram-md5.c:73-0 send: auth cram-md5
<[EMAIL PROTECTED]> ssl=0
VU0EM005-dir: cram-md5.c:133-0 cram-get received: auth cram-md5
<[EMAIL PROTECTED]> ssl=0
VU0EM005-dir: cram-md5.c:152-0 sending resp to challenge: xxxxxx
VU0EM005-dir: ua_dotcmds.c:128-0 Cmd: .status dir current
VU0EM005-dir: ua_status.c:64-0 status:.status dir current
VU0EM005-dir: getmsg.c:110-0 bget_dirmsg -1: VU0EM005-dir:
getmsg.c:110-0 bget_dirmsg -1: VU0EM005-dir: getmsg.c:110-0
bget_dirmsg 204: Jmsg Job=VerifyVU0EM003-Archiv.2008-01-03_09.09.07
type=3 level=1199351336 VU0EM005-sd JobId 39: Fatal error: read.c:139
Error sending to File daemon. ERR=Die Wartezeit für die Verbindung ist
abgelaufen


It seems that the last SQL query is causing the problem.
But if I run this query in bconsole, I get a result:

Enter SQL query: SELECT FileId, LStat, MD5 FROM File WHERE
File.JobId=27 AND File.PathId=114554 AND File.FilenameId=477059;
+---------+--------------------------------------------------------------------+------------------------+
| fileid  | lstat                                                              | md5                    |
+---------+--------------------------------------------------------------------+------------------------+
| 627,213 | P0C DACxYP IGw B MZq BOI A MrTc BAA Blg BDorWs BCpG16 BGn0BK A A C | Q2YsnU6biuCxjOzoGP2qrw |
+---------+--------------------------------------------------------------------+------------------------+

I've run two verify jobs with debug level 100 and they always fail at this point.

The verify job then fails with this error message (the German ERR text
means "Connection timed out"):

03-Jan 10:08 VU0EM005-sd JobId 39: Fatal error: read.c:139 Error
sending to File daemon. ERR=Die Wartezeit für die Verbindung ist
abgelaufen
03-Jan 10:08 VU0EM005-sd JobId 39: Error: bsock.c:306 Write error
sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit für
die Verbindung ist abgelaufen

There is no firewall between the client and the server, and I have set
heartbeat intervals. This is really strange, because so far it only
happens with the extra PostgreSQL database I created for the archive
backups. The regular PostgreSQL database and the backup/verify jobs that
use it are ok.
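
As an illustration of the heartbeat setting mentioned above: the directive
goes in the FileDaemon resource of bacula-fd.conf and in the Storage
resource of bacula-sd.conf. This is only a sketch. The 60-second value is
an assumption, the resource names are guessed from the daemon names in the
debug output, and all other directives are omitted.

```
# bacula-fd.conf on the client (other directives omitted)
FileDaemon {
  Name = VU0EM003
  Heartbeat Interval = 60   # assumed value, in seconds
}

# bacula-sd.conf on the server (other directives omitted)
Storage {
  Name = VU0EM005-sd
  Heartbeat Interval = 60   # assumed value, in seconds
}
```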

Ralf
