Hi,

22.10.2007 21:26,, GDS.Marshall wrote::
> version 2.2.4 patched from sourceforge
> Linux kernel 2.6.x
> 
> I am running 10+ FD's, one SD, and one Director.  I am having problems
> with one of my FD's, the others are fine.  Not sure if it makes any
> difference, but the FD is on the same machine as the Director.

> I have no issues with the network, I see no errors on either the interface
> of the FD or the SD.  All FD's are plugged into the same netgear switch. 
> The SD is plugged into a different netgear switch which is then plugged
> into the FD's switch.

Are the FD and SD running on the same host (your description says that 
DIR and problem FD are on the same machine, but not if the DIR and SD 
are on that same machine, too)?

> I run a backup job (or via schedule) and the amount/size/volume of data is
> transfered each time, and then everything stops/hangs/does nothing.
> 
> ls -l
> /var/data/bacula/spool/backupserver-sd.data.472.fileserver-backup.2007-10-22_18.54.33.DLT-V4.spool
> -rw-r----- 1 root bacula 2193816 Oct 22 18:56
> 
> A short while later, I will get a console message
> 22-Oct 18:56 backupserver-sd: 3301 Issuing autochanger "loaded? drive 0"
> command.
> 22-Oct 18:56 backupserver-sd: 3302 Autochanger "loaded? drive 0", result
> is Slot 3.
> 22-Oct 18:56 backupserver-sd: Volume "CNI906" previously written, moving
> to end of data.
> 22-Oct 18:56 backupserver-sd: Ready to append to end of Volume "CNI906" at
> file=1.
> 22-Oct 18:56 backupserver-sd: Spooling data ...
> 22-Oct 18:56 fileserver-fd: fileserver-backup.2007-10-22_18.54.33 Fatal
> error: backup.c:892 Network send error to SD. ERR=Success

So the connection breaks shortly after data starts being transferred, 
right?

It's a little bit surprising to see an error text of Success here... I 
always thought that sort of things only happened on windows ;-)


> I know it says "Network send error", however, I have checked the network,
> and can not find a problem with any of the equipment.

Do you have a firewall running on that host?

> I have run the fd and sd with debug options to provide additional output,
> I hope this helps.
> 
> If any other information would help in diagnosis, please just ask for it.
> 
> 
> /usr/local/sbin/bacula-fd -f -s -d 200 -u root -g bacula -c
> /etc/bacula/bacula-fd.conf
> 
> /home/spencer/bacula-sd -f -d 200 -s -u root -g bacula -c
> /etc/bacula/bacula-sd.conf
> 
> cat /root/bacula-fd.log
> bacula-fd: filed_conf.c:438 Inserting director res: fileserver-mon
> fileserver-fd: jcr.c:132 read_last_jobs seek to 188
> fileserver-fd: jcr.c:139 Read num_items=10
> fileserver-fd: pythonlib.c:113 No script dir. prog=FDStartUp
> fileserver-fd: filed.c:225 filed: listening on port 9102
> fileserver-fd: bnet_server.c:96 Addresses host[ipv4:0.0.0.0:9102]
> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387
> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622 OnExit
> JobStatus=C set=C
> fileserver-fd: find.c:81 init_find_files ff=8094e60
> fileserver-fd: job.c:233 <dird: Hello Director fileserver-dir calling
> fileserver-fd: job.c:249 Executing Hello command.
> fileserver-fd: job.c:353 Calling Authenticate
> fileserver-fd: cram-md5.c:71 send: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> fileserver-fd: cram-md5.c:150 sending resp to challenge:
> 6U+ZK4lCcB/uXh+k+X/qdB
> fileserver-fd: job.c:357 OK Authenticate
> fileserver-fd: job.c:233 <dird: JobId=0 Job=-Console-.2007-10-22_18.53.31
> SDid=0 SDtime=0 Authorization=dummy
> fileserver-fd: job.c:249 Executing JobId= command.
> fileserver-fd: job.c:451 JobId=0 Auth=dummy
> fileserver-fd: job.c:233 <dird: statusfileserver-fd: job.c:249 Executing
> status command.
> fileserver-fd: runscript.c:102 runscript: running all RUNSCRIPT object
> (ClientAfterJob) JobStatus=C
> fileserver-fd: pythonlib.c:237 No startup module.
> fileserver-fd: job.c:337 Calling term_find_files
> fileserver-fd: job.c:340 Done with term_find_files
> fileserver-fd: mem_pool.c:377 garbage collect memory pool
> fileserver-fd: job.c:342 Done with free_jcr
> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387
> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622 OnExit
> JobStatus=C set=C
> fileserver-fd: find.c:81 init_find_files ff=8094e60
> fileserver-fd: job.c:233 <dird: Hello Director fileserver-dir calling
> fileserver-fd: job.c:249 Executing Hello command.
> fileserver-fd: job.c:353 Calling Authenticate
> fileserver-fd: cram-md5.c:71 send: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> fileserver-fd: cram-md5.c:150 sending resp to challenge:
> /2gP/C+Fx+ZhT98YS4+hzD
> fileserver-fd: job.c:357 OK Authenticate
> fileserver-fd: job.c:233 <dird: JobId=472
> Job=fileserver-backup.2007-10-22_18.54.33 SDid=1 SDtime=1193079020
> Authorization=JMKD-GKGP-LMNP-ODNP-EDME-NJBC-LEMK-BPGJ
> fileserver-fd: job.c:249 Executing JobId= command.
> fileserver-fd: job.c:451 JobId=472
> Auth=JMKD-GKGP-LMNP-ODNP-EDME-NJBC-LEMK-BPGJ
> fileserver-fd: job.c:233 <dird: fileset vss=1
> fileserver-fd: job.c:249 Executing fileset command.
> fileserver-fd: job.c:688 I
> fileserver-fd: job.c:688 O M0
> fileserver-fd: job.c:688 N
> fileserver-fd: job.c:688 F /boot
> fileserver-fd: job.c:688 F /etc
> fileserver-fd: job.c:688 F /usr/local
> fileserver-fd: job.c:688 F /var/lib
> fileserver-fd: job.c:688 F /var/data/svn
> fileserver-fd: job.c:688 F /var/data/mysql-bak
> fileserver-fd: job.c:688 F /var/spool/cyrus
> fileserver-fd: job.c:688 F /usr/src
> fileserver-fd: job.c:688 F /var/mail
> fileserver-fd: job.c:688 F /home/zak/mail
> fileserver-fd: job.c:688 F /home/james/mail
> fileserver-fd: job.c:688 F /home/dpar2/mail
> fileserver-fd: job.c:688 N
> fileserver-fd: job.c:688 E
> fileserver-fd: job.c:688 F /tmp
> fileserver-fd: job.c:688 N
> fileserver-fd: job.c:233 <dird: level = incremental  mtime_only=0
> fileserver-fd: job.c:249 Executing level =  command.
> fileserver-fd: job.c:1160 level_cmd: level = incremental  mtime_only=0
> fileserver-fd: job.c:233 <dird: level = since_utime 1192958180 mtime_only=0
> fileserver-fd: job.c:249 Executing level =  command.
> fileserver-fd: job.c:1160 level_cmd: level = since_utime 1192958180
> mtime_only=0
> fileserver-fd: job.c:1194 since_time=1192958180
> fileserver-fd: job.c:1215 Dirtime=1193079276315533 FDtime=1193079276315510
> fileserver-fd: job.c:1217 rt=56 adj=18446744073709551565
> fileserver-fd: job.c:1215 Dirtime=1193079276315628 FDtime=1193079276315603
> fileserver-fd: job.c:1217 rt=59 adj=18446744073709551511
> fileserver-fd: job.c:1215 Dirtime=1193079276315776 FDtime=1193079276315747
> fileserver-fd: job.c:1217 rt=64 adj=18446744073709551450
> fileserver-fd: job.c:1215 Dirtime=1193079276315872 FDtime=1193079276315847
> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551396
> fileserver-fd: job.c:1215 Dirtime=1193079276315964 FDtime=1193079276315940
> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551343
> fileserver-fd: job.c:1215 Dirtime=1193079276316057 FDtime=1193079276316032
> fileserver-fd: job.c:1217 rt=59 adj=18446744073709551289
> fileserver-fd: job.c:1215 Dirtime=1193079276316149 FDtime=1193079276316125
> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551236
> fileserver-fd: job.c:1215 Dirtime=1193079276316242 FDtime=1193079276316218
> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551183
> fileserver-fd: job.c:1221 rt=58 adj=18446744073709551562
> fileserver-fd: job.c:1236 adj = 0 since_time=1192958180
> fileserver-fd: job.c:233 <dird: storage address=backupserver.cluster.local
> port=9103 ssl=0
> fileserver-fd: job.c:249 Executing storage  command.
> fileserver-fd: job.c:1291 StorageCmd: storage
> address=backupserver.cluster.local port=9103 ssl=0
> fileserver-fd: job.c:1297 Open storage: backupserver.cluster.local:9103 ssl=0
> fileserver-fd: bsock.c:195 Current host[ipv4:192.168.1.135:9103] All
> host[ipv4:192.168.1.135:9103]
> fileserver-fd: bsock.c:149 who=Storage daemon
> host=backupserver.cluster.local port=9103
> fileserver-fd: job.c:1309 Connection OK to SD.
> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> fileserver-fd: cram-md5.c:150 sending resp to challenge:
> XB+VsUQJ44JUUT+eURwxEB
> fileserver-fd: cram-md5.c:78 send: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> fileserver-fd: cram-md5.c:97 Authenticate OK 94ldXX/Ic5FIJUAeTm/qKA
> fileserver-fd: job.c:1318 Authenticated with SD.
> fileserver-fd: job.c:233 <dird: RunBeforeJob
> /etc/bacula/scripts/makemysqlbackup
> fileserver-fd: job.c:249 Executing RunBeforeJob command.
> fileserver-fd: job.c:462 runbefore_cmd: RunBeforeJob
> /etc/bacula/scripts/makemysqlbackup
> fileserver-fd: runscript.c:204 runscript: running a RUNSCRIPT object
> fileserver-fd: util.c:580 edit_job_codes: /etc/bacula/scripts/makemysqlbackup
> fileserver-fd: runscript.c:211 runscript: running
> '/etc/bacula/scripts/makemysqlbackup'...
> fileserver-fd: runscript.c:236 runscript OK
> fileserver-fd: job.c:233 <dird: backup
> fileserver-fd: job.c:249 Executing backup command.
> fileserver-fd: jcr.c:602 OnEntry JobStatus=C set=B
> fileserver-fd: jcr.c:622 OnExit JobStatus=B set=B
> fileserver-fd: job.c:1350 begin backup ff=8094e60
> fileserver-fd: job.c:1358 bfiled>dird: 2000 OK backup
> fileserver-fd: job.c:1364 >stored: append open session
> fileserver-fd: job.c:1369 <stored: 3000 OK open ticket = 1
> fileserver-fd: job.c:1374 Got Ticket=1
> fileserver-fd: job.c:1384 >stored: append data 1
> fileserver-fd: job.c:1389 <stored: append data 1
> fileserver-fd: job.c:1798 3000 OK data
> fileserver-fd: pythonlib.c:237 No startup module.
> fileserver-fd: job.c:1436 begin blast ff=8094e60
> fileserver-fd: jcr.c:602 OnEntry JobStatus=B set=R
> fileserver-fd: jcr.c:622 OnExit JobStatus=R set=R
> fileserver-fd: find.c:93 Enter set_find_options()
> fileserver-fd: find.c:96 Leave set_find_options()
> fileserver-fd: find.c:198 F /boot
> fileserver-fd: backup.c:278 FT_DIREND: /boot/lost+found/
> fileserver-fd: backup.c:332 bfiled: sending /boot/lost+found to stored
> fileserver-fd: backup.c:1137 No strip for /boot/lost+found
> fileserver-fd: backup.c:245 FT_REG saving: /boot/map
> fileserver-fd: backup.c:332 bfiled: sending /boot/map to stored
> fileserver-fd: backup.c:1137 No strip for /boot/map
> fileserver-fd: backup.c:895 Send data to SD len=80384
> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.4.30
> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.4.30 to stored
> fileserver-fd: backup.c:1137 No strip for /boot/config-2.4.30
> fileserver-fd: backup.c:895 Send data to SD len=23184
> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.4.32
> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.4.32 to stored
> fileserver-fd: backup.c:1137 No strip for /boot/config-2.4.32
> fileserver-fd: backup.c:895 Send data to SD len=23938
> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.4.30
> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.4.30 to stored
> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.4.30
> fileserver-fd: backup.c:895 Send data to SD len=647691
> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.4.32
> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.4.32 to stored
> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.4.32
> fileserver-fd: backup.c:895 Send data to SD len=647760
> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.6.16.1-2
> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.6.16.1-2 to
> stored
> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.6.16.1-2
> fileserver-fd: backup.c:895 Send data to SD len=742070
> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.6.16.1-2
> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.6.16.1-2 to stored
> fileserver-fd: backup.c:1137 No strip for /boot/config-2.6.16.1-2
> fileserver-fd: backup.c:895 Send data to SD len=31119
> fileserver-fd: backup.c:245 FT_REG saving: /boot/vmlinuz-2.6.16.1-2
> fileserver-fd: backup.c:332 bfiled: sending /boot/vmlinuz-2.6.16.1-2 to
> stored
> fileserver-fd: backup.c:1137 No strip for /boot/vmlinuz-2.6.16.1-2
> fileserver-fd: jcr.c:602 OnEntry JobStatus=R set=f
> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=f
> fileserver-fd: jcr.c:602 OnEntry JobStatus=f set=E
> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=E
> fileserver-fd: backup.c:197 end blast_data ok=0

Here it's failed, I think. A higher debug level might reveal more, but 
this doesn't tell me anything important.

> fileserver-fd: jcr.c:602 OnEntry JobStatus=f set=E
> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=E
> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387
> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622 OnExit
> JobStatus=C set=C
> fileserver-fd: find.c:81 init_find_files ff=8097838
...

> 
> cat /root/bacula-sd.log
...
> backupserver-sd: jcr.c:602 OnEntry JobStatus=R set=R
> backupserver-sd: jcr.c:622 OnExit JobStatus=R set=R
> backupserver-sd: append.c:96 Begin append device="DLT-V4" (/dev/nst0)
> backupserver-sd: spool.c:110 Turning on data spooling
> backupserver-sd: spool.c:179 Created spool file:
> /var/data/amanda/bacula/spool/backupserver-sd.data.472.fileserver-backup.2007-10-22_18.54.33.DLT-V4.spool
> backupserver-sd: append.c:101 Just after acquire_device_for_append
> backupserver-sd: label.c:698 session_label record=80cc220
> backupserver-sd: label.c:754 Write sesson_label record JobId=472
> FI=SOS_LABEL SessId=1 Strm=472 len=179 remainder=0
> backupserver-sd: label.c:758 Leave write_session_label Block=0d File=1d
> backupserver-sd: bnet.c:666 who=client host=192.168.1.30 port=36643

The session has started, it's label is written to tape.

> backupserver-sd: dircmd.c:171 Conn: Hello Director fileserver-dir calling
> backupserver-sd: dircmd.c:181 Got a DIR connection
> backupserver-sd: jcr.c:602 OnEntry JobStatus=backupserver-sd: jcr.c:622
> OnExit JobStatus=C set=C
> backupserver-sd: cram-md5.c:71 send: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> backupserver-sd: cram-md5.c:131 cram-get: auth cram-md5
> <[EMAIL PROTECTED]> ssl=0
> backupserver-sd: cram-md5.c:150 sending resp to challenge:
> p8/qo/5jVDAXL//8+/4bbD
> backupserver-sd: dircmd.c:202 Message channel init completed.
> backupserver-sd: dircmd.c:209 <dird: status
> backupserver-sd: dircmd.c:223 Do command: status
> backupserver-sd: pythonlib.c:237 No startup module.
> backupserver-sd: mem_pool.c:377 garbage collect memory pool

No more output related to the actual job, I think. Again, a higher 
debug level might reveal something.


> backupserver ~ #

With the information from above, I suspect a network problem. Does the 
client run before job you have run for a very long time? In such a 
situation, a firewall/router might close the connection between SD and 
FD because it seems to be idle.

Arno


> Many thanks
> 
> Spencer
> 
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
> 

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to