Hello,

> Hi,
>
> 22.10.2007 21:26, GDS.Marshall wrote:
>> version 2.2.4 patched from sourceforge
>> Linux kernel 2.6.x
>>
>> I am running 10+ FD's, one SD, and one Director.  I am having problems
>> with one of my FD's, the others are fine.  Not sure if it makes any
>> difference, but the FD is on the same machine as the Director.
>
>> I have no issues with the network, I see no errors on either the
>> interface
>> of the FD or the SD.  All FD's are plugged into the same netgear switch.
>> The SD is plugged into a different netgear switch which is then plugged
>> into the FD's switch.
>
> Are the FD and SD running on the same host (your description says that
> DIR and problem FD are on the same machine, but not whether the DIR and SD
> are on that same machine, too)?
No, the SD is on its own machine:

FD+DIR   FD   FD
  |      |     |
 GSW---------------.... Gig Switch
  |
 FSW---------------.... Fast Switch
  |
  SD

>
>> I run a backup job (manually or via schedule) and the same amount of data
>> is transferred each time, and then everything stops/hangs/does nothing.
>>
>> ls -l
>> /var/data/bacula/spool/backupserver-sd.data.472.fileserver-backup.2007-10-22_18.54.33.DLT-V4.spool
>> -rw-r----- 1 root bacula 2193816 Oct 22 18:56
>>
>> A short while later, I will get a console message
>> 22-Oct 18:56 backupserver-sd: 3301 Issuing autochanger "loaded? drive 0"
>> command.
>> 22-Oct 18:56 backupserver-sd: 3302 Autochanger "loaded? drive 0", result
>> is Slot 3.
>> 22-Oct 18:56 backupserver-sd: Volume "CNI906" previously written, moving
>> to end of data.
>> 22-Oct 18:56 backupserver-sd: Ready to append to end of Volume "CNI906"
>> at
>> file=1.
>> 22-Oct 18:56 backupserver-sd: Spooling data ...
>> 22-Oct 18:56 fileserver-fd: fileserver-backup.2007-10-22_18.54.33 Fatal
>> error: backup.c:892 Network send error to SD. ERR=Success
>
> So the connection breaks shortly after data starts being transferred,
> right?
Correct, 2193816 bytes are always written to the spool file.
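
To cross-check that figure from the FD side, the len= values in the FD
debug log can be totalled and compared with the spool file size. A minimal
sketch, assuming the debug output was saved to /root/bacula-fd.log as shown
further down; the function name is only for illustration and is not part of
Bacula:

```python
import re

# Matches FD debug lines such as:
#   fileserver-fd: backup.c:895 Send data to SD len=80384
SEND_RE = re.compile(r"Send data to SD len=(\d+)")

def total_sent_bytes(lines):
    """Sum the len= values of all 'Send data to SD' debug lines."""
    total = 0
    for line in lines:
        match = SEND_RE.search(line)
        if match:
            total += int(match.group(1))
    return total

# Example use (log path as given in this mail):
# with open("/root/bacula-fd.log") as log:
#     print(total_sent_bytes(log))
```

The two numbers need not match exactly, but if they are close on every run
it suggests the job always dies at the same point in the file list.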

>
> It's a little bit surprising to see an error text of Success here... I
> always thought that sort of thing only happened on Windows ;-)
ROTFL.  The FD, Dir and SD are all on Linux machines; we have not ventured
to the Windows FD yet.

>
>
>> I know it says "Network send error", however, I have checked the network
>> and cannot find a problem with any of the equipment.
>
> Do you have a firewall running on that host?
No firewalls running on any of the bacula hosts, and the switch is not a
3com.

>
>> I have run the fd and sd with debug options to provide additional
>> output,
>> I hope this helps.
>>
>> If any other information would help in diagnosis, please just ask for
>> it.
>>
>>
>> /usr/local/sbin/bacula-fd -f -s -d 200 -u root -g bacula -c
>> /etc/bacula/bacula-fd.conf
>>
>> /home/spencer/bacula-sd -f -d 200 -s -u root -g bacula -c
>> /etc/bacula/bacula-sd.conf
>>
>> cat /root/bacula-fd.log
>> bacula-fd: filed_conf.c:438 Inserting director res: fileserver-mon
>> fileserver-fd: jcr.c:132 read_last_jobs seek to 188
>> fileserver-fd: jcr.c:139 Read num_items=10
>> fileserver-fd: pythonlib.c:113 No script dir. prog=FDStartUp
>> fileserver-fd: filed.c:225 filed: listening on port 9102
>> fileserver-fd: bnet_server.c:96 Addresses host[ipv4:0.0.0.0:9102]
>> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622
>> OnExit
>> JobStatus=C set=C
>> fileserver-fd: find.c:81 init_find_files ff=8094e60
>> fileserver-fd: job.c:233 <dird: Hello Director fileserver-dir calling
>> fileserver-fd: job.c:249 Executing Hello command.
>> fileserver-fd: job.c:353 Calling Authenticate
>> fileserver-fd: cram-md5.c:71 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> fileserver-fd: cram-md5.c:150 sending resp to challenge:
>> 6U+ZK4lCcB/uXh+k+X/qdB
>> fileserver-fd: job.c:357 OK Authenticate
>> fileserver-fd: job.c:233 <dird: JobId=0
>> Job=-Console-.2007-10-22_18.53.31
>> SDid=0 SDtime=0 Authorization=dummy
>> fileserver-fd: job.c:249 Executing JobId= command.
>> fileserver-fd: job.c:451 JobId=0 Auth=dummy
>> fileserver-fd: job.c:233 <dird: statusfileserver-fd: job.c:249 Executing
>> status command.
>> fileserver-fd: runscript.c:102 runscript: running all RUNSCRIPT object
>> (ClientAfterJob) JobStatus=C
>> fileserver-fd: pythonlib.c:237 No startup module.
>> fileserver-fd: job.c:337 Calling term_find_files
>> fileserver-fd: job.c:340 Done with term_find_files
>> fileserver-fd: mem_pool.c:377 garbage collect memory pool
>> fileserver-fd: job.c:342 Done with free_jcr
>> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622
>> OnExit
>> JobStatus=C set=C
>> fileserver-fd: find.c:81 init_find_files ff=8094e60
>> fileserver-fd: job.c:233 <dird: Hello Director fileserver-dir calling
>> fileserver-fd: job.c:249 Executing Hello command.
>> fileserver-fd: job.c:353 Calling Authenticate
>> fileserver-fd: cram-md5.c:71 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> fileserver-fd: cram-md5.c:150 sending resp to challenge:
>> /2gP/C+Fx+ZhT98YS4+hzD
>> fileserver-fd: job.c:357 OK Authenticate
>> fileserver-fd: job.c:233 <dird: JobId=472
>> Job=fileserver-backup.2007-10-22_18.54.33 SDid=1 SDtime=1193079020
>> Authorization=JMKD-GKGP-LMNP-ODNP-EDME-NJBC-LEMK-BPGJ
>> fileserver-fd: job.c:249 Executing JobId= command.
>> fileserver-fd: job.c:451 JobId=472
>> Auth=JMKD-GKGP-LMNP-ODNP-EDME-NJBC-LEMK-BPGJ
>> fileserver-fd: job.c:233 <dird: fileset vss=1
>> fileserver-fd: job.c:249 Executing fileset command.
>> fileserver-fd: job.c:688 I
>> fileserver-fd: job.c:688 O M0
>> fileserver-fd: job.c:688 N
>> fileserver-fd: job.c:688 F /boot
>> fileserver-fd: job.c:688 F /etc
>> fileserver-fd: job.c:688 F /usr/local
>> fileserver-fd: job.c:688 F /var/lib
>> fileserver-fd: job.c:688 F /var/data/svn
>> fileserver-fd: job.c:688 F /var/data/mysql-bak
>> fileserver-fd: job.c:688 F /var/spool/cyrus
>> fileserver-fd: job.c:688 F /usr/src
>> fileserver-fd: job.c:688 F /var/mail
>> fileserver-fd: job.c:688 F /home/zak/mail
>> fileserver-fd: job.c:688 F /home/james/mail
>> fileserver-fd: job.c:688 F /home/dpar2/mail
>> fileserver-fd: job.c:688 N
>> fileserver-fd: job.c:688 E
>> fileserver-fd: job.c:688 F /tmp
>> fileserver-fd: job.c:688 N
>> fileserver-fd: job.c:233 <dird: level = incremental  mtime_only=0
>> fileserver-fd: job.c:249 Executing level =  command.
>> fileserver-fd: job.c:1160 level_cmd: level = incremental  mtime_only=0
>> fileserver-fd: job.c:233 <dird: level = since_utime 1192958180
>> mtime_only=0
>> fileserver-fd: job.c:249 Executing level =  command.
>> fileserver-fd: job.c:1160 level_cmd: level = since_utime 1192958180
>> mtime_only=0
>> fileserver-fd: job.c:1194 since_time=1192958180
>> fileserver-fd: job.c:1215 Dirtime=1193079276315533
>> FDtime=1193079276315510
>> fileserver-fd: job.c:1217 rt=56 adj=18446744073709551565
>> fileserver-fd: job.c:1215 Dirtime=1193079276315628
>> FDtime=1193079276315603
>> fileserver-fd: job.c:1217 rt=59 adj=18446744073709551511
>> fileserver-fd: job.c:1215 Dirtime=1193079276315776
>> FDtime=1193079276315747
>> fileserver-fd: job.c:1217 rt=64 adj=18446744073709551450
>> fileserver-fd: job.c:1215 Dirtime=1193079276315872
>> FDtime=1193079276315847
>> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551396
>> fileserver-fd: job.c:1215 Dirtime=1193079276315964
>> FDtime=1193079276315940
>> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551343
>> fileserver-fd: job.c:1215 Dirtime=1193079276316057
>> FDtime=1193079276316032
>> fileserver-fd: job.c:1217 rt=59 adj=18446744073709551289
>> fileserver-fd: job.c:1215 Dirtime=1193079276316149
>> FDtime=1193079276316125
>> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551236
>> fileserver-fd: job.c:1215 Dirtime=1193079276316242
>> FDtime=1193079276316218
>> fileserver-fd: job.c:1217 rt=58 adj=18446744073709551183
>> fileserver-fd: job.c:1221 rt=58 adj=18446744073709551562
>> fileserver-fd: job.c:1236 adj = 0 since_time=1192958180
>> fileserver-fd: job.c:233 <dird: storage
>> address=backupserver.cluster.local
>> port=9103 ssl=0
>> fileserver-fd: job.c:249 Executing storage  command.
>> fileserver-fd: job.c:1291 StorageCmd: storage
>> address=backupserver.cluster.local port=9103 ssl=0
>> fileserver-fd: job.c:1297 Open storage: backupserver.cluster.local:9103
>> ssl=0
>> fileserver-fd: bsock.c:195 Current host[ipv4:192.168.1.135:9103] All
>> host[ipv4:192.168.1.135:9103]
>> fileserver-fd: bsock.c:149 who=Storage daemon
>> host=backupserver.cluster.local port=9103
>> fileserver-fd: job.c:1309 Connection OK to SD.
>> fileserver-fd: cram-md5.c:131 cram-get: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> fileserver-fd: cram-md5.c:150 sending resp to challenge:
>> XB+VsUQJ44JUUT+eURwxEB
>> fileserver-fd: cram-md5.c:78 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> fileserver-fd: cram-md5.c:97 Authenticate OK 94ldXX/Ic5FIJUAeTm/qKA
>> fileserver-fd: job.c:1318 Authenticated with SD.
>> fileserver-fd: job.c:233 <dird: RunBeforeJob
>> /etc/bacula/scripts/makemysqlbackup
>> fileserver-fd: job.c:249 Executing RunBeforeJob command.
>> fileserver-fd: job.c:462 runbefore_cmd: RunBeforeJob
>> /etc/bacula/scripts/makemysqlbackup
>> fileserver-fd: runscript.c:204 runscript: running a RUNSCRIPT object
>> fileserver-fd: util.c:580 edit_job_codes:
>> /etc/bacula/scripts/makemysqlbackup
>> fileserver-fd: runscript.c:211 runscript: running
>> '/etc/bacula/scripts/makemysqlbackup'...
>> fileserver-fd: runscript.c:236 runscript OK
>> fileserver-fd: job.c:233 <dird: backup
>> fileserver-fd: job.c:249 Executing backup command.
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=C set=B
>> fileserver-fd: jcr.c:622 OnExit JobStatus=B set=B
>> fileserver-fd: job.c:1350 begin backup ff=8094e60
>> fileserver-fd: job.c:1358 bfiled>dird: 2000 OK backup
>> fileserver-fd: job.c:1364 >stored: append open session
>> fileserver-fd: job.c:1369 <stored: 3000 OK open ticket = 1
>> fileserver-fd: job.c:1374 Got Ticket=1
>> fileserver-fd: job.c:1384 >stored: append data 1
>> fileserver-fd: job.c:1389 <stored: append data 1
>> fileserver-fd: job.c:1798 3000 OK data
>> fileserver-fd: pythonlib.c:237 No startup module.
>> fileserver-fd: job.c:1436 begin blast ff=8094e60
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=B set=R
>> fileserver-fd: jcr.c:622 OnExit JobStatus=R set=R
>> fileserver-fd: find.c:93 Enter set_find_options()
>> fileserver-fd: find.c:96 Leave set_find_options()
>> fileserver-fd: find.c:198 F /boot
>> fileserver-fd: backup.c:278 FT_DIREND: /boot/lost+found/
>> fileserver-fd: backup.c:332 bfiled: sending /boot/lost+found to stored
>> fileserver-fd: backup.c:1137 No strip for /boot/lost+found
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/map
>> fileserver-fd: backup.c:332 bfiled: sending /boot/map to stored
>> fileserver-fd: backup.c:1137 No strip for /boot/map
>> fileserver-fd: backup.c:895 Send data to SD len=80384
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.4.30
>> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.4.30 to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/config-2.4.30
>> fileserver-fd: backup.c:895 Send data to SD len=23184
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.4.32
>> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.4.32 to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/config-2.4.32
>> fileserver-fd: backup.c:895 Send data to SD len=23938
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.4.30
>> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.4.30 to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.4.30
>> fileserver-fd: backup.c:895 Send data to SD len=647691
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.4.32
>> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.4.32 to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.4.32
>> fileserver-fd: backup.c:895 Send data to SD len=647760
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/System.map-2.6.16.1-2
>> fileserver-fd: backup.c:332 bfiled: sending /boot/System.map-2.6.16.1-2
>> to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/System.map-2.6.16.1-2
>> fileserver-fd: backup.c:895 Send data to SD len=742070
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/config-2.6.16.1-2
>> fileserver-fd: backup.c:332 bfiled: sending /boot/config-2.6.16.1-2 to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/config-2.6.16.1-2
>> fileserver-fd: backup.c:895 Send data to SD len=31119
>> fileserver-fd: backup.c:245 FT_REG saving: /boot/vmlinuz-2.6.16.1-2
>> fileserver-fd: backup.c:332 bfiled: sending /boot/vmlinuz-2.6.16.1-2 to
>> stored
>> fileserver-fd: backup.c:1137 No strip for /boot/vmlinuz-2.6.16.1-2
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=R set=f
>> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=f
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=f set=E
>> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=E
>> fileserver-fd: backup.c:197 end blast_data ok=0
>
> Here it's failed, I think. A higher debug level might reveal more, but
> this doesn't tell me anything important.

I am probably going to get flamed for this, but what value should I use?
It is currently set to 200.  I do not want to set it so high that I swamp
the mailing list with data, but neither do I want to waste the list's time
by setting it too low....

>
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=f set=E
>> fileserver-fd: jcr.c:622 OnExit JobStatus=f set=E
>> fileserver-fd: bnet.c:666 who=client host=192.168.1.30 port=36387
>> fileserver-fd: jcr.c:602 OnEntry JobStatus=fileserver-fd: jcr.c:622
>> OnExit
>> JobStatus=C set=C
>> fileserver-fd: find.c:81 init_find_files ff=8097838
> ...
>
>>
>> cat /root/bacula-sd.log
> ...
>> backupserver-sd: jcr.c:602 OnEntry JobStatus=R set=R
>> backupserver-sd: jcr.c:622 OnExit JobStatus=R set=R
>> backupserver-sd: append.c:96 Begin append device="DLT-V4" (/dev/nst0)
>> backupserver-sd: spool.c:110 Turning on data spooling
>> backupserver-sd: spool.c:179 Created spool file:
>> /var/data/amanda/bacula/spool/backupserver-sd.data.472.fileserver-backup.2007-10-22_18.54.33.DLT-V4.spool
>> backupserver-sd: append.c:101 Just after acquire_device_for_append
>> backupserver-sd: label.c:698 session_label record=80cc220
>> backupserver-sd: label.c:754 Write sesson_label record JobId=472
>> FI=SOS_LABEL SessId=1 Strm=472 len=179 remainder=0
>> backupserver-sd: label.c:758 Leave write_session_label Block=0d File=1d
>> backupserver-sd: bnet.c:666 who=client host=192.168.1.30 port=36643
>
> The session has started, its label is written to tape.
>
>> backupserver-sd: dircmd.c:171 Conn: Hello Director fileserver-dir
>> calling
>> backupserver-sd: dircmd.c:181 Got a DIR connection
>> backupserver-sd: jcr.c:602 OnEntry JobStatus=backupserver-sd: jcr.c:622
>> OnExit JobStatus=C set=C
>> backupserver-sd: cram-md5.c:71 send: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backupserver-sd: cram-md5.c:131 cram-get: auth cram-md5
>> <[EMAIL PROTECTED]> ssl=0
>> backupserver-sd: cram-md5.c:150 sending resp to challenge:
>> p8/qo/5jVDAXL//8+/4bbD
>> backupserver-sd: dircmd.c:202 Message channel init completed.
>> backupserver-sd: dircmd.c:209 <dird: status
>> backupserver-sd: dircmd.c:223 Do command: status
>> backupserver-sd: pythonlib.c:237 No startup module.
>> backupserver-sd: mem_pool.c:377 garbage collect memory pool
>
> No more output related to the actual job, I think. Again, a higher
> debug level might reveal something.
>
>
>> backupserver ~ #
>
> With the information from above, I suspect a network problem. Does the
> client run before job you have run for a very long time? In such a
> situation, a firewall/router might close the connection between SD and
> FD because it seems to be idle.
The run before job might take half an hour max.  There is no firewall or
router in the setup.
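
To rule the network in or out independently of Bacula, one option would be
to push a few MB over a plain TCP connection from the FD host to the SD host
and see whether it stalls near the same 2 MB mark. A rough sketch, not
anything Bacula ships; port 9999 and the names sink/blast are placeholders
made up for this test:

```python
import socket

def sink(port, host="0.0.0.0"):
    """Run on the SD host: accept one connection and count received bytes."""
    total = 0
    with socket.create_server((host, port)) as srv:
        conn, _ = srv.accept()
        with conn:
            while True:
                chunk = conn.recv(65536)
                if not chunk:          # peer closed the connection
                    break
                total += len(chunk)
    return total

def blast(host, port, nbytes=8 * 1024 * 1024, timeout=30):
    """Run on the FD host: send nbytes of zeros and return the bytes sent.

    A stalled path shows up as a socket timeout rather than a silent hang.
    """
    payload = b"\0" * 65536
    sent = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while sent < nbytes:
            n = min(len(payload), nbytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return sent
```

On the SD host this would be print(sink(9999)), and on the FD host
print(blast("backupserver.cluster.local", 9999)). If both sides report the
full 8 MB, the path between the two switches can at least carry more than
the amount at which the job hangs.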

>
> Arno


Many thanks

Spencer


