Hi,

24.10.2007 12:33, GDS.Marshall wrote:
> Hello,
> 
>> Hi,
>>
>> 22.10.2007 21:26, GDS.Marshall wrote:
>>> version 2.2.4 patched from sourceforge
>>> Linux kernel 2.6.x
>>>
>>> I am running 10+ FD's, one SD, and one Director.  I am having problems
>>> with one of my FD's, the others are fine.  Not sure if it makes any
>>> difference, but the FD is on the same machine as the Director.
>>> I have no issues with the network, I see no errors on either the
>>> interface of the FD or the SD.  All FD's are plugged into the same
>>> netgear switch.
>>> The SD is plugged into a different netgear switch which is then plugged
>>> into the FD's switch.
>> Are the FD and SD running on the same host (your description says that
>> DIR and problem FD are on the same machine, but not if the DIR and SD
>> are on that same machine, too)?
> No, the SD is on its own machine
> 
> FD+DIR   FD   FD
>   |      |     |
>  GSW---------------.... Gig Switch
>   |
>  FSW---------------.... Fast Switch
>   |
>   SD

And the problem connection is between the hosts to the left... ok.

...
>>> 22-Oct 18:56 backupserver-sd: Spooling data ...
>>> 22-Oct 18:56 fileserver-fd: fileserver-backup.2007-10-22_18.54.33 Fatal
>>> error: backup.c:892 Network send error to SD. ERR=Success
>> So the connection breaks shortly after data starts being transferred,
>> right?
> Correct, 2193816 is always written.

Funny. Disk full on the SD, perhaps? It might be worth looking at the
system logs on both machines.
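
If you want a quick check for the disk-full theory, something like the following rough Python sketch would do. The path "/" is only a placeholder of mine; you would point it at whatever directory your SD spools to, and the 90% threshold is an arbitrary assumption.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def check_path(path: str, warn_at: float = 90.0) -> str:
    """Describe how full the filesystem holding `path` is."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    state = "NEARLY FULL" if pct >= warn_at else "ok"
    free_mib = usage.free // 2**20
    return f"{path}: {pct:.1f}% used, {free_mib} MiB free ({state})"


if __name__ == "__main__":
    # Placeholder path: replace with the SD's spool or working directory.
    print(check_path("/"))
```

Running it on the SD host shortly before and after a failing job would show whether free space is anywhere near exhausted.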

>> It's a little bit surprising to see an error text of Success here... I
>> always thought that sort of thing only happened on Windows ;-)
> ROTFL.  The FD, Dir, SD are on linux machines, we have not ventured to the
> Windows FD yet.
> 
>>
>>> I know it says "Network send error", however, I have checked the
>>> network, and can not find a problem with any of the equipment.
>> Do you have a firewall running on that host?
> No firewalls running on any of the bacula hosts, and the switch is not a
> 3com.

Good enough... Regarding network problems, you could try enabling the
heartbeat function in the FD and/or SD. To find the cause of the
problem, tcpdump or wireshark might help.

If you see RST packets on the connection between FD and SD, the only
question is which side generates them...
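
To see who sends the resets, you could save tcpdump's text output to a file and count RST segments per sender. The sketch below assumes the line format that recent tcpdump versions print ("IP src > dst: Flags [...]"); older versions format flags differently, so the regular expression may need adjusting. The sample lines and addresses are made up for illustration, not taken from your network.

```python
import re
from collections import Counter

# Assumed text format of recent tcpdump versions, e.g.
#   18:56:02.000002 IP 192.0.2.20.9103 > 192.0.2.10.40000: Flags [R], seq 1, ...
LINE_RE = re.compile(r"IP6? (\S+) > (\S+): Flags \[([^\]]*)\]")


def count_rst_senders(lines):
    """Count TCP segments with the RST flag set, keyed by sending host.port."""
    senders = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "R" in match.group(3):
            senders[match.group(1)] += 1
    return senders


if __name__ == "__main__":
    # Hypothetical capture lines; 9103 is used here as the SD port.
    sample = [
        "18:56:01.000001 IP 192.0.2.10.40000 > 192.0.2.20.9103: "
        "Flags [P.], seq 1:100, ack 1, win 512, length 99",
        "18:56:02.000002 IP 192.0.2.20.9103 > 192.0.2.10.40000: "
        "Flags [R], seq 1, win 0, length 0",
    ]
    for sender, count in count_rst_senders(sample).items():
        print(f"{sender} sent {count} RST segment(s)")
```

If the sender turns out to be neither the FD nor the SD address, some device in between is tearing the connection down.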

...
>> Here it's failed, I think. A higher debug level might reveal more, but
>> this doesn't tell me anything important.
> 
> I am probably going to get flamed for this,

Not by me :-)

> but what value?  Currently it
> is set to 200.  I do not want to set it too high and swamp the mailing
> list with data, but neither do I want to waste the mailing list's time
> by making it too low....

Really a difficult question :-)

The best approach might be to run with debug level 400, save the 
resulting logs, and only post the part around the failure first. If 
someone needs more detail, you could post the complete log to a web site.
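
Cutting out "the part around the failure" can be done by hand, but a small script makes it repeatable. This is only a sketch: the search string "Fatal error" is taken from the job output you posted, while the 50-line window and the script interface are arbitrary choices of mine.

```python
import sys


def excerpt_around(lines, needle="Fatal error", context=50):
    """Return the lines within `context` lines of the first line containing `needle`."""
    for index, line in enumerate(lines):
        if needle in line:
            start = max(0, index - context)
            return lines[start:index + context + 1]
    return []  # needle not found


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python excerpt.py <saved-debug-log>
    with open(sys.argv[1], errors="replace") as log_file:
        sys.stdout.writelines(excerpt_around(log_file.readlines()))
```

The excerpt goes to standard output, so it can be redirected into a file and pasted into a mail while the full log stays available for anyone who asks.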

...
>>> backupserver ~ #
>> With the information from above, I suspect a network problem. Does your
>> client run before job script run for a very long time? In such a
>> situation, a firewall/router might close the connection between SD and
>> FD because it seems to be idle.
> The run before job might take half an hour max.  There is no firewall or
> router in the setup.

Hmm... half an hour should not trigger an RST due to idling too long.
Do your other FDs on the network segment with the DIR have 
long-running scripts, too, or do they transfer data almost immediately 
after the backup jobs are started?

Arno

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
