Hi,

I have an installation that was previously running version 2.4.4 and was
upgraded to 5.0.3 successfully. However, a pre-existing problem with the SLES9
x64 clients, where jobs intermittently fail with bsock errors, has carried
forward from 2.4.4 to 5.0.3; upgrading to the latest version has not fixed it.
The errors have occurred after as little as a few MB and as much as a couple of
hundred MB, on both full and incremental jobs, and with both tape and disk
pools. Most often the backup fails after only 40-50 MB: the job starts and
fails quite quickly, but the failure is not reported (and the job cancelled)
until the network timeout expires.

The clients are all version 5.0.3, as are the SD and DIR services. We have
approximately 80 clients: about 90% are SLES10 (x64, plus a couple of 32-bit
installs) and work without error, about 8% are Windows 2003 and also work
without error, but the two SLES9 x64 builds have BOTH exhibited this
intermittent network timeout issue. If I run a manual backup (instead of a
scheduled one) outside the maintenance window, the same client backs up
correctly without error (to both tape and disk pool).

Only the two SLES9 x64 clients have failed this way; no other client has
failed at all. I have checked the default TCP timeout values, which are all
7200 seconds, but my feeling is that this is NOT the cause. It may be a
threading or concurrency issue specific to Bacula (either 2.4.4 or 5.0.3) on
SLES9 x64, or it could even be an MTU issue, although that would not explain
why the backup works 50% of the time. Most of the machines are virtual and
share the same physical hosts, so the physical network shouldn't be an issue
per se.
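For anyone wanting to repeat the timeout check on a client, here is a minimal sketch. It assumes a Linux client with /proc mounted, and it assumes the "7200 seconds" above refers to net.ipv4.tcp_keepalive_time, which is the usual Linux default. The function name read_tcp_keepalive is only illustrative and is not part of Bacula or any other tool.

```python
import os

# Linux TCP keepalive sysctls: time and intvl are in seconds, probes is a count.
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_tcp_keepalive(base="/proc/sys/net/ipv4"):
    """Return {sysctl_name: int} for each keepalive setting found under base.

    Hypothetical helper: settings that are missing or unreadable (for example
    on a non-Linux host) are skipped rather than raising an error.
    """
    settings = {}
    for key in KEEPALIVE_KEYS:
        path = os.path.join(base, key)
        try:
            with open(path) as f:
                settings[key] = int(f.read().strip())
        except (OSError, ValueError):
            continue
    return settings


if __name__ == "__main__":
    # Print in sysctl-style "name = value" form for easy comparison across clients.
    for name, value in read_tcp_keepalive().items():
        print(f"net.ipv4.{name} = {value}")
```

Running it on both a SLES9 x64 client and a working SLES10 client would show whether the keepalive settings actually differ between the two groups.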

I would appreciate any opinions on, or experience with, this type of error, as
it is proving difficult to reproduce manually.

Errors:

Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:mybaculaserver.mylocaldomain:9103: ERR=Broken pipe

Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users