Hi,
I have an installation that was previously running version 2.4.4 and was
upgraded to 5.0.3 successfully. However, the SLES9 x64 clients had a
pre-existing problem: their jobs fail intermittently with bsock errors. The
error has carried forward from 2.4.4 to 5.0.3, so upgrading to the latest
version has not fixed the fault. The errors occur after as little as a few MB
through to a couple of hundred MB. The jobs can be full or incremental, using
either the tape or the disk pool. More often than not the backup fails with
only 40-50 MB backed up, so the job starts and fails quite quickly, but the
failure is only reported and the job cancelled once the network timeout
expires.
All clients are running version 5.0.3, as are the SD and DIR services. We have
approximately 80 clients: 90% are SLES10 (x64 and a couple of 32-bit
versions) and work without error, 8% are Windows 2003 and work without
error, but the two SLES9 x64 builds have BOTH exhibited this intermittent
network timeout issue. If I run a manual backup (instead of a scheduled one)
outside the maintenance window, the same client backs up correctly without
error (tape and disk pool).
It has ONLY been the two SLES9 x64 platforms that have errored this way; all
other clients do NOT error at all. I have checked the default TCP timeout
values, which are all 7200 seconds, but my feeling is that this is NOT the
fault. This may be a threading or concurrency issue specific to Bacula (either
2.4.4 or 5.0.3) on SLES9 x64, or it could even be an MTU issue, but that would
not explain why it works 50% of the time. The majority of the machines are
virtual and share the physical hosts, so networking shouldn't be an issue per
se.
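For anyone who wants to compare the kernel keepalive settings between a
working SLES10 client and a failing SLES9 one, a minimal sketch along these
lines should do it. It assumes the standard Linux /proc/sys/net/ipv4 layout
and that the 7200-second figure corresponds to tcp_keepalive_time; the
function names are illustrative only and have nothing to do with Bacula itself.

```python
from pathlib import Path

# Standard Linux keepalive tunables (assumed to be the "TCP timeout values").
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(proc_root="/proc/sys/net/ipv4"):
    """Return the keepalive tunables as ints; a value is None if unreadable."""
    settings = {}
    for key in KEEPALIVE_KEYS:
        try:
            settings[key] = int((Path(proc_root) / key).read_text().strip())
        except (OSError, ValueError):
            settings[key] = None
    return settings


def seconds_until_dead_peer_detected(settings):
    """Idle seconds before the kernel drops a silent peer, or None if unknown.

    Only meaningful for sockets that have SO_KEEPALIVE enabled.
    """
    if any(settings.get(key) is None for key in KEEPALIVE_KEYS):
        return None
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


if __name__ == "__main__":
    current = read_keepalive()
    print(current, seconds_until_dead_peer_detected(current))
```

Running it on one client of each kind and comparing the output would at least
rule the kernel settings in or out as the difference between the platforms.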
I would appreciate any opinions or experience with this type of error, as it
is proving difficult to reproduce manually.
Errors:
Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:mybaculaserver.mylocaldomain:9103: ERR=Broken pipe
Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users