Holger Parplies wrote:

>> But lots of other people including myself run rsync without errors so it 
>> has to be something unique to your situation.
> 
> well, no. You don't rule out bugs by "it works for me", not even by "it
> works for everyone I know". I'm sure you know that.

Anything is possible I suppose, but if I know something works for 
everyone else I move it to the bottom of the list of things to test.

> We don't know much about the "lots of other people", do we? We know there
> have been no further *reports* of it on this list, but I don't remember
> hundreds of people reporting success with rsync on RHEL4 either. You might
> know about other lists, I don't.

I know enough about mailing lists to expect a ton of matches on a Google 
search for the 'no route to host' problem, but I don't see much relating 
to a local LAN or RHEL there.  And I'm sure I'd have seen mentions on 
the CentOS or Fedora lists if it affected those very similar kernels.

>> Maybe cables from a different vendor would help.
> 
> I doubt it, because other applications are doing well. It doesn't seem to be
> hardware related to me. I suspect the kernel on the host side (backup client)
> or its configuration. Of course, it may be hardware specific in that
> different hardware does not trigger whatever is happening (and that could
> include the switch, maybe, perhaps), but the cables? It's not the hardware
> where I would start looking, especially after Tim *has* tested quite a lot
> of different setups.

TCP retries can cover a lot of errors.  A bad crimp on a patch cable or 
an extra half-inch untwisted on the wall punch-downs can cause exactly 
this sort of thing.
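
One rough way to see whether TCP is quietly papering over link errors is to compare retransmitted segments with segments sent. The sketch below assumes a Linux host whose /proc/net/snmp has the usual pair of "Tcp:" lines (a header line followed by a value line) with OutSegs and RetransSegs columns; the function name is made up for illustration.

```python
def tcp_retrans_ratio(snmp_text):
    """Return RetransSegs / OutSegs from the text of /proc/net/snmp.

    Assumes the first "Tcp:" line holds column names and the second
    holds the matching values.
    """
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    header, values = tcp_lines[0], tcp_lines[1]
    # Pair each column name with its value, skipping the "Tcp:" label.
    stats = dict(zip(header[1:], values[1:]))
    sent = int(stats["OutSegs"])
    retransmitted = int(stats["RetransSegs"])
    return retransmitted / sent if sent else 0.0
```

Feeding it the contents of /proc/net/snmp before and after an rsync run, on both ends, and comparing the numbers with a run over a known-good cable would show whether retries are climbing. The counters are cumulative since boot, so only the difference between two readings means much.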

> It could be stupid things like ARP poisoning, a misbehaving machine on the
> local network or whatever. The question remains which communication
> characteristics rsync has and SMB doesn't (hmm, SMB is UDP, isn't it?) that
> make the problem appear.

ARP poisoning is possible - maybe just someone plugging/unplugging a 
machine with the same IP somewhere else.  A badly configured NAT gateway 
on the network doing proxy-ARP in the wrong direction could do it.  If 
there are multiple interconnected switches involved it could even be a 
loop with a spanning-tree problem.
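
A duplicate IP shows up as two different MAC addresses answering for the same address. As a sketch only: the function below scans lines of tcpdump output saved as text (for example from something like "tcpdump -n -e arp") and reports any IP claimed by more than one MAC. It assumes ARP replies are printed with the "<ip> is-at <mac>" wording; the exact line format differs between tcpdump versions, and the names used here are hypothetical.

```python
import re
from collections import defaultdict

# Matches "<ipv4> is-at <mac>"; MAC octets may be printed without a leading 0.
IS_AT = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3}) is-at "
    r"((?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2})"
)


def normalize_mac(mac):
    """Lower-case a MAC and pad every octet to two hex digits."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def conflicting_ips(tcpdump_lines):
    """Return {ip: [mac, ...]} for every IP answered by more than one MAC."""
    seen = defaultdict(set)
    for line in tcpdump_lines:
        match = IS_AT.search(line)
        if match:
            seen[match.group(1)].add(normalize_mac(match.group(2)))
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

An empty result only means no conflict was seen during the capture. A machine that comes and goes would need a capture that covers one of the failures.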


> Tim sent me his /etc/sysctl.conf off-list, and I find it harmless (that
> referred to "kernel configuration" before I added the previous paragraph). As I
> understand him, he's about to try out different kernels (2.4.x ?), now that
> he has a test setup available. Swapping kernels is *not* something I'd happily
> do without further thought on a production server either, and I'm sure you
> agree.

If the machine can be taken down for testing, I'd boot a Knoppix or 
Ubuntu CD instead of installing something different.

> May I summarize a few points I believe we all agree on?
> 
> 1.) It's a client side problem, i.e. the backed up client seems to be the
>     cause, not the BackupPC server machine.

I'm guessing a network problem with the only likely software connection 
being the NIC driver.

> 2.) It is thus not a BackupPC problem. On the client only stock RHEL4
>     software is in use (on the test setup anyway).
> 3.) It is still on-topic in that it happens using BackupPC and only then.
>     Other users of BackupPC may run into similar problems and be glad to
>     find a solution in the archives once we find one.
> 4.) It's an obscure and unnerving problem. There are many things to try out,
>     nothing obvious springing to mind, and each of us has different thoughts
>     on what to try in which order :).
> 
> My bet stays the kernel. Craig has a point with the isolated network. Either
> one might fix it, without leading to a definitive diagnosis. Running on an
> isolated network as a workaround is not an option :-), but it's the easier
> thing to try out, and *reproducing* the problem on an isolated network would
> rule out quite a lot of causes.

I think the kernel is the least likely thing to be involved.  My test 
procedure would be to build a 'known working' pair of machines, perhaps 
as simple as booting two boxes with Knoppix or Ubuntu, connected by a 
crossover cable and addressed by IP.  Once you get a set that can 
rsync without errors (which can't be that hard - it works for everyone 
else), start introducing the cable/switch/destination where you've seen 
the problem, one piece at a time.  Intel 100M NICs would probably be a 
safe bet for ruling out obscure driver issues.  If the switches are 
managed, I'd check what they report for interface errors and confirm 
that you don't have a duplex mismatch on the connections.  A tcpdump of 
broadcasts to see the ARP traffic might show something interesting.
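
The hosts keep their own interface counters too, which helps when the switches are unmanaged. A minimal sketch, assuming the standard Linux /proc/net/dev layout (eight receive counters followed by eight transmit counters per interface); the function names are only for illustration.

```python
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed")


def interface_errors(proc_net_dev_text):
    """Return the error-related counters per interface from /proc/net/dev text."""
    result = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        values = counters.split()
        if len(values) < 16:
            continue  # not the layout this sketch assumes
        numbers = [int(v) for v in values[:16]]
        rx = dict(zip(RX_FIELDS, numbers[:8]))
        tx = dict(zip(TX_FIELDS, numbers[8:]))
        result[name.strip()] = {
            "rx_errs": rx["errs"],
            "rx_frame": rx["frame"],
            "tx_errs": tx["errs"],
            "tx_colls": tx["colls"],
            "tx_carrier": tx["carrier"],
        }
    return result


def suspects(proc_net_dev_text):
    """Return only the interfaces with at least one non-zero error counter."""
    return {name: counts
            for name, counts in interface_errors(proc_net_dev_text).items()
            if any(counts.values())}
```

Collisions on a link that is supposed to be full duplex, or frame errors on one side, are the usual hint of a duplex mismatch, though that reading is a rule of thumb and not something these counters prove. As with any cumulative counter, compare a reading before and after a test transfer.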

-- 
   Les Mikesell
    [EMAIL PROTECTED]

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/
