David,
Thank you for the assessment and update information.
*>> If you have ever used systemtap, or would like to try, I can provide you
with some detailed instructions so that we can monitor the behavior of our
drivers and the bonding driver, and see exactly what does/what doesn't
happen. If you have not used it before, I will work up instrumented e1000 &
e1000e driver modules which should help us to resolve the mystery.*
I've not used that tool before, but I'd be happy to proceed as you suggest
regarding the driver analysis.
*>> I am now using a RHEL5.4 system, with base drivers for e1000 (eth1),
e1000e (eth2) and bonding (bond0)...*
Are those drivers the ones Bruce mentioned in his previous post that he
would soon be making available (to fix the earlier build error)? If so, it
would probably be best that I build and test those first, before we
continue with systemtap.
Regards,
Ed
>Comment By: david graham (davegraham)
Date: 2009-10-06 21:16
Message:
Ed,
I am now using a RHEL5.4 system, with base drivers for e1000 (eth1),
e1000e (eth2) and bonding (bond0), and I think the same Intel networking
silicon as you are, with eth1 and eth2 as bond0 slaves. I can repeatedly
and reliably disconnect and then reconnect network cables for eth1 and
eth2, maintaining a constant 'ping' connection to a remote server as long
as one of eth1 or eth2 is physically connected. Unfortunately I still fail
to see the problem that you are seeing. I attach a set of files, all
prefixed drg_..., which identify my configuration.
I must be missing some critical difference in our system configurations.
It may be that we can discover it by careful correlation of the information
in the files we have attached (I will look again).
It's interesting that your logs show that there is nothing wrong with the
bonding selection, which, just as in my experiments, seems to properly
follow the link that we would expect to be used for a connection. However,
as you say, in some cases your selected link fails to pass traffic. I wonder
if there may be some MAC address issue.
The bonding interface, configured for active-backup (mode=1), presents one
MAC address (as seen in ifconfig bond0), and this same MAC address is then
programmed into both slave devices, in my case eth1 & eth2. When one link
fails, the other picks up, now sending & receiving on the same MAC address
but from a different physical interface. The connected equipment, a switch,
learns a MAC-address-to-physical-port binding for each source MAC address
it sees, so that directed traffic for a given destination is sent out only
on the switch port where that destination has been seen. In my case both
eth1 & eth2 are connected to the network through the same switch, but I'm
wondering if other configurations might cause a complication for the
MAC/port binding algorithm. It's just an idea; I don't know enough about how
switches learn to know if it's relevant.
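
To make the MAC/port binding idea concrete, here is a toy Python model of a
learning switch. It is only a sketch under stated assumptions: the class
name, the port numbers and both MAC addresses are made up for illustration,
and the model never ages or flushes entries when a link drops, which real
switches generally do. It is therefore closest to a switch that does not see
the link go down itself, for example one further upstream.

```python
class LearningSwitch:
    """Toy model of a switch's MAC-address-to-port learning table."""

    def __init__(self):
        # MAC address -> port on which it was last seen as a source
        self.mac_to_port = {}

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source, then return the output port (None means flood)."""
        self.mac_to_port[src_mac] = in_port
        return self.mac_to_port.get(dst_mac)


# Hypothetical addresses and ports, chosen only for this illustration.
BOND_MAC = "02:00:00:00:00:01"    # MAC shared by both bond0 slaves
SERVER_MAC = "02:00:00:00:00:02"  # remote server being pinged

switch = LearningSwitch()

# eth1 (switch port 1) is the active slave and sends a frame.
switch.receive(1, BOND_MAC, SERVER_MAC)
before_failover = switch.receive(3, SERVER_MAC, BOND_MAC)

# Failover: eth2 (switch port 2) becomes active but has not sent anything
# yet, so replies are still directed to the old port.
stale = switch.receive(3, SERVER_MAC, BOND_MAC)

# As soon as eth2 transmits one frame, the binding moves to port 2.
switch.receive(2, BOND_MAC, SERVER_MAC)
after_relearn = switch.receive(3, SERVER_MAC, BOND_MAC)
```

In this model the reply path stays pointed at the old port until the new
active slave transmits something, which is the kind of complication I have
in mind. Whether anything like it applies to your topology is still a guess.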
We are pretty much at the stage where we need to see what's going on
inside the driver. Bonding selects the proper interface; we need to know
whether we ever receive a packet for TX, whether we ever send it, whether
we ever get a response, whether we filter it out, and whether we send it up
the stack to the bonding driver.
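
As a coarse first check before any driver instrumentation, the
per-interface counters that the kernel exposes under
/sys/class/net/<iface>/statistics/ can be sampled before and after a
failover. The following is a minimal Python sketch, assuming Python 3 is
available on the test machine. The helper names (read_counters, delta) and
the choice of counters are mine, for illustration only. It shows whether
packets are being counted on each slave at all, not where they might be
dropped inside the driver.

```python
import os

# Counters to sample; these files exist under <iface>/statistics in sysfs.
COUNTERS = ("rx_packets", "tx_packets", "rx_dropped", "tx_dropped")


def read_counters(iface, root="/sys/class/net"):
    """Return the selected statistics counters for one interface as ints."""
    stats_dir = os.path.join(root, iface, "statistics")
    values = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as f:
            values[name] = int(f.read().strip())
    return values


def delta(before, after):
    """Return how much each counter changed between two samples."""
    return {name: after[name] - before[name] for name in before}
```

For example, take read_counters("eth2") just before pulling the eth1 cable
and again after a few pings, then pass both samples to delta(). A tx_packets
delta of zero on the newly active slave would suggest the packets never
reach the driver for TX.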
If you have ever used systemtap, or would like to try, I can provide you
with some detailed instructions so that we can monitor the behavior of our
drivers and the bonding driver, and see exactly what does/what doesn't
happen. If you have not used it before, I will work up instrumented e1000 &
e1000e driver modules which should help us to resolve the mystery.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=447449&aid=2873479&group_id=42302