From: Jeff Campbell [mailto:[email protected]]
Sent: Tuesday, September 07, 2010 5:06 PM
To: Wyborny, Carolyn
Cc: Ronciak, John; [email protected]
Subject: Re: [E1000-devel] 82574L - Multicast transmit failing, causes
performance issues
On Tue, Sep 7, 2010 at 4:14 PM, Jeff Campbell
<[email protected]> wrote:
John, Carolyn,
After an entire day of adventures on the network, including some unusual
discoveries in VLAN configuration, we appear to have closed in on things.
We are now replicating our success and will report back shortly. In the meantime,
no action is required until we can confirm our results. The variables
turned out to be multifaceted.
I will keep you updated.
Great news. The issue has been resolved and multicast is functioning just fine
on the 82574L with the latest 1.2.10-NAPI driver against the Ubuntu 10.04
2.6.32-24-generic kernel.
There may be an issue around flow control and XOFF "back pressure", although it
is unclear whether the problem lies with the switch, the NIC, or the
interaction of the two. See below for more detail.
After a long day of packet sniffing and double-checking the parameters of each
of the test programs, we discovered that one of the two output multicast groups
originating at the Supermicro/82574L board was inadvertently being sent out
the admin interface instead of the video interface. This caused a number of
issues and appears to be a proximate cause of the performance issues during
ssh console access (a little surprising, given the relatively low sustained
bit rate of 5 Mbps).
Due to a small misconfiguration in the switch, one of the desktop monitoring
station ports was also joined to a VLAN trunk port that contained multiple
VLANs (including the video and admin LANs). What we discovered was:
a) The second multicast stream was actually on the admin LAN (and, because the
switch is not IGMP-aware, it was being flooded to all ports).
b) This caused some unidentified devices on the admin LAN (which has multiple
10/100-only devices) to issue flow control XOFFs.
c) Either due to a feedback loop, a bug in the switch, or possibly a bug in
the 82574L driver, the amount of flow control traffic on the network escalated
until the network became unusable.
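For anyone tracing this in a capture: an 802.3x XOFF is a MAC Control PAUSE frame, normally sent to the reserved multicast address 01:80:C2:00:00:01 with EtherType 0x8808, opcode 0x0001 and a 16-bit pause time counted in quanta of 512 bit times (a pause time of 0 acts as XON). The sketch below builds and recognises such frames; the function names and example MAC are hypothetical, and frames are shown without the trailing FCS that the NIC normally appends.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60  # 64-byte minimum Ethernet frame minus the 4-byte FCS


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build a PAUSE frame (XOFF if quanta > 0, XON if quanta == 0), FCS excluded."""
    if len(src_mac) != 6 or not 0 <= quanta <= 0xFFFF:
        raise ValueError("need a 6-byte MAC and a 16-bit pause time")
    header = PAUSE_DST + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)
    # Zero-pad up to the minimum frame size.
    return header.ljust(MIN_FRAME_NO_FCS, b"\x00")


def pause_quanta(frame: bytes):
    """Return the pause time carried by a PAUSE frame, or None for other frames."""
    if len(frame) < 18:
        return None
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return quanta
```

Counting how many frames in a capture yield a non-None `pause_quanta`, and from which source MACs, is one way to find which devices are issuing the XOFFs.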
By removing the admin network ports from the VLAN trunk (the error that caused
the admin network to see broadcast traffic from all three VLANs) and explicitly
directing the second multicast stream from the 82574L-based machine to the
video output network, we were able to achieve stability.
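How the test programs select their interface is not described here, but on Linux a sender can pin a multicast stream to a specific NIC by setting `IP_MULTICAST_IF` to that NIC's local IPv4 address; otherwise the kernel chooses the egress interface from the routing table, which is one way a stream can quietly end up on the wrong LAN. A minimal sketch; the interface address, group and port below are hypothetical placeholders for the video-network values.

```python
import socket

VIDEO_IFACE_ADDR = "192.0.2.10"  # hypothetical: IPv4 address of the video-network NIC
GROUP, PORT = "239.1.1.2", 5000  # hypothetical: second multicast output group


def open_multicast_sender(iface_addr: str, ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast traffic leaves via iface_addr's NIC."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Send multicast out of the interface that owns iface_addr instead of
    # whichever interface the routing table would otherwise pick.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(iface_addr))
    # A TTL of 1 keeps the stream on the local segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

Usage would be along the lines of `sender = open_multicast_sender(VIDEO_IFACE_ADDR)` followed by `sender.sendto(payload, (GROUP, PORT))`. On Linux, the address must belong to a local interface or `setsockopt` fails with `EADDRNOTAVAIL`.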
During the course of all this testing, port speeds were dialed down to 100 Mbps
and devices were isolated. Everything has now been unwound back to the
condition of the network when the original test was done and the "problem" was
reported. Result: no problem.
Flow control has been restored on all ports, and all ports are now in
auto-negotiation mode once again.
After all of the above was completed, we repeated the same test on the
Atom-based Supermicro board. This yielded a small number of continuity errors
that should not be there; however, that is likely because the machine is
still on the older (1.0.2) driver. We will update it to the same driver version
as the Xeon-based system and re-test. We will only report back if the test
still exhibits problems; otherwise, it is safe to assume the newer driver
achieved the same positive results.
What remains outstanding, although not something we can currently investigate
further, is the exact source of the apparent broadcast storm of XOFF
messages and whether or not it is related to the 82574L. If we encounter
it again we will report back; otherwise, it will remain a possible data point
for future investigations in the event that flow control appears to be an
issue. The switch in question is also slated to be upgraded to an IGMP-aware
unit.
Thank you to John and Carolyn for the offers of assistance. I'm sure we'll
be back with more questions as we evolve our solution around the 82574L.
-Jeff
Glad to hear things are now working for you. Regarding flow control, you
need to be very careful about using it when you are mixing 10/100 and 1
gig links. The 10/100 flow control delays (XOFFs) are _very_ large compared
to the 1 gig delays. I don't think switches have any idea how to deal
with this kind of configuration. That is probably what you are seeing. Try
not to mix link speeds, or else control the use of flow control very carefully.
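To put numbers on "very large": in 802.3x flow control the pause time is carried as a count of quanta, and one quantum is 512 bit times, so the same XOFF value stalls a slower link for proportionally longer. The sketch below is plain arithmetic on that definition, not driver or switch code; the names are illustrative.

```python
BITS_PER_QUANTUM = 512
MAX_QUANTA = 0xFFFF  # largest pause time a PAUSE frame can carry


def pause_seconds(quanta: int, link_bps: float) -> float:
    """How long a link partner stays paused for a given pause time at a given speed."""
    return quanta * BITS_PER_QUANTUM / link_bps


for name, bps in (("10 Mb/s", 10e6), ("100 Mb/s", 100e6), ("1 Gb/s", 1e9)):
    print(f"{name:>8}: one quantum = {pause_seconds(1, bps) * 1e6:7.3f} us, "
          f"max pause = {pause_seconds(MAX_QUANTA, bps) * 1e3:8.2f} ms")
```

A maximal XOFF therefore works out to roughly 3.36 s at 10 Mb/s, 0.34 s at 100 Mb/s and 33.6 ms at 1 Gb/s: a factor of 10 per speed step for the same pause time.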
Cheers,
John
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel