From: Jeff Campbell [mailto:[email protected]]
Sent: Tuesday, September 07, 2010 5:06 PM
To: Wyborny, Carolyn
Cc: Ronciak, John; [email protected]
Subject: Re: [E1000-devel] 82574L - Multicast transmit failing, causes 
performance issues


On Tue, Sep 7, 2010 at 4:14 PM, Jeff Campbell 
<[email protected]<mailto:[email protected]>> wrote:
John, Carolyn,

After an entire day of adventures on the network, including some unusual 
discoveries in VLAN configuration, we appear to have closed in on things.

We are now replicating our success and will report back shortly.  In the 
meantime, no action is required until we can confirm our results.  The variables 
turned out to be multifaceted.

I will keep you updated.

Great news.  The issue has been resolved and multicast is functioning just fine 
on the 82574L with the latest 1.2.10-NAPI driver against the Ubuntu 10.04 
2.6.32-24-generic kernel.

There may be an issue around flow control and XOFF "back pressure" although it 
is unclear whether the problem may be related to the switch or the NIC or the 
interaction of the two.  See below for more detail.

After a long day of packet sniffing and double checking the parameters of each 
of the test programs we discovered that one of the two output multicast groups, 
originating at the Supermicro/82574L board, was inadvertently being sent out 
the admin interface instead of the video interface.  This caused a number of 
issues and appears to be a proximate cause of the performance issues during 
ssh console access.  (A little surprising given the relatively low sustained 
bit-rate of 5 Mbps).

Due to a small misconfiguration in the switch, one of the desktop monitoring 
station ports was also joined to a VLAN trunk port that contained multiple 
VLANs (including the video and admin LANs).  What we discovered was:

a) The second multicast stream was actually on the admin LAN (and, because the 
switch is not IGMP-aware, was being flooded to all ports).
b) This caused some unidentified devices on the admin LAN (which has multiple 
10/100-only devices) to issue flow control XOFFs.
c) Either due to a feedback loop, a bug in the switch, or possibly a bug with 
the 82574L driver, the amount of flow control traffic on the network escalated 
until the network became unusable.

By removing the admin network ports from the VLAN trunk (the error that caused 
the admin network to see broadcast traffic in all three VLANs), and explicitly 
directing the second multicast stream from the 82574L based machine to the 
video output network, we were able to achieve stability.
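
For readers who hit the same symptom, here is a minimal sketch of what 
"explicitly directing" a multicast stream to one interface can look like at the 
socket level.  It is not the test program used above: the function name, the 
192.168.20.10 video-interface address and the 239.1.1.2 group are all 
hypothetical.  It relies on the standard IP_MULTICAST_IF socket option, which 
pins outgoing multicast to the interface that owns the given local address 
instead of leaving the choice to the routing table.

```python
import socket


def make_multicast_sender(iface_ip, ttl=1):
    """Return a UDP socket whose multicast output is pinned to iface_ip.

    iface_ip is the IPv4 address of the local interface that should carry
    the stream (hypothetical example: the host's video-LAN address).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Without this option the kernel picks the egress interface from the
    # routing table, which may turn out to be the admin interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(iface_ip))
    # Keep the stream on the local segment unless a larger TTL is requested.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


# Hypothetical usage (addresses are placeholders, not taken from this thread):
#   sender = make_multicast_sender("192.168.20.10")
#   sender.sendto(payload, ("239.1.1.2", 5004))
```

Sniffing each interface after a change like this is still the simplest way to 
confirm which one actually carries the group.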

During the course of all this testing, port speeds were dialed down to 100 Mbps 
and devices were isolated.  Everything has now been unwound back to the 
condition of the network when the original test was done and the "problem" was 
reported.  Result:  No Problem.

Flow control has been restored to all ports, and all ports are now in 
auto-negotiation mode once again.

After all of the above was completed, we repeated the same test on the Atom 
based Supermicro board.  This yielded a small number of continuity errors that 
should not be there; however, that is likely due to the fact that the machine is 
still on the older (1.0.2) driver.  We will update to the same driver version 
as the Xeon based system and re-test.  We will only report back if the test 
still exhibits problems, otherwise it is safe to assume the newer driver 
achieved the same positive results.

What remains outstanding, although not something we can currently investigate 
further, is what the exact source of the apparent broadcast storm of XOFF 
messages was and whether or not this is related to the 82574L.  If we encounter 
it again we will report back, otherwise it will remain a possible data point 
for future investigations in the event that flow control appears to be an 
issue.  The switch in question is also slated to be replaced with an IGMP-aware 
unit.

Thank you to John and Carolyn for the offers of assistance and I'm sure we'll 
be back with more questions as we evolve our solution around the 82574L.

-Jeff

Glad to hear things are now working for you.  Regarding the flow control, you 
need to be very careful about using flow control when you are mixing 10/100 and 1 
gig links.  The 10/100 flow control delays (xoff's) are _very_ large compared 
to the 1 gig delays.  I don't think switches have any idea as to how to deal 
with this kind of configuration.  That is probably what you are seeing.  Try 
not to mix the links or control the use of flow control very carefully.

Cheers,
John
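
To put rough numbers on John's point: an 802.3x PAUSE (XOFF) frame carries a 
16-bit pause time counted in quanta of 512 bit times, so the same pause value 
lasts ten times longer on a 100 Mbps link than on a 1 gig link.  The sketch 
below only does that arithmetic.  The function name and the chosen link speeds 
are illustrative, and it says nothing about what pause value a particular 
switch or the 82574L actually sends.

```python
PAUSE_QUANTUM_BITS = 512      # one pause quantum = 512 bit times (802.3x)
MAX_PAUSE_QUANTA = 0xFFFF     # pause time is a 16-bit field


def pause_seconds(quanta, link_bps):
    """Duration in seconds of a PAUSE of `quanta` quanta at `link_bps` bits/s."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause quanta must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


# Compare the longest possible single pause at each link speed.
for name, bps in (("10 Mbps", 10_000_000),
                  ("100 Mbps", 100_000_000),
                  ("1 Gbps", 1_000_000_000)):
    ms = pause_seconds(MAX_PAUSE_QUANTA, bps) * 1000
    print(f"{name}: max single pause = {ms:.2f} ms")
```

The maximum single pause works out to roughly 33.6 ms at 1 Gbps, 336 ms at 
100 Mbps and 3.36 s at 10 Mbps, which is one way to see why XOFFs from slow 
devices can hurt a mixed-speed segment.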


_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
