On 17.01.17 at 23:09, Dale Ghent wrote:
On Jan 17, 2017, at 2:39 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:

On 17.01.17 at 17:37, Dale Ghent wrote:
On Jan 17, 2017, at 11:31 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi Dale,

On 17.01.17 at 17:22, Dale Ghent wrote:

On Jan 17, 2017, at 11:12 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:

Hi guys,

I am sorry, but I have to dig up this old topic, since I now have three
hosts running OmniOS 018/020 which show these pesky issues with flapping
their ixgbeN links on my Nexus FEXes…

Does anyone know whether any changes have been made to the ixgbe driver since
06/2016?


Since June 2016? Yes! A large update to the ixgbe driver happened in August. 
This added X550 support, and also brought the Intel Shared Code it uses from 
its 2012 vintage up to current. The updated driver is available in 014 and 
later.

/dale


Do you know of any way to find out why three of my boxes are flapping
their 10GbE ports? It happens not only in aggr mode, but with single links
as well. Last week one of my RSF-1 nodes presumably panicked because it
could no longer reach its iSCSI LUNs. The thing is, somewhere down the
line the ixgbe driver seems happy to configure one port at 1GbE instead
of 10GbE, which stops the flapping, but still breaks the vPC on my Nexus.

In syslog, it looks like this:

...
Jan 17 14:46:07 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Jan 17 14:46:21 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:46:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:46:26 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:22 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:52:22 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:52:32 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:54:50 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:54:55 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:58:12 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down
Jan 17 14:58:16 zfsha02gh79 mac: [ID 435574 kern.info] NOTICE: ixgbe3 link up, 10000 Mbps, full duplex
Jan 17 14:59:46 zfsha02gh79 mac: [ID 486395 kern.info] NOTICE: ixgbe3 link down

Note the entry at 14:46:07, where the system settles on a 1GbE connection…
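
To put numbers on the flapping over a longer stretch of /var/adm/messages, a small tally script can help. The following is only a minimal sketch: it assumes nothing beyond the mac/NOTICE line format shown above and may need its regexes adjusted for a different syslog template.

#!/usr/bin/env python3
# Minimal sketch: tally ixgbe link up/down events from /var/adm/messages.
# Assumes the one-line "mac: ... NOTICE: ixgbeN link up/down" format shown above.
import re
import sys
from collections import Counter

UP_RE = re.compile(r'NOTICE: (ixgbe\d+) link up, (\d+) Mbps')
DOWN_RE = re.compile(r'NOTICE: (ixgbe\d+) link down')

ups, downs, slow = Counter(), Counter(), Counter()

for line in sys.stdin:
    m = UP_RE.search(line)
    if m:
        nic, speed = m.group(1), int(m.group(2))
        ups[nic] += 1
        if speed < 10000:        # e.g. the 1000 Mbps link-up at 14:46:07
            slow[nic] += 1
        continue
    m = DOWN_RE.search(line)
    if m:
        downs[m.group(1)] += 1

for nic in sorted(set(ups) | set(downs)):
    print(f"{nic}: {ups[nic]} up, {downs[nic]} down, "
          f"{slow[nic]} link-up events below 10000 Mbps")

Fed with something like "grep ixgbe /var/adm/messages | python3 flapcount.py" (the file name is arbitrary), it prints per-interface up/down counts and flags any link-up that settled below 10GbE.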

Sounds like a cabling issue. Are the runs too long, or are you not using CAT6a? 
Flapping at 10Gb and then settling at 1Gb would indicate a cabling issue to 
me. The driver will always try to link at the fastest speed that the local 
controller and the remote peer can negotiate... it will not proactively 
downgrade the link speed. If the link comes up at 1Gb, it is because that is 
the best the controller managed to negotiate with the remote peer.

Are you using jumbo frames or anything outside of a normal 1500-MTU link?
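
Both the negotiated speed and the MTU are quick to check from userland with dladm. Here is a minimal sketch (Python 3; the ixgbe link names are just the ones from the syslog above, so adjust them per host):

#!/usr/bin/env python3
# Rough sketch for illumos/OmniOS: print negotiated speed/duplex and the
# current MTU of a few links via dladm. Link names taken from this thread.
import subprocess

LINKS = ["ixgbe1", "ixgbe3"]

def run(*argv):
    # Return the command's stdout, or a short note if it fails.
    try:
        return subprocess.run(argv, capture_output=True, text=True,
                              check=True).stdout.rstrip()
    except (OSError, subprocess.CalledProcessError) as exc:
        return f"{' '.join(argv)}: failed ({exc})"

for link in LINKS:
    print(run("dladm", "show-phys", link))                   # speed and duplex
    print(run("dladm", "show-linkprop", "-p", "mtu", link))  # jumbo frames or not
    print()

dladm show-phys reports the speed and duplex the MAC layer currently sees, and show-linkprop -p mtu tells you whether jumbo frames are configured on the link.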

/dale

The cables are actually purpose-bought Cat6 cables; they run about 2 m, not 
more. It could be the cables, but I am running a number of those and, afaik, 
I only get these issues on these three nodes. I can try some other cables, but 
I had hoped to get some kind of debug messages from the driver.
The chip gives no reason for a loss of signal (LoS) or a downgrade of the link. 
It interrupts the driver for only a handful of conditions; "LSC" (Link Status 
Change) interrupts are one of them, and they are what tells the driver to 
interrogate the chip for its current link speed. Beyond that, the hardware 
provides no further details. Whatever caused the PHY to re-train the link is 
completely hidden from the driver.
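
About the best you can do from userland is timestamp the transitions yourself. A rough sketch that polls dladm show-phys in parseable output once a second and logs any state/speed/duplex change (Python 3 assumed; adjust the link filter and interval for your hosts):

#!/usr/bin/env python3
# Rough sketch: timestamp link state/speed/duplex transitions from userland,
# since the driver itself cannot report why a link re-trained. Polls
# "dladm show-phys -p -o link,state,speed,duplex" (parseable output) once a
# second and prints a line whenever anything changes.
import subprocess
import time
from datetime import datetime

previous = {}

while True:
    out = subprocess.run(
        ["dladm", "show-phys", "-p", "-o", "link,state,speed,duplex"],
        capture_output=True, text=True
    ).stdout
    for row in out.splitlines():
        fields = row.split(":")
        if len(fields) != 4 or not fields[0].startswith("ixgbe"):
            continue
        link, current = fields[0], tuple(fields[1:])
        if link in previous and previous[link] != current:
            print(f"{datetime.now().isoformat()} {link}: "
                  f"{previous[link]} -> {current}", flush=True)
        previous[link] = current
    time.sleep(1)

This gives you nothing syslog does not already record, but it is a standalone log with finer-grained timestamps that is easy to correlate with the switch side.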

Are these X540 interfaces actually built into the motherboard, or are they separate PCIe 
cards? Also, CAT6 alone might not be enough; even at 2m, the magnetics on the older X540 
might not be able to eke out a 10Gb connection over it. I would remove all doubt about 
cabling by replacing them with CAT6a. Beware of cable vendors who sell 
CAT6 cables as "CAT6a". It could also be an issue with the modular jacks on the 
ends.

Since you mentioned "after 6/2016" for the ixgbe driver, have you tried the 
newer one yet? Large portions of it were rewritten and refactored, and many bugs were 
fixed, including in code that touches the X540, because the new X550 is also a copper 
part and the two models share some logic related to that.

/dale
Thanks for clarifying that. I just checked the cables: they are classified as Cat6a and come from a reputable German vendor. Not that that is any guarantee, but at least they're not cheap bulk stock from China. ;)

The X540s are partly onboard on some Supermicro X10 boards and partly on a genuine Intel PCIe adapter. I will try some other cables; maybe the ones I got were faulty. Still, that leaves the user with only a few options for finding out what is actually wrong with the connection, doesn't it?

Regarding the OmniOS release, I will update my RSF-1 node to the latest r18; the other two new nodes are already on r20 and thus should already have the new driver installed.

…any suggestions for good cables? ;)

Thanks,
Stephan



_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
