On Mon, 16 Feb 2015, Brad Fleming wrote:

> We’ve seen it since installing the high-capacity switch fabrics into our 
> XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d. 
> I’m not sure what software we were using when they were first installed; 
> probably whatever would have been stable/popular around December 2010.
> 
> Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.

Ah I see it ... I was looking for "SFM"s not "SNM"s!

I also echo the other poster's question about how you notice the corruption.  
I have a suspicion I may be seeing something similar, particularly with 
UDP-based transactions like NTP and RADIUS which could pass through such a 
chassis.  But I also suffer CPU spikes with mcast traffic on that chassis, 
which has always been an issue for me.
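
In case anyone wants to compare notes, I've been spot-checking for this by 
firing SNTP queries along a path through the chassis and sanity-checking the 
reply fields, roughly as below (a quick Python 3 sketch only; 
"ntp.example.ac.uk" is a placeholder for a server reached via the XMR):

  # Probe an NTP server through the chassis; mangled UDP tends to show
  # up as short reads, bad mode/stratum values, or implausible timestamps.
  import socket, struct, time

  NTP_EPOCH_OFFSET = 2208988800     # seconds from 1900-01-01 to 1970-01-01

  def probe(server):
      s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      s.settimeout(2.0)
      s.sendto(b'\x1b' + 47 * b'\x00', (server, 123))   # LI=0, VN=3, Mode=3
      reply, _ = s.recvfrom(512)
      if len(reply) < 48:
          return 'short reply (%d bytes)' % len(reply)
      mode, stratum = reply[0] & 0x07, reply[1]
      if mode != 4:                       # 4 = server response
          return 'bad mode %d' % mode
      if not 1 <= stratum <= 15:
          return 'implausible stratum %d' % stratum
      tx = struct.unpack('!I', reply[40:44])[0] - NTP_EPOCH_OFFSET
      if abs(tx - time.time()) > 60:
          return 'timestamp off by %.0f s' % abs(tx - time.time())
      return 'ok'

  print(probe('ntp.example.ac.uk'))     # placeholder hostname

Run in a loop, anything other than 'ok' (or the odd timeout) would be worth 
a closer look with a packet capture.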

Thanks.

Jethro.



> 
> Note that the power-on process causes your management session to hang 
> for a few seconds. The device isn’t broken and packets aren’t getting 
> dropped; it’s just going through checks and echoing back status.
> 
> -brad
> 
> 
> > On Feb 16, 2015, at 7:07 AM, Jethro R Binks <jethro.bi...@strath.ac.uk> 
> > wrote:
> > 
> > On Fri, 13 Feb 2015, Brad Fleming wrote:
> > 
> >> Over the years we’ve seen odd issues where one of the switch fabric 
> >> links will “wig out” and some of the data moving between cards gets 
> >> corrupted. When this happens we power cycle each switch fab one at a 
> >> time using this process:
> >> 
> >> 1) Shutdown SFM #3
> >> 2) Wait 1 minute
> >> 3) Power SFM #3 on again
> >> 4) Verify all SFM links are up to SFM#3
> >> 5) Wait 1 minute
> >> 6) Perform steps 1-5 for SFM #2
> >> 7) Perform steps 1-5 for SFM #1
> >> 
> >> Not sure you’re seeing the same issue that we see, but the “SFM Dance” 
> >> (as we call it) is a once-every-four-months thing somewhere across our 
> >> 16 XMR4000 boxes. It can be done with little to no impact if you are 
> >> patient and verify status before moving to the next SFM.
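> >> 
> >> For what it's worth, the dance scripts easily. A rough sketch of the 
> >> steps above (untested; run_cli() is a placeholder for however you drive 
> >> the NetIron CLI, and check the exact link-status command against your 
> >> IronWare docs):
> >> 
> >>   import time
> >> 
> >>   def cycle_sfm(run_cli, n):
> >>       run_cli('power-off snm %d' % n)     # step 1
> >>       time.sleep(60)                      # step 2: wait one minute
> >>       run_cli('power-on snm %d' % n)      # step 3: session hangs briefly
> >>       # step 4: don't continue until every link to this SFM is up again
> >>       while 'DOWN' in run_cli('show sfm-links %d' % n):
> >>           time.sleep(5)
> >>       time.sleep(60)                      # step 5: settle time
> >> 
> >>   for sfm in (3, 2, 1):                   # steps 6-7: one SFM at a time
> >>       cycle_sfm(my_run_cli, sfm)          # my_run_cli: your CLI wrapper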
> > 
> > That's all interesting.  What code version is this?  Also, how do you 
> > shut down the SFMs?  I don't recall seeing documentation for that.
> > 
> > Jethro.
> > 
> > 
> >> 
> >>> On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:
> >>> 
> >>> We have three switch fabrics installed; all are under 1% utilization.
> >>> 
> >>> 
> >>> From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com] 
> >>> Sent: Friday, February 13, 2015 12:27 PM
> >>> To: net...@gmail.com; 'Jeroen Wunnink | Hibernia Networks'
> >>> Subject: Re: [f-nsp] MLX throughput issues
> >>> 
> >>> How many switch fabrics do you have in that MLX, and how high is the 
> >>> utilization on them?
> >>> 
> >>> On 13/02/15 18:12, net...@gmail.com wrote:
> >>>> We also tested with a spare Quanta LB4M we have and are seeing about 
> >>>> the same speeds as with the FLS648 (around 20 MB/s, or 160 Mbps).
> >>>> 
> >>>> I also reduced the number of routes we are accepting down to about 189K 
> >>>> and that did not make a difference.
> >>>> 
> >>>> 
> >>>> From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] 
> >>>> On Behalf Of Jeroen Wunnink | Hibernia Networks
> >>>> Sent: Friday, February 13, 2015 3:35 AM
> >>>> To: foundry-nsp@puck.nether.net
> >>>> Subject: Re: [f-nsp] MLX throughput issues
> >>>> 
> >>>> The FLS switches do something weird with packets. I've noticed they 
> >>>> somehow interfere with the TCP window size being scaled up dynamically, 
> >>>> resulting in destinations further away getting very poor speed results 
> >>>> compared to destinations close by.
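> >>>> 
> >>>> That would fit a window-limited TCP session: throughput tops out at 
> >>>> roughly window / RTT, so a window that can't grow mostly punishes 
> >>>> far-away destinations. Back-of-the-envelope, with an assumed 64 KB 
> >>>> window and illustrative RTTs:
> >>>> 
> >>>>   # max TCP rate ~= window / RTT when the window is stuck small
> >>>>   window_bits = 64 * 1024 * 8       # assumed 64 KB window, in bits
> >>>>   for rtt_ms in (3, 30):            # nearby vs. far-away, illustrative
> >>>>       rate_mbps = window_bits / (rtt_ms / 1000.0) / 1e6
> >>>>       print('%2d ms RTT -> %5.1f Mb/s' % (rtt_ms, rate_mbps))
> >>>>   # prints ~174.8 Mb/s at 3 ms, but only ~17.5 Mb/s at 30 ms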
> >>>> 
> >>>> We got rid of those a while ago.
> >>>> 
> >>>> 
> >>>> On 12/02/15 17:37, net...@gmail.com wrote:
> >>>>> We are having a strange issue on our MLX running code 5.6.00c.  We are 
> >>>>> encountering some throughput issues that seem to be randomly impacting 
> >>>>> specific networks.
> >>>>> 
> >>>>> We use the MLX to handle both external BGP and internal VLAN routing.  
> >>>>> Each FLS648 is used for Layer 2 VLANs only.
> >>>>> 
> >>>>> From a server connected by a 1 Gbps uplink to a Foundry FLS648 switch, 
> >>>>> which is in turn connected to the MLX on a 10 Gbps port, a speed test 
> >>>>> to an external network gets 20 MB/s.
> >>>>> 
> >>>>> Connecting the same server directly to the MLX gets 70 MB/s.
> >>>>> 
> >>>>> Connecting the same server to one of my customer's Juniper EX3200 
> >>>>> switches (which BGP peers with the MLX) also gets 70 MB/s.
> >>>>> 
> >>>>> Testing to another external network, all three scenarios get 110MB/s.
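> >>>>> 
> >>>>> (In bits per second, for comparison against the 1 Gbps server uplink; 
> >>>>> a trivial conversion, shown only to make the gap obvious:
> >>>>> 
> >>>>>   # MB/s -> Mb/s (x8); 110 MB/s ~= 880 Mb/s, near GigE line rate
> >>>>>   for label, mb_per_s in [('via FLS648', 20), ('direct to MLX', 70),
> >>>>>                           ('via EX3200', 70), ('2nd network', 110)]:
> >>>>>       print('%-13s %3d MB/s = %3d Mb/s' % (label, mb_per_s, mb_per_s * 8))
> >>>>> 
> >>>>> so via the FLS648 the server moves under a third of what it manages 
> >>>>> plugged straight into the MLX.)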
> >>>>> 
> >>>>> The path to both test network locations goes through the same IP 
> >>>>> transit provider.
> >>>>> 
> >>>>> We are running an NI-MLX-MR with 2GB RAM; an NI-MLX-10Gx4 connects to 
> >>>>> the Foundry FLS648 by XFP-10G-LR, and an NI-MLX-1Gx20-GC was used for 
> >>>>> directly connecting the server.  A separate NI-MLX-10Gx4 connects to 
> >>>>> our upstream BGP providers.  The customer’s Juniper EX3200 connects to 
> >>>>> the same NI-MLX-10Gx4 as the FLS648.  We take default routes plus full 
> >>>>> tables from three providers by BGP, but filter out most of the routes.
> >>>>> 
> >>>>> The fiber and optics on everything look fine.  CPU usage is less than 
> >>>>> 10% on the MLX and all line cards, and about 1% on the FLS648.  The 
> >>>>> ARP table on the MLX is about 12K entries, and the BGP table is about 
> >>>>> 308K routes.
> >>>>> 
> >>>>> Any assistance would be appreciated.  I suspect there is a setting that 
> >>>>> we’re missing on the MLX that is causing this issue.
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >>> 
> >>> -- 
> >>> 
> >>> Jeroen Wunnink
> >>> IP NOC Manager - Hibernia Networks
> >>> Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300 
> >>> Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
> >>> jeroen.wunn...@hibernianetworks.com
> >>> www.hibernianetworks.com
> >> 
> > 
> 
> 

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
Jethro R Binks, Network Manager,
Information Services Directorate, University Of Strathclyde, Glasgow, UK

The University of Strathclyde is a charitable body, registered in
Scotland, number SC015263.
_______________________________________________
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp
