Re: [f-nsp] MLX throughput issues

2015-02-20 Thread Wilbur Smith
Hello All,

Sorry to reply late, but it seems like you were hitting the buffer limit for a 
port domain (group of ports). I don’t have an FLS in front of me (flying at the 
moment) so I can’t confirm, but I think we’re breaking up the buffer space into 
reserved segments for each port group. The reasoning behind this is that it 
keeps “slow drain” devices on a single interface from using up all available 
buffer space for the switch. The downside is that if a port exhausts its 
allotted buffers, it can cause slowdowns.
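
To make the trade-off concrete, here is a toy Python sketch comparing reserved per-port buffers with one fully shared pool. It is not how the FLS firmware works internally; the function name, the buffer total and the per-port demands are all made-up numbers for illustration.

```python
def grant_buffers(demands, total, shared):
    """Return how many buffers each port gets for one burst.

    demands: buffers each port wants, in arrival order (hypothetical values).
    total:   buffers available to the whole port group.
    shared:  True = one common pool, False = equal reserved slice per port.
    """
    if shared:
        remaining = total
        grants = []
        for want in demands:
            got = min(want, remaining)  # first come, first served
            grants.append(got)
            remaining -= got
        return grants
    per_port = total // len(demands)    # fixed reservation per port
    return [min(want, per_port) for want in demands]


if __name__ == "__main__":
    demands = [900, 10, 10, 10]  # port 0 is the bursty / slow-drain one
    print(grant_buffers(demands, total=400, shared=False))
    print(grant_buffers(demands, total=400, shared=True))
```

With reserved slices the busy port is capped at its own share and the other ports are untouched; with a shared pool the busy port can take everything and starve its neighbours. That is the tension described above.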

Over the years we’ve gone back and forth over whether it’s better to ship with 
shared buffers enabled; I think it would generate the same amount of TAC 
requests no matter what we do. Although the FLS isn’t as beefy as the FCX or 
ICX, it should still have some knobs you can turn to increase performance. This 
should be in the config guide.

I’d try to narrow down which device or devices are causing buffer pressure on 
the switch and consider enabling Ethernet pause frames (flow control) on the 
switch and neighboring devices. There are also different QoS settings that can 
switch from strict queues to weighted round-robin (and other types) to help 
make better use of the buffers on the uplink ports.

Sorry you’re running into this. The FLS is a very good campus access switch 
platform (good latency and minimal oversubscription, for a good cost), but my 
view is that it’s not the best switch to front-end server connections or heavy 
I/O. Others may disagree with me on this though.

Wilbur

From: net...@gmail.com
Date: Friday, February 13, 2015 at 4:13 PM
To: Brad Fleming
Cc: 'Jeroen Wunnink | Hibernia Networks', 
foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

We already tried a full system reboot last night and it didn’t seem to help.  
I’ll definitely keep your switch fabric reboot procedure in mind in case we run 
into that in the future.

I think we may have figured out at least a short-term solution.  On the FLS648, 
we ran the command “buffer-sharing-full” and immediately we were able to get 
better speeds.  It seems as though the FLS648’s buffers may have been causing 
the issue.  We’ll continue to monitor over the next few days and see if this 
actually solves the issue.

Thanks everyone for your feedback thus far.



From: Brad Fleming [mailto:bdfle...@gmail.com]
Sent: Friday, February 13, 2015 4:24 PM
To: net...@gmail.com
Cc: Jeroen Wunnink | Hibernia Networks; 
foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

Over the years we’ve seen odd issues where one of the switch-fabric-links will 
“wigout” and some of the data moving between cards will get corrupted. When 
this happens we power cycle each switch fab one at a time using this process:
1) Shutdown SFM #3
2) Wait 1 minute
3) Power SFM #3 on again
4) Verify all SFM links are up to SFM#3
5) Wait 1 minute
6) Perform steps 1-5 for SFM #2
7) Perform steps 1-5 for SFM #1
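
The sequence above can be written out as a small Python helper that only prints a checklist; it does not talk to a router. It assumes each of the three modules is cycled exactly once in the order 3, 2, 1, and it uses the `power-off snm` / `power-on snm` command strings quoted elsewhere in this thread. The verification step is left as a reminder for the operator, since the thread does not name a command for it.

```python
def sfm_dance_plan(modules=(3, 2, 1), wait_seconds=60):
    """Build the ordered checklist for cycling each SFM one at a time.

    modules:      SFM numbers in the order they are cycled (assumed 3, 2, 1).
    wait_seconds: pause after power-off and after verification.
    """
    steps = []
    for m in modules:
        steps.append(f"power-off snm {m}")
        steps.append(f"wait {wait_seconds}s")
        steps.append(f"power-on snm {m}")
        # No CLI command is given in the thread for this check,
        # so it stays a manual step.
        steps.append(f"verify all SFM links are up to SFM #{m}")
        steps.append(f"wait {wait_seconds}s")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(sfm_dance_plan(), start=1):
        print(f"{number:2d}. {step}")
```

Keeping the verify step between modules is the point of the procedure: only one fabric module is ever out of service at a time.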

Not sure you’re seeing the same issue that we see but the “SFM Dance” (as we 
call it) is a once-every-four-months thing somewhere across our 16 XMR4000 
boxes. It can be done with little to no impact if you are patient and verify 
status before moving to the next SFM.


On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:

We have three switch fabrics installed, all are under 1% utilized.


From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com]
Sent: Friday, February 13, 2015 12:27 PM
To: net...@gmail.com; 'Jeroen Wunnink | Hibernia 
Networks'
Subject: Re: [f-nsp] MLX throughput issues

How many switchfabrics do you have in that MLX and how high is the utilization 
on them?

On 13/02/15 18:12, net...@gmail.com wrote:
We also tested with a spare Quanta LB4M we have and are seeing about the same 
speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).

I also reduced the number of routes we are accepting down to about 189K and 
that did not make a difference.


From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of 
Jeroen Wunnink | Hibernia Networks
Sent: Friday, February 13, 2015 3:35 AM
To: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

The FLS switches do something weird with packets. I've noticed they somehow 
interfere with changing the MSS window size dynamically, resulting in 
destinations further away having very poor speed results compared to 
destinations close by.
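
For context on why distance matters: a single TCP flow cannot go faster than its window divided by the round-trip time, so anything that keeps the effective window small hurts far-away destinations most. Below is a rough Python sketch of that arithmetic. The 64 KB window and the RTT values are illustrative assumptions, not measurements from this thread; only the 20 MB/s figure comes from the reports here.

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8


def max_tcp_mbps(window_bytes, rtt_ms):
    """Upper bound on one TCP flow's throughput: window / RTT, in Mbps."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


if __name__ == "__main__":
    # The speed reported in this thread.
    print(mbytes_to_mbits(20), "Mbps")
    # Same assumed window, near versus far destination.
    for rtt_ms in (2, 20, 100):
        print(rtt_ms, "ms RTT ->", round(max_tcp_mbps(65535, rtt_ms), 1), "Mbps")
```

The same window that is harmless at a couple of milliseconds of RTT becomes the bottleneck at tens of milliseconds, which would match the "close by is fine, far away is slow" pattern.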

We got rid of those a while ago.


On 12/02/15 17:37, net...@gmail.com wrote:
We are having a strange issue on our MLX running code 5.6.00c.  We are 
encountering some throughput issues that seem to be randomly impacting specific 
networks.

We use the MLX to handle both external BGP and internal VLAN routing.  Each 
FLS648 is used for Layer 2 VLANs

Re: [f-nsp] MLX throughput issues

2015-02-18 Thread Wouter de Jong
Just a general 'watch out' with regards to 5.5e I'd like to share, as we've had 
very bad results with 5.5 (including 5.5e)
with regards to IPv6 in combination with IS-IS
(though TAC also mentioned it was on OSPF as well, but don't see that in the 
release notes)


DEFECT000500944


Router-A has, for example, a path towards the IPv6 loopback address of 
Router-B via eth 1/4.
Now eth 1/4 on Router-A goes down.

IS-IS picks, for example, eth 3/4 as the new best path towards the loopback.
A 'show ipv6 route loopback-of-Router-B' then correctly shows the new 
interface (eth 3/4)...

Yet the IPv6 cache keeps a stale entry towards eth 1/4... and traffic from 
Router-A towards Router-B's loopback is now blackholed, as the router still 
tries to send it out eth 1/4 (even though that interface is down).

In our case this has so far affected, for example, IPv6 iBGP sessions over 
redundant links.

Physical or ve does not matter.
Also, if I recall correctly, nasty stuff also happened when you simply made a 
metric change of the IS-IS path.
(Eg. when eth 3/4 is towards Router-C, and Router-C has a best path towards 
Router-B's loopback via the link to Router-A)

I believe it was only problematic for 'local' traffic from Router-A, and not 
for transit traffic - but not 100% sure anymore.

Fixed in 5.6.something, not sure if they ever fixed it in 5.5

Simplest workaround once you are affected is executing the 'hidden' command 
(DEFECT000503937 ) : clear ipv6 cache
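
To illustrate the failure mode, here is a toy Python model of a routing table with a separate forwarding cache that is not invalidated when the route changes. The class and method names are hypothetical and this is not how IronWare is implemented; it only shows why the route lookup can look correct while traffic still leaves via the old interface, and why clearing the cache helps.

```python
class ToyRouter:
    """Route table plus a forwarding cache that can go stale."""

    def __init__(self):
        self.routes = {}  # destination -> interface chosen by the IGP
        self.cache = {}   # destination -> interface used for forwarding

    def set_route(self, dest, iface):
        # Models the defect: the route is updated but the cache is left alone.
        self.routes[dest] = iface

    def forward(self, dest):
        # Forwarding consults the cache first and only fills it on a miss.
        if dest not in self.cache:
            self.cache[dest] = self.routes[dest]
        return self.cache[dest]

    def clear_cache(self):
        # Stand-in for the 'clear ipv6 cache' workaround.
        self.cache.clear()


if __name__ == "__main__":
    r = ToyRouter()
    r.set_route("Router-B loopback", "eth 1/4")
    print(r.forward("Router-B loopback"))  # cache filled with eth 1/4
    r.set_route("Router-B loopback", "eth 3/4")
    print(r.routes["Router-B loopback"])   # route table shows eth 3/4
    print(r.forward("Router-B loopback"))  # forwarding still uses eth 1/4
    r.clear_cache()
    print(r.forward("Router-B loopback"))  # now eth 3/4
```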


Best regards,

Wouter


From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of 
Jeroen Wunnink | Hibernia Networks
Sent: Wednesday, February 18, 2015 17:32
To: Brad Fleming; Frank Bulk
Cc: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

Try physically pulling the SFMs one by one rather than just powercycling them. 
Yes there is a difference there :-)

Also, 5400d is fairly buggy; there were some major issues with CRC checksums in 
hSFMs and on 100G cards. We have fairly good results with 5500e.



On 18/02/15 16:50, Brad Fleming wrote:
TAC replaced hSFMs and line cards the first couple times but we've seen this 
issue at least once on every node in the network. The ones where we replaced 
every module (SFM, mgmt, port cards, even PSUs) have still had at least one 
event. So I'm not even sure what hardware we'd replace at this point. That led 
us to suspect a config problem, since each box uses the same template, but after 
a lengthy audit with TAC nobody could find anything. It happens infrequently 
enough that we grew to just live with it.



On Feb 18, 2015, at 12:45 AM, Frank Bulk frnk...@iname.com wrote:

So don't errors like this suggest replacing the hardware?

Frank

From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of 
Brad Fleming
Sent: Tuesday, February 17, 2015 3:10 PM
To: Josh Galvez
Cc: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

The common symptoms for us are alarms of TM errors / resets. We've been told on 
multiple TAC cases that logs indicating transmit TM errors are likely caused by 
problems in one of the SFM links / lanes. We've been told that resetting the 
SFMs one at a time will clear the issue.

Symptoms during the issue are that 1/3rd of the traffic moving from one TM to 
another TM will simply get dropped. So we see TCP globally start to throttle 
like crazy and if enough errors count up the TM will simply reset. After the TM 
reset it seems a 50/50 chance the box will remain stable or go back to dropping 
packets within ~20mins. So when we see a TM reset we simply do the SFM Dance no 
matter what.
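
One way to read the "1/3rd" figure, offered purely as an assumption: if traffic between TMs is spread evenly across three fabric modules and one of them is corrupting or dropping everything it carries, roughly a third of the traffic is lost. The toy Python sketch below just does that counting; the round-robin spreading and the function name are illustrative, not a description of the actual fabric scheduling.

```python
def dropped_fraction(packets, modules=3, bad_module=0):
    """Fraction of packets lost if one of `modules` fabric paths drops everything.

    Packets are assigned round-robin to modules (an assumed, simplified model).
    """
    dropped = sum(1 for i in range(packets) if i % modules == bad_module)
    return dropped / packets


if __name__ == "__main__":
    print(dropped_fraction(3000))             # three modules, one bad
    print(dropped_fraction(3000, modules=4))  # four modules, one bad
```

Loss at that level is far more than TCP tolerates, which fits the description of flows throttling hard across the whole box.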


On Feb 16, 2015, at 10:08 PM, Josh Galvez j...@zevlag.com wrote:

What kind of wigout? And how do you diagnose the corruption?  I'm intrigued.

On Mon, Feb 16, 2015 at 8:43 AM, Brad Fleming bdfle...@gmail.com wrote:
We've seen it since installing the high-capacity switch fabrics into our 
XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d. I'm 
not sure what software we were using when they were first installed; probably 
whatever would have been stable/popular around December 2010.

Command is simply power-off snm [1-3] then power-on snm [1-3].

Note that the power-on process causes your management session to hang for a few 
seconds. The device isn't broken and packets aren't getting dropped; it's just 
going through checks and echoing back status.

-brad



Re: [f-nsp] MLX throughput issues

2015-02-18 Thread Brad Fleming
TAC replaced hSFMs and line cards the first couple times but we’ve seen this 
issue at least once on every node in the network. The ones where we replaced 
every module (SFM, mgmt, port cards, even PSUs) have still had at least one 
event. So I’m not even sure what hardware we’d replace at this point. That led 
us to suspect a config problem, since each box uses the same template, but after 
a lengthy audit with TAC nobody could find anything. It happens infrequently 
enough that we grew to just live with it. 



 On Feb 18, 2015, at 12:45 AM, Frank Bulk frnk...@iname.com wrote:
 
 So don’t errors like this suggest replacing the hardware?
  
 Frank
  
 From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of 
 Brad Fleming
 Sent: Tuesday, February 17, 2015 3:10 PM
 To: Josh Galvez
 Cc: foundry-nsp@puck.nether.net
 Subject: Re: [f-nsp] MLX throughput issues
  
 The common symptoms for us are alarms of TM errors / resets. We’ve been told 
 on multiple TAC cases that logs indicating transmit TM errors are likely 
 caused by problems in one of the SFM links / lanes. We’ve been told that 
 resetting the SFMs one at a time will clear the issue.
  
 Symptoms during the issue are that 1/3rd of the traffic moving from one TM to 
 another TM will simply get dropped. So we see TCP globally start to throttle 
 like crazy and if enough errors count up the TM will simply reset. After the 
 TM reset it seems a 50/50 chance the box will remain stable or go back to 
 dropping packets within ~20mins. So when we see a TM reset we simply do the 
 SFM Dance no matter what.
  
  
 On Feb 16, 2015, at 10:08 PM, Josh Galvez j...@zevlag.com wrote:
  
 What kind of wigout? And how do you diagnose the corruption?  I'm intrigued.
  
 On Mon, Feb 16, 2015 at 8:43 AM, Brad Fleming bdfle...@gmail.com wrote:
 We’ve seen it since installing the high-capacity switch fabrics into our 
 XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d. 
 I’m not sure what software we were using when they were first installed; 
 probably whatever would have been stable/popular around December 2010.
 
 Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.
 
 Note that the power-on process causes your management session to hang for a 
 few seconds. The device isn’t broken and packets aren’t getting dropped; 
 it’s just going through checks and echoing back status.
 
 -brad
 
 

Re: [f-nsp] MLX throughput issues

2015-02-18 Thread Jeroen Wunnink | Hibernia Networks
Try physically pulling the SFMs one by one rather than just 
powercycling them. Yes there is a difference there :-)


Also, 5400d is fairly buggy; there were some major issues with CRC 
checksums in hSFMs and on 100G cards. We have fairly good results with 
5500e.




On 18/02/15 16:50, Brad Fleming wrote:
TAC replaced hSFMs and line cards the first couple times but we’ve 
seen this issue at least once on every node in the network. The ones 
where we replaced every module (SFM, mgmt, port cards, even PSUs) have 
still had at least one event. So I’m not even sure what hardware we’d 
replace at this point. That led us to suspect a config problem, since 
each box uses the same template, but after a lengthy audit with TAC 
nobody could find anything. It happens infrequently enough that we 
grew to just live with it.




On Feb 18, 2015, at 12:45 AM, Frank Bulk frnk...@iname.com wrote:

So don’t errors like this suggest replacing the hardware?

Frank

From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On 
Behalf Of Brad Fleming
Sent: Tuesday, February 17, 2015 3:10 PM
To: Josh Galvez
Cc: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

The common symptoms for us are alarms of TM errors / resets. We’ve 
been told on multiple TAC cases that logs indicating transmit TM 
errors are likely caused by problems in one of the SFM links / lanes. 
We’ve been told that resetting the SFMs one at a time will clear the 
issue.

Symptoms during the issue are that 1/3rd of the traffic moving from 
one TM to another TM will simply get dropped. So we see TCP globally 
start to throttle like crazy and if enough errors count up the TM 
will simply reset. After the TM reset it seems a 50/50 chance the box 
will remain stable or go back to dropping packets within ~20mins. So 
when we see a TM reset we simply do the SFM Dance no matter what.

On Feb 16, 2015, at 10:08 PM, Josh Galvez j...@zevlag.com wrote:

What kind of wigout? And how do you diagnose the corruption?  I'm 
intrigued.

On Mon, Feb 16, 2015 at 8:43 AM, Brad Fleming bdfle...@gmail.com wrote:

We’ve seen it since installing the high-capacity switch fabrics 
into our XMR4000 chassis roughly 4 years ago. We saw it through 
IronWare 5.4.00d. I’m not sure what software we were using when 
they were first installed; probably whatever would have been 
stable/popular around December 2010.


Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.

Note that the power-on process causes your management session to 
hang for a few seconds. The device isn’t broken and packets aren’t 
getting dropped; it’s just going through checks and echoing back 
status.


-brad



Re: [f-nsp] MLX throughput issues

2015-02-17 Thread Frank Bulk
So don’t errors like this suggest replacing the hardware?

 

Frank

 

From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of 
Brad Fleming
Sent: Tuesday, February 17, 2015 3:10 PM
To: Josh Galvez
Cc: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

 

The common symptoms for us are alarms of TM errors / resets. We’ve been told on 
multiple TAC cases that logs indicating transmit TM errors are likely caused by 
problems in one of the SFM links / lanes. We’ve been told that resetting the 
SFMs one at a time will clear the issue.

 

Symptoms during the issue are that 1/3rd of the traffic moving from one TM to 
another TM will simply get dropped. So we see TCP globally start to throttle 
like crazy and if enough errors count up the TM will simply reset. After the TM 
reset it seems a 50/50 chance the box will remain stable or go back to dropping 
packets within ~20mins. So when we see a TM reset we simply do the SFM Dance no 
matter what.

 

 

On Feb 16, 2015, at 10:08 PM, Josh Galvez j...@zevlag.com wrote:

What kind of wigout? And how do you diagnose the corruption?  I'm intrigued.

 

On Mon, Feb 16, 2015 at 8:43 AM, Brad Fleming bdfle...@gmail.com wrote:

We’ve seen it since installing the high-capacity switch fabrics into our 
XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d. I’m 
not sure what software we were using when they were first installed; probably 
whatever would have been stable/popular around December 2010.

Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.

Note that the power-on process causes your management session to hang for a few 
seconds. The device isn’t broken and packets aren’t getting dropped; it’s just 
going through checks and echoing back status.

-brad



Re: [f-nsp] MLX throughput issues

2015-02-17 Thread Jethro R Binks
On Mon, 16 Feb 2015, Brad Fleming wrote:

 We’ve seen it since installing the high-capacity switch fabrics into our 
 XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d. 
 I’m not sure what software we were using when they were first installed; 
 probably whatever would have been stable/popular around December 2010.
 
 Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.

Ah I see it ... I was looking for SFMs not SNMs!

I also echo the poster's questions about how you notice the corruption.  
I have a suspicion I may be seeing similar things, particularly with 
UDP-based transactions like NTP and RADIUS which could pass through such a 
chassis.  But I also suffer from CPU spikes with mcast traffic on that 
chassis, which has always been an issue for me.

Thanks.

Jethro.



 
 Note that the power-on process causes your management session to hang 
 for a few seconds. The device isn’t broken and packets aren’t getting 
 dropped; it’s just going through checks and echoing back status.
 
 -brad
 
 
  On Feb 16, 2015, at 7:07 AM, Jethro R Binks jethro.bi...@strath.ac.uk 
  wrote:
  
  On Fri, 13 Feb 2015, Brad Fleming wrote:
  
  Over the years we’ve seen odd issues where one of the 
  switch-fabric-links will “wigout” and some of the data moving between 
  cards will get corrupted. When this happens we power cycle each switch 
  fab one at a time using this process:
  
  1) Shutdown SFM #3
  2) Wait 1 minute
  3) Power SFM #3 on again
  4) Verify all SFM links are up to SFM#3
  5) Wait 1 minute
  6) Perform steps 1-5 for SFM #2
  7) Perform steps 1-5 for SFM #1
  
  Not sure you’re seeing the same issue that we see but the “SFM Dance” 
  (as we call it) is a once-every-four-months thing somewhere across our 
  16 XMR4000 boxes. It can be done with little to no impact if you are 
  patient and verify status before moving to the next SFM.
  
  That's all interesting.  What code versions is this?  Also, how do you 
  shutdown the SFMs?  I don't recall seeing documentation for that.
  
  Jethro.
  
  
  
  On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:
  
  We have three switch fabrics installed, all are under 1% utilized.
  
  
  From: Jeroen Wunnink | Hibernia Networks 
  [mailto:jeroen.wunn...@atrato.com] 
  Sent: Friday, February 13, 2015 12:27 PM
  To: net...@gmail.com; 'Jeroen Wunnink | 
  Hibernia Networks'
  Subject: Re: [f-nsp] MLX throughput issues
  
  How many switchfabrics do you have in that MLX and how high is the 
  utilization on them?
  
  On 13/02/15 18:12, net...@gmail.com wrote:
  We also tested with a spare Quanta LB4M we have and are seeing about the 
  same speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).
  
  I also reduced the number of routes we are accepting down to about 189K 
  and that did not make a difference.
  
  
  From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] 
  On Behalf Of Jeroen Wunnink | Hibernia Networks
  Sent: Friday, February 13, 2015 3:35 AM
  To: foundry-nsp@puck.nether.net
  Subject: Re: [f-nsp] MLX throughput issues
  
  The FLS switches do something weird with packets. I've noticed they 
  somehow interfere with changing the MSS window size dynamically, 
  resulting in destinations further away having very poor speed results 
  compared to destinations close by. 
  
  We got rid of those a while ago.
  
  
  On 12/02/15 17:37, net...@gmail.com wrote:
  We are having a strange issue on our MLX running code 5.6.00c.  We are 
  encountering some throughput issues that seem to be randomly impacting 
  specific networks.
  
  We use the MLX to handle both external BGP and internal VLAN routing.  
  Each FLS648 is used for Layer 2 VLANs only.
  
  From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, 
  which is then connected to the MLX on a 10 Gbps port, running a speed 
  test to an external network is getting 20MB/s.
  
  Connecting the same server directly to the MLX is getting 70MB/s.
  
  Connecting the same server to one of my customer's Juniper EX3200 
  (which BGP peers with the MLX) also gets 70MB/s.
  
  Testing to another external network, all three scenarios get 110MB/s.
  
  The path to both test network locations goes through the same IP 
  transit provider.
  
  We are running NI-MLX-MR with 2GB RAM. An NI-MLX-10Gx4 connects to the 
  Foundry FLS648 by XFP-10G-LR, and an NI-MLX-1Gx20-GC was used for directly 
  connecting the server.  A separate NI-MLX-10Gx4 connects to our 
  upstream BGP providers.  Customer’s Juniper EX3200 connects to the same 
  NI-MLX-10Gx4 as the FLS648.  We take default routes plus full tables 
  from three providers by BGP, but filter out most of the routes.
  
  The fiber and optics on everything look fine.  CPU usage is less than 
  10% on the MLX and all line

Re: [f-nsp] MLX throughput issues

2015-02-16 Thread Josh Galvez
What kind of wigout? And how do you diagnose the corruption?  I'm intrigued.

On Mon, Feb 16, 2015 at 8:43 AM, Brad Fleming bdfle...@gmail.com wrote:

 We’ve seen it since installing the high-capacity switch fabrics into our
 XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d.
 I’m not sure what software we were using when they were first installed;
 probably whatever would have been stable/popular around December 2010.

 Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.

 Note that the power-on process causes your management session to hang for
 a few seconds. The device isn’t broken and packets aren’t getting dropped;
 it’s just going through checks and echoing back status.

 -brad


  On Feb 16, 2015, at 7:07 AM, Jethro R Binks jethro.bi...@strath.ac.uk
 wrote:
 
  On Fri, 13 Feb 2015, Brad Fleming wrote:
 
  Over the years we’ve seen odd issues where one of the
  switch-fabric-links will “wigout” and some of the data moving between
  cards will get corrupted. When this happens we power cycle each switch
  fab one at a time using this process:
 
  1) Shutdown SFM #3
  2) Wait 1 minute
  3) Power SFM #3 on again
  4) Verify all SFM links are up to SFM#3
  5) Wait 1 minute
  6) Perform steps 1-5 for SFM #2
  7) Perform steps 1-5 for SFM #1
 
  Not sure you’re seeing the same issue that we see but the “SFM Dance”
  (as we call it) is a once-every-four-months thing somewhere across our
  16 XMR4000 boxes. It can be done with little to no impact if you are
  patient and verify status before moving to the next SFM.
 
  That's all interesting.  What code version is this?  Also, how do you
  shut down the SFMs?  I don't recall seeing documentation for that.
 
  Jethro.
 
 
 
  On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:
 
  We have three switch fabrics installed, all are under 1% utilized.
 
 
  From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com]
  Sent: Friday, February 13, 2015 12:27 PM
  To: net...@gmail.com; 'Jeroen Wunnink | Hibernia Networks'
  Subject: Re: [f-nsp] MLX throughput issues
 
  How many switchfabrics do you have in that MLX and how high is the
 utilization on them
 
  On 13/02/15 18:12, net...@gmail.com wrote:
  We also tested with a spare Quanta LB4M we have and are seeing about
 the same speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).
 
  I also reduced the number of routes we are accepting down to about
 189K and that did not make a difference.
 
 
  From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
  Jeroen Wunnink | Hibernia Networks
  Sent: Friday, February 13, 2015 3:35 AM
  To: foundry-nsp@puck.nether.net
  Subject: Re: [f-nsp] MLX throughput issues
 
  The FLS switches do something weird with packets. I've noticed they
 somehow interfere with changing the MSS window size dynamically, resulting
 in destinations further away having very poor speed results compared to
 destinations close by.
 
  We got rid of those a while ago.
 
 
  On 12/02/15 17:37, net...@gmail.com wrote:
  We are having a strange issue on our MLX running code 5.6.00c.  We
 are encountering some throughput issues that seem to be randomly impacting
 specific networks.
 
  We use the MLX to handle both external BGP and internal VLAN
 routing.  Each FLS648 is used for Layer 2 VLANs only.
 
  From a server connected by 1 Gbps uplink to a Foundry FLS648 switch,
 which is then connected to the MLX on a 10 Gbps port, running a speed test
 to an external network is getting 20MB/s.
 
  Connecting the same server directly to the MLX is getting 70MB/s.
 
  Connecting the same server to one of my customer's Juniper EX3200
 (which BGP peers with the MLX) also gets 70MB/s.
 
  Testing to another external network, all three scenarios get 110MB/s.
 
  The path to both test network locations goes through the same IP
 transit provider.
 
  We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the
 Foundry FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly
 connecting the server.  A separate NI-MLX-10Gx4 connects to our upstream
 BGP providers.  Customer’s Juniper EX3200 connects to the same NI-MLX-10Gx4
 as the FLS648.  We take default routes plus full tables from three
 providers by BGP, but filter out most of the routes.
 
  The fiber and optics on everything look fine.  CPU usage is less
 than 10% on the MLX and all line cards and CPU usage at 1% on the FLS648.
 ARP table on the MLX is about 12K, and BGP table is about 308K routes.
 
  Any assistance would be appreciated.  I suspect there is a setting
 that we’re missing on the MLX that is causing this issue.
 
 
 
 
  ___
  foundry-nsp mailing list
  foundry-nsp@puck.nether.net
  http

Re: [f-nsp] MLX throughput issues

2015-02-16 Thread Jethro R Binks
On Fri, 13 Feb 2015, Brad Fleming wrote:

 Over the years we’ve seen odd issues where one of the 
 switch-fabric-links will “wigout” and some of the data moving between 
 cards will get corrupted. When this happens we power cycle each switch 
 fab one at a time using this process:

 1) Shutdown SFM #3
 2) Wait 1 minute
 3) Power SFM #3 on again
 4) Verify all SFM links are up to SFM#3
 5) Wait 1 minute
 6) Perform steps 1-5 for SFM #2
 7) Perform steps 1-5 for SFM #1

 Not sure you’re seeing the same issue that we see but the “SFM Dance”
 (as we call it) is a once-every-four-months thing somewhere across our
 16 XMR4000 boxes. It can be done with little to no impact if you are
 patient and verify status before moving to the next SFM.

That's all interesting.  What code version is this?  Also, how do you
shut down the SFMs?  I don't recall seeing documentation for that.

Jethro.


 
  On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:
  
  We have three switch fabrics installed, all are under 1% utilized.
   
   
  From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com]
  Sent: Friday, February 13, 2015 12:27 PM
  To: net...@gmail.com; 'Jeroen Wunnink | Hibernia Networks'
  Subject: Re: [f-nsp] MLX throughput issues
   
  How many switchfabrics do you have in that MLX and how high is the 
  utilization on them
  
  On 13/02/15 18:12, net...@gmail.com wrote:
  We also tested with a spare Quanta LB4M we have and are seeing about the 
  same speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).
   
  I also reduced the number of routes we are accepting down to about 189K 
  and that did not make a difference.
   
   
  From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
  Jeroen Wunnink | Hibernia Networks
  Sent: Friday, February 13, 2015 3:35 AM
  To: foundry-nsp@puck.nether.net
  Subject: Re: [f-nsp] MLX throughput issues
   
  The FLS switches do something weird with packets. I've noticed they 
  somehow interfere with changing the MSS window size dynamically, resulting 
  in destinations further away having very poor speed results compared to 
  destinations close by. 
  
  We got rid of those a while ago.
  
  
  On 12/02/15 17:37, net...@gmail.com wrote:
  We are having a strange issue on our MLX running code 5.6.00c.  We are 
  encountering some throughput issues that seem to be randomly impacting 
  specific networks.
   
  We use the MLX to handle both external BGP and internal VLAN routing.  
  Each FLS648 is used for Layer 2 VLANs only.
   
  From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, 
  which is then connected to the MLX on a 10 Gbps port, running a speed 
  test to an external network is getting 20MB/s.
   
  Connecting the same server directly to the MLX is getting 70MB/s.
   
  Connecting the same server to one of my customer's Juniper EX3200 (which 
  BGP peers with the MLX) also gets 70MB/s.
   
  Testing to another external network, all three scenarios get 110MB/s.
   
  The path to both test network locations goes through the same IP transit 
  provider.
   
  We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the 
  Foundry FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly 
  connecting the server.  A separate NI-MLX-10Gx4 connects to our upstream 
  BGP providers.  Customer’s Juniper EX3200 connects to the same 
  NI-MLX-10Gx4 as the FLS648.  We take default routes plus full tables from 
  three providers by BGP, but filter out most of the routes.
   
  The fiber and optics on everything look fine.  CPU usage is less than 10% 
  on the MLX and all line cards and CPU usage at 1% on the FLS648.  ARP 
  table on the MLX is about 12K, and BGP table is about 308K routes.
   
  Any assistance would be appreciated.  I suspect there is a setting that 
  we’re missing on the MLX that is causing this issue.
  
  
  
  
  ___
  foundry-nsp mailing list
  foundry-nsp@puck.nether.net
  http://puck.nether.net/mailman/listinfo/foundry-nsp
  
  
  
  -- 
   
  Jeroen Wunnink
  IP NOC Manager - Hibernia Networks
  Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300 
  Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
  jeroen.wunn...@hibernianetworks.com
  www.hibernianetworks.com
  
  

Re: [f-nsp] MLX throughput issues

2015-02-13 Thread Chris Hellkvist
Hey,

this sounds like a good tip. We are seeing an issue very similar to the one
reported in this thread.
Speed to local servers is fine, but to remote servers the speed decreases
depending on the latency to them (with no overloaded links or anything
like that).
In our case the core devices are also MLX(e) boxes, but the servers do not
terminate directly on the MLX; the path to them includes a Cisco with a
Sup720 for routing and HP switches on the L2 path to the servers.
Jeroen, could you share a bit more insight on the issue you had with
dynamic MSS adjustments? Have you been able to find a way to change the
behaviour of the switches? Have you also seen such an issue with the other
Brocade equipment you have at Hibernia?

Thanks,
Chris
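
[Archive note] A latency-dependent slowdown of the kind described above is what any single TCP flow shows when its effective window stays small: at most one window of data can be in flight per round trip, so throughput is bounded by window / RTT. The sketch below only illustrates that bound. The window size and RTT values are assumptions chosen for the example, not measurements from this thread.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP flow: one full window delivered per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


if __name__ == "__main__":
    window = 65535  # assumed: a 64 KB window that never grows
    for rtt in (1, 10, 50, 100):  # assumed round-trip times in milliseconds
        bound = max_tcp_throughput_mbps(window, rtt)
        print(f"RTT {rtt:3d} ms -> at most {bound:8.2f} Mbps")
```

If the window cannot grow (for whatever reason), nearby destinations still look fast while distant ones slow down in proportion to their RTT.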

On Friday, 13 February 2015, Jeroen Wunnink | Hibernia Networks wrote:

  The FLS switches do something weird with packets. I've noticed they
 somehow interfere with changing the MSS window size dynamically, resulting
 in destinations further away having very poor speed results compared to
 destinations close by.

 We got rid of those a while ago.


 On 12/02/15 17:37, net...@gmail.com wrote:

  We are having a strange issue on our MLX running code 5.6.00c.  We are
 encountering some throughput issues that seem to be randomly impacting
 specific networks.



 We use the MLX to handle both external BGP and internal VLAN routing.
 Each FLS648 is used for Layer 2 VLANs only.



 From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, which
 is then connected to the MLX on a 10 Gbps port, running a speed test to an
 external network is getting 20MB/s.



 Connecting the same server directly to the MLX is getting 70MB/s.



 Connecting the same server to one of my customer's Juniper EX3200 (which
 BGP peers with the MLX) also gets 70MB/s.



 Testing to another external network, all three scenarios get 110MB/s.



 The path to both test network locations goes through the same IP transit
 provider.



 We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the Foundry
 FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly connecting the
 server.  A separate NI-MLX-10Gx4 connects to our upstream BGP providers.
 Customer’s Juniper EX3200 connects to the same NI-MLX-10Gx4 as the FLS648.
 We take default routes plus full tables from three providers by BGP, but
 filter out most of the routes.



 The fiber and optics on everything look fine.  CPU usage is less than 10%
 on the MLX and all line cards and CPU usage at 1% on the FLS648.  ARP table
 on the MLX is about 12K, and BGP table is about 308K routes.



 Any assistance would be appreciated.  I suspect there is a setting that
 we’re missing on the MLX that is causing this issue.


 ___
 foundry-nsp mailing list
 foundry-nsp@puck.nether.net
 http://puck.nether.net/mailman/listinfo/foundry-nsp



 --

 Jeroen Wunnink
 IP NOC Manager - Hibernia Networks
 Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300
 Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
 jeroen.wunn...@hibernianetworks.com
 www.hibernianetworks.com


___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

Re: [f-nsp] MLX throughput issues

2015-02-13 Thread Jeroen Wunnink | Hibernia Networks
The FLS switches do something weird with packets. I've noticed they 
somehow interfere with changing the MSS window size dynamically, 
resulting in destinations further away having very poor speed results 
compared to destinations close by.


We got rid of those a while ago.


On 12/02/15 17:37, net...@gmail.com wrote:


We are having a strange issue on our MLX running code 5.6.00c.  We are 
encountering some throughput issues that seem to be randomly impacting 
specific networks.


We use the MLX to handle both external BGP and internal VLAN routing.  
Each FLS648 is used for Layer 2 VLANs only.


From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, 
which is then connected to the MLX on a 10 Gbps port, running a speed 
test to an external network is getting 20MB/s.


Connecting the same server directly to the MLX is getting 70MB/s.

Connecting the same server to one of my customer's Juniper EX3200 
(which BGP peers with the MLX) also gets 70MB/s.


Testing to another external network, all three scenarios get 110MB/s.

The path to both test network locations goes through the same IP 
transit provider.


We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the 
Foundry FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly 
connecting the server. A separate NI-MLX-10Gx4 connects to our 
upstream BGP providers.  Customer’s Juniper EX3200 connects to the 
same NI-MLX-10Gx4 as the FLS648.  We take default routes plus full 
tables from three providers by BGP, but filter out most of the routes.


The fiber and optics on everything look fine.  CPU usage is less than 
10% on the MLX and all line cards and CPU usage at 1% on the FLS648.  
ARP table on the MLX is about 12K, and BGP table is about 308K routes.


Any assistance would be appreciated.  I suspect there is a setting 
that we’re missing on the MLX that is causing this issue.




___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp



--

Jeroen Wunnink
IP NOC Manager - Hibernia Networks
Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300
Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
jeroen.wunn...@hibernianetworks.com
www.hibernianetworks.com

___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

Re: [f-nsp] MLX throughput issues

2015-02-13 Thread nethub
We are not using LAG anywhere in our network.

 

 

From: G B [mailto:geor...@gmail.com] 
Sent: Friday, February 13, 2015 1:16 AM
To: net...@gmail.com
Cc: Niels Bakker; foundry-nsp
Subject: Re: [f-nsp] MLX throughput issues

 

Wondering if you might have some imbalance in a LAG somewhere, where it is
hashing too much traffic to one link of the LAG.  By default it uses the MAC
address of the next layer 2 hop, so traffic going to a gateway will all hash
to the same link.  Are there any LAGs involved?  Is there a major imbalance
of traffic on a LAG in the traffic path?
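
[Archive note] The imbalance described above can be sketched in a few lines. This is a toy model only: `zlib.crc32` stands in for whatever hash the hardware really uses, and the MAC address, flow count, and link count are hypothetical. The point is that when the hash key is only the next-hop MAC, every flow toward the same gateway picks the same member link.

```python
import zlib
from collections import Counter


def pick_link(dst_mac: str, link_count: int) -> int:
    """Choose a LAG member link from the destination MAC only."""
    # crc32 is a stand-in; real hardware uses its own hash function.
    return zlib.crc32(dst_mac.encode("ascii")) % link_count


def link_usage(dst_macs, link_count: int) -> Counter:
    """Count how many flows land on each member link."""
    return Counter(pick_link(mac, link_count) for mac in dst_macs)


if __name__ == "__main__":
    gateway_mac = "00:00:5e:00:53:01"  # hypothetical next-hop (gateway) MAC
    flows_to_gateway = [gateway_mac] * 1000  # 1000 flows, all via the gateway
    print(link_usage(flows_to_gateway, link_count=4))
```

With a single next-hop MAC the counter has exactly one entry, meaning one link carries everything while the other three sit idle.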

 

On Thu, Feb 12, 2015 at 9:43 PM, net...@gmail.com wrote:

We are only accepting about 300k IPv4 routes currently (we filter to reduce
the table size).  We are on the multi-service-2 CAM partition profile and we
have the system-max values for ip-route and ip-cache set to 445K.

Also, we upgraded to 5.6f today to see if that would help but it did not
change anything.

CPU usage is very low across the board (under 10% use on everything), so if
it is routing in software, it isn't causing a jump in CPU load.


-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
Niels Bakker
Sent: Thursday, February 12, 2015 8:38 PM
To: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

* net...@gmail.com (net...@gmail.com) [Fri 13 Feb 2015, 01:45 CET]:
As I stated in the first message, the Juniper EX3200 is a downstream
BGP customer that is single homed to our network, so it is on a
different ASN and the communication between my network and his network
is layer 3.

Are you running that MLX with a full BGP table?  20 MB/sec sounds like
you're forwarding packets over its CPU, perhaps because it ran out of CAM
space.


-- Niels.

--
___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp


Re: [f-nsp] MLX throughput issues

2015-02-13 Thread nethub
We have three switch fabrics installed, all are under 1% utilized.

 

 

From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com] 
Sent: Friday, February 13, 2015 12:27 PM
To: net...@gmail.com; 'Jeroen Wunnink | Hibernia Networks'
Subject: Re: [f-nsp] MLX throughput issues

 

How many switchfabrics do you have in that MLX and how high is the
utilization on them

On 13/02/15 18:12, net...@gmail.com wrote:

We also tested with a spare Quanta LB4M we have and are seeing about the
same speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).

 

I also reduced the number of routes we are accepting down to about 189K and
that did not make a difference.

 

 

From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
Jeroen Wunnink | Hibernia Networks
Sent: Friday, February 13, 2015 3:35 AM
To: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

 

The FLS switches do something weird with packets. I've noticed they somehow
interfere with changing the MSS window size dynamically, resulting in
destinations further away having very poor speed results compared to
destinations close by. 

We got rid of those a while ago.


On 12/02/15 17:37, net...@gmail.com wrote:

We are having a strange issue on our MLX running code 5.6.00c.  We are
encountering some throughput issues that seem to be randomly impacting
specific networks.

 

We use the MLX to handle both external BGP and internal VLAN routing.  Each
FLS648 is used for Layer 2 VLANs only.

 

From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, which
is then connected to the MLX on a 10 Gbps port, running a speed test to an
external network is getting 20MB/s.

 

Connecting the same server directly to the MLX is getting 70MB/s.

 

Connecting the same server to one of my customer's Juniper EX3200 (which BGP
peers with the MLX) also gets 70MB/s.

 

Testing to another external network, all three scenarios get 110MB/s.

 

The path to both test network locations goes through the same IP transit
provider.

 

We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the Foundry
FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly connecting the
server.  A separate NI-MLX-10Gx4 connects to our upstream BGP providers.
Customer's Juniper EX3200 connects to the same NI-MLX-10Gx4 as the FLS648.
We take default routes plus full tables from three providers by BGP, but
filter out most of the routes.

 

The fiber and optics on everything look fine.  CPU usage is less than 10% on
the MLX and all line cards and CPU usage at 1% on the FLS648.  ARP table on
the MLX is about 12K, and BGP table is about 308K routes.

 

Any assistance would be appreciated.  I suspect there is a setting that
we're missing on the MLX that is causing this issue.







___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp







-- 
 
Jeroen Wunnink
IP NOC Manager - Hibernia Networks
Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300 
Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
jeroen.wunn...@hibernianetworks.com
www.hibernianetworks.com






___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

Re: [f-nsp] MLX throughput issues

2015-02-13 Thread nethub
We already tried a full system reboot last night and it didn’t seem to help.  
I’ll definitely keep your switch fabric reboot procedure in mind in case we run 
into that in the future.

 

I think we may have figured out at least a short-term solution.  On the FLS648, 
we ran the command “buffer-sharing-full” and immediately we were able to get 
better speeds.  It seems as though the FLS648’s buffers may have been causing 
the issue.  We’ll continue to monitor over the next few days and see if this 
actually solves the issue.
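
[Archive note] One plausible reading of why the buffer setting mattered: when a 10 Gbps port feeds a 1 Gbps port, a burst arriving at line rate fills the egress buffer at the difference between the two rates, and a small per-port allotment overflows much sooner than a larger shared pool. The sketch below is simple arithmetic under that assumption. The buffer sizes are made-up values for illustration, not FLS648 specifications.

```python
def burst_absorb_time_us(buffer_bytes: int, ingress_bps: float, egress_bps: float) -> float:
    """Microseconds a buffer can absorb a burst before it starts dropping."""
    if ingress_bps <= egress_bps:
        return float("inf")  # the queue never builds up
    fill_rate_bps = ingress_bps - egress_bps
    return buffer_bytes * 8 / fill_rate_bps * 1_000_000


if __name__ == "__main__":
    ten_gig, one_gig = 10e9, 1e9
    # Hypothetical sizes: a small per-port slice versus a larger shared pool.
    for label, size in (("per-port slice", 64 * 1024), ("shared pool", 2 * 1024 * 1024)):
        t = burst_absorb_time_us(size, ten_gig, one_gig)
        print(f"{label:15s} {size:8d} bytes -> absorbs about {t:7.1f} us of burst")
```

Dropped packets from an overflowing buffer hurt long-RTT TCP flows the most, which would fit the pattern reported earlier in the thread, though the thread itself does not confirm this mechanism.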

 

Thanks everyone for your feedback thus far.

 

 

 

From: Brad Fleming [mailto:bdfle...@gmail.com] 
Sent: Friday, February 13, 2015 4:24 PM
To: net...@gmail.com
Cc: Jeroen Wunnink | Hibernia Networks; foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

 

Over the years we’ve seen odd issues where one of the switch-fabric-links will 
“wigout” and some of the data moving between cards will get corrupted. When 
this happens we power cycle each switch fab one at a time using this process:

1) Shutdown SFM #3

2) Wait 1 minute

3) Power SFM #3 on again

4) Verify all SFM links are up to SFM#3

5) Wait 1 minute

6) Perform steps 1-5 for SFM #2

7) Perform steps 1-5 for SFM #1

 

Not sure you’re seeing the same issue that we see but the “SFM Dance” (as we 
call it) is a once-every-four-months thing somewhere across our 16 XMR4000 
boxes. It can be done with little to no impact if you are patient and verify 
status before moving to the next SFM.

 

 

On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:

 

We have three switch fabrics installed, all are under 1% utilized.

 

 

From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com]
Sent: Friday, February 13, 2015 12:27 PM
To: net...@gmail.com; 'Jeroen Wunnink | Hibernia Networks'
Subject: Re: [f-nsp] MLX throughput issues

 

How many switchfabrics do you have in that MLX and how high is the utilization 
on them

On 13/02/15 18:12, net...@gmail.com wrote:

We also tested with a spare Quanta LB4M we have and are seeing about the same 
speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).

 

I also reduced the number of routes we are accepting down to about 189K and 
that did not make a difference.

 

 

From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
Jeroen Wunnink | Hibernia Networks
Sent: Friday, February 13, 2015 3:35 AM
To: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

 

The FLS switches do something weird with packets. I've noticed they somehow 
interfere with changing the MSS window size dynamically, resulting in 
destinations further away having very poor speed results compared to 
destinations close by. 

We got rid of those a while ago.


On 12/02/15 17:37, net...@gmail.com wrote:

We are having a strange issue on our MLX running code 5.6.00c.  We are 
encountering some throughput issues that seem to be randomly impacting specific 
networks.

 

We use the MLX to handle both external BGP and internal VLAN routing.  Each 
FLS648 is used for Layer 2 VLANs only.

 

From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, which is 
then connected to the MLX on a 10 Gbps port, running a speed test to an 
external network is getting 20MB/s.

 

Connecting the same server directly to the MLX is getting 70MB/s.

 

Connecting the same server to one of my customer's Juniper EX3200 (which BGP 
peers with the MLX) also gets 70MB/s.

 

Testing to another external network, all three scenarios get 110MB/s.

 

The path to both test network locations goes through the same IP transit 
provider.

 

We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the Foundry 
FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly connecting the 
server.  A separate NI-MLX-10Gx4 connects to our upstream BGP providers.  
Customer’s Juniper EX3200 connects to the same NI-MLX-10Gx4 as the FLS648.  We 
take default routes plus full tables from three providers by BGP, but filter 
out most of the routes.

 

The fiber and optics on everything look fine.  CPU usage is less than 10% on 
the MLX and all line cards and CPU usage at 1% on the FLS648.  ARP table on the 
MLX is about 12K, and BGP table is about 308K routes.

 

Any assistance would be appreciated.  I suspect there is a setting that we’re 
missing on the MLX that is causing this issue.








___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp








-- 
 
Jeroen Wunnink
IP NOC Manager - Hibernia Networks
Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300 
Netherlands

Re: [f-nsp] MLX throughput issues

2015-02-13 Thread nethub
I also tested with an FESX448 and got the same results as the FLS648 and
Quanta LB4M switches.

 

 

From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
Jeroen Wunnink | Hibernia Networks
Sent: Friday, February 13, 2015 3:35 AM
To: foundry-nsp@puck.nether.net
Subject: Re: [f-nsp] MLX throughput issues

 

The FLS switches do something weird with packets. I've noticed they somehow
interfere with changing the MSS window size dynamically, resulting in
destinations further away having very poor speed results compared to
destinations close by. 

We got rid of those a while ago.


On 12/02/15 17:37, net...@gmail.com wrote:

We are having a strange issue on our MLX running code 5.6.00c.  We are
encountering some throughput issues that seem to be randomly impacting
specific networks.

 

We use the MLX to handle both external BGP and internal VLAN routing.  Each
FLS648 is used for Layer 2 VLANs only.

 

From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, which
is then connected to the MLX on a 10 Gbps port, running a speed test to an
external network is getting 20MB/s.

 

Connecting the same server directly to the MLX is getting 70MB/s.

 

Connecting the same server to one of my customer's Juniper EX3200 (which BGP
peers with the MLX) also gets 70MB/s.

 

Testing to another external network, all three scenarios get 110MB/s.

 

The path to both test network locations goes through the same IP transit
provider.

 

We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the Foundry
FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly connecting the
server.  A separate NI-MLX-10Gx4 connects to our upstream BGP providers.
Customer's Juniper EX3200 connects to the same NI-MLX-10Gx4 as the FLS648.
We take default routes plus full tables from three providers by BGP, but
filter out most of the routes.

 

The fiber and optics on everything look fine.  CPU usage is less than 10% on
the MLX and all line cards and CPU usage at 1% on the FLS648.  ARP table on
the MLX is about 12K, and BGP table is about 308K routes.

 

Any assistance would be appreciated.  I suspect there is a setting that
we're missing on the MLX that is causing this issue.






___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp






-- 
 
Jeroen Wunnink
IP NOC Manager - Hibernia Networks
Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300 
Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
jeroen.wunn...@hibernianetworks.com
www.hibernianetworks.com
___
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

Re: [f-nsp] MLX throughput issues

2015-02-13 Thread Brad Fleming
Over the years we’ve seen odd issues where one of the switch-fabric-links will 
“wigout” and some of the data moving between cards will get corrupted. When 
this happens we power cycle each switch fab one at a time using this process:
1) Shutdown SFM #3
2) Wait 1 minute
3) Power SFM #3 on again
4) Verify all SFM links are up to SFM#3
5) Wait 1 minute
6) Perform steps 1-5 for SFM #2
7) Perform steps 1-5 for SFM #1

Not sure you’re seeing the same issue that we see but the “SFM Dance” (as we 
call it) is a once-every-four-months thing somewhere across our 16 XMR4000 
boxes. It can be done with little to no impact if you are patient and verify 
status before moving to the next SFM.
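
[Archive note] The rolling procedure above can be written out as an explicit checklist. The sketch below only builds the list of steps; it does not talk to a device. It uses the `power-off snm` / `power-on snm` commands quoted elsewhere in this thread, and it assumes the final pass is meant for SFM #1, so that each of the three fabrics is cycled exactly once. The function name and the 60-second default are illustrative choices, and the verification step is left as a manual check because the thread does not name a command for it.

```python
from typing import List, Sequence


def sfm_dance_plan(sfm_ids: Sequence[int] = (3, 2, 1), wait_seconds: int = 60) -> List[str]:
    """Return the ordered steps for power cycling one switch fabric at a time."""
    steps: List[str] = []
    for sfm in sfm_ids:
        steps.append(f"power-off snm {sfm}")
        steps.append(f"wait {wait_seconds} seconds")
        steps.append(f"power-on snm {sfm}")
        # Manual check: do not move on until every link to this SFM is up.
        steps.append(f"verify all SFM links are up to SFM {sfm}")
        steps.append(f"wait {wait_seconds} seconds")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(sfm_dance_plan(), start=1):
        print(f"{number:2d}. {step}")
```

Only one fabric is ever down at a time, which is what keeps the impact low on a chassis with three fabrics installed.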


 On Feb 13, 2015, at 11:41 AM, net...@gmail.com wrote:
 
 We have three switch fabrics installed, all are under 1% utilized.
  
  
 From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunn...@atrato.com]
 Sent: Friday, February 13, 2015 12:27 PM
 To: net...@gmail.com; 'Jeroen Wunnink | Hibernia Networks'
 Subject: Re: [f-nsp] MLX throughput issues
  
 How many switchfabrics do you have in that MLX and how high is the 
 utilization on them
 
 On 13/02/15 18:12, net...@gmail.com wrote:
 We also tested with a spare Quanta LB4M we have and are seeing about the 
 same speeds as we are seeing with the FLS648 (around 20MB/s or 160Mbps).
  
 I also reduced the number of routes we are accepting down to about 189K and 
 that did not make a difference.
  
  
 From: foundry-nsp [mailto:foundry-nsp-boun...@puck.nether.net] On Behalf Of
 Jeroen Wunnink | Hibernia Networks
 Sent: Friday, February 13, 2015 3:35 AM
 To: foundry-nsp@puck.nether.net
 Subject: Re: [f-nsp] MLX throughput issues
  
 The FLS switches do something weird with packets. I've noticed they somehow 
 interfere with changing the MSS window size dynamically, resulting in 
 destinations further away having very poor speed results compared to 
 destinations close by. 
 
 We got rid of those a while ago.
 
 
 On 12/02/15 17:37, net...@gmail.com wrote:
 We are having a strange issue on our MLX running code 5.6.00c.  We are 
 encountering some throughput issues that seem to be randomly impacting 
 specific networks.
  
 We use the MLX to handle both external BGP and internal VLAN routing.  Each 
 FLS648 is used for Layer 2 VLANs only.
  
 From a server connected by 1 Gbps uplink to a Foundry FLS648 switch, which 
 is then connected to the MLX on a 10 Gbps port, running a speed test to an 
 external network is getting 20MB/s.
  
 Connecting the same server directly to the MLX is getting 70MB/s.
  
 Connecting the same server to one of my customer's Juniper EX3200 (which 
 BGP peers with the MLX) also gets 70MB/s.
  
 Testing to another external network, all three scenarios get 110MB/s.
  
 The path to both test network locations goes through the same IP transit 
 provider.
  
 We are running NI-MLX-MR with 2GB RAM, NI-MLX-10Gx4 connect to the Foundry 
 FLS648 by XFP-10G-LR, NI-MLX-1Gx20-GC was used for directly connecting the 
 server.  A separate NI-MLX-10Gx4 connects to our upstream BGP providers.  
 Customer’s Juniper EX3200 connects to the same NI-MLX-10Gx4 as the FLS648.  
 We take default routes plus full tables from three providers by BGP, but 
 filter out most of the routes.
  
 The fiber and optics on everything look fine.  CPU usage is less than 10% 
 on the MLX and all line cards and CPU usage at 1% on the FLS648.  ARP table 
 on the MLX is about 12K, and BGP table is about 308K routes.
  
 Any assistance would be appreciated.  I suspect there is a setting that 
 we’re missing on the MLX that is causing this issue.
 
 
 
 
 ___
 foundry-nsp mailing list
 foundry-nsp@puck.nether.net
 http://puck.nether.net/mailman/listinfo/foundry-nsp
 
 
 
 -- 
  
 Jeroen Wunnink
 IP NOC Manager - Hibernia Networks
 Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300 
 Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
 jeroen.wunn...@hibernianetworks.com
 www.hibernianetworks.com
 
 

Re: [f-nsp] MLX throughput issues

2015-02-12 Thread nethub
Thanks for your response, Frank.

 

I do mean megabytes per second (i.e. 20MB/s = 160 Mbps, 70MB/s = 560 Mbps,
110MB/s = 880 Mbps).
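
For anyone checking the figures, the conversion is just a factor of eight (one byte = eight bits), assuming decimal units throughout. A minimal Python sketch; the helper name is made up for illustration:

```python
def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mb_per_s * 8


# The three rates reported in this thread
for rate in (20, 70, 110):
    print(f"{rate} MB/s = {mbytes_to_mbits(rate):.0f} Mbps")
```

The 110MB/s result works out to 880 Mbps, which is close to what a 1 Gbps server link can carry.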

 

I am thinking that the FLS648 switches are not likely responsible since I
was able to get 110MB/s to another external network with all three scenarios
(server to FLS648 to MLX, server to MLX direct, server to EX3200 to MLX).
The FLS648 is layer 2 only, so I don't see how it would be interfering with
the throughput to one network and not to another.  The problem is also
occurring on servers attached to multiple FLS648 that are each directly
connected to the MLX, so it is across different 10G cards, optics, slots on
the MLX chassis, etc.

 

The remote server doesn't seem to be having any issues since I was able to
get 70MB/s to it from connecting directly to the MLX and from connecting
through the EX3200.  It is only from behind the FLS648 that I run into
issues.

 

As I stated in the first message, the Juniper EX3200 is a downstream BGP
customer that is single homed to our network, so it is on a different ASN
and the communication between my network and his network is layer 3.

 

Any additional insight would be appreciated.

 

 

From: Frank Bulk [mailto:frnk...@iname.com] 
Sent: Thursday, February 12, 2015 6:48 PM
To: net...@gmail.com; foundry-nsp@puck.nether.net
Subject: RE: [f-nsp] MLX throughput issues

 

Based on what you described it seems more to be the case that the FLS648 is
dropping throughput from ~70 Mbps to 20 Mbps (I presume you mean bits, not
bytes when you write MB/s).

 

How do you know that the remote speed server is not maxed out?  Or that your
uplink is not maxed out?

 

Frank

 



Re: [f-nsp] MLX throughput issues

2015-02-12 Thread Frank Bulk
Based on what you described it seems more to be the case that the FLS648 is
dropping throughput from ~70 Mbps to 20 Mbps (I presume you mean bits, not
bytes when you write MB/s).

 

How do you know that the remote speed server is not maxed out?  Or that your
uplink is not maxed out?

 

Frank

 



Re: [f-nsp] MLX throughput issues

2015-02-12 Thread Niels Bakker

* net...@gmail.com (net...@gmail.com) [Fri 13 Feb 2015, 01:45 CET]:
As I stated in the first message, the Juniper EX3200 is a downstream 
BGP customer that is single homed to our network, so it is on a 
different ASN and the communication between my network and his 
network is layer 3.


Are you running that MLX with a full BGP table?  20 MB/sec sounds like 
you're forwarding packets over its CPU, perhaps because it ran out of 
CAM space.



-- Niels.



Re: [f-nsp] MLX throughput issues

2015-02-12 Thread nethub
We are only accepting about 300k IPv4 routes currently (we filter to reduce
the table size).  We are on the multi-service-2 CAM partition profile and we
have the system-max values for ip-route and ip-cache set to 445K.
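
As a rough sanity check on those numbers, here is a small Python sketch comparing the reported BGP table size against the configured system-max. It only compares configured limits as quoted in this thread; it says nothing about what the hardware CAM actually holds, which would still need to be verified on the box.

```python
# Figures quoted in this thread (approximate)
bgp_routes = 308_000   # BGP table size reported for the MLX
system_max = 445_000   # configured system-max for ip-route / ip-cache

headroom = system_max - bgp_routes
utilization = bgp_routes / system_max

print(f"routes: {bgp_routes}, configured limit: {system_max}")
print(f"headroom: {headroom} routes, utilization: {utilization:.0%}")
```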

Also, we upgraded to 5.6f today to see if that would help but it did not
change anything.

CPU usage is very low across the board (under 10% use on everything), so if
it is routing in software, it isn't causing a jump in CPU load.





Re: [f-nsp] MLX throughput issues

2015-02-12 Thread G B
I'm wondering whether you have an imbalance in a LAG somewhere, with too much
traffic hashing to one link of the LAG.  By default it hashes on the MAC
address of the next layer 2 hop, so traffic going to a gateway will all
hash to the same link.  Are there any LAGs involved?  Is there a major
imbalance of traffic on a LAG in the traffic path?
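
To illustrate the hashing concern, here is a minimal Python sketch. It is not the actual hash used by the MLX or FLS; the CRC32-modulo scheme, the gateway MAC, and the flow list are all assumptions chosen only to show the effect. When the hash key is the next-hop MAC and every flow goes to the same gateway, every flow picks the same member link, while a key with more per-flow variety spreads the load.

```python
import zlib
from collections import Counter


def pick_link(key: bytes, n_links: int) -> int:
    """Map a hash key to a LAG member index (illustrative, not a vendor algorithm)."""
    return zlib.crc32(key) % n_links


GATEWAY_MAC = b"\x00\x11\x22\x33\x44\x55"  # hypothetical next-hop MAC
N_LINKS = 4                                 # hypothetical LAG size

# 1000 hypothetical flows (source IP, source port), all sent to the same gateway
flows = [(f"10.0.{i // 250}.{i % 250}".encode(), 1024 + i) for i in range(1000)]

# Key = next-hop MAC only: identical for every flow
by_mac = Counter(pick_link(GATEWAY_MAC, N_LINKS) for _ in flows)

# Key = source IP + source port: differs per flow
by_flow = Counter(
    pick_link(ip + port.to_bytes(2, "big"), N_LINKS) for ip, port in flows
)

print("hash on next-hop MAC:    ", dict(by_mac))
print("hash on src IP + port:   ", dict(by_flow))
```

Comparing per-member interface counters on a suspect LAG would show the same pattern in practice: one link carrying nearly everything while the others sit idle.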

