Or why deploying IPv6 is not as simple as a 'modprobe ipv6'
or an 'ipv6 address 2001:db8::1/64' on an interface.

tl;dr: memory corruption on the Cisco ME3600X when IPv6 and egress ACLs
are used at the same time. Workaround: disable IPv6.

---
petrus, who is tired of hearing that enabling IPv6 is easy,
and that everyone should have done it 15 years ago.


---------- Forwarded message ----------
From: Jason Lixfeld <ja...@lixfeld.ca>
Date: 2016-02-27 18:09 GMT+01:00
Subject: Re: [c-nsp] ME3600X 15.2S memory leak
To: Mark Tinka <mark.ti...@seacom.mu>
Cc : cisco-...@puck.nether.net




> On Feb 27, 2016, at 1:21 AM, Mark Tinka <mark.ti...@seacom.mu> wrote:
>
>
>
> On 26/Feb/16 20:42, Jason Lixfeld wrote:
>
>> Upgrade to at least 15.3(3)S2. There are major issues with IPv6 and egress 
>> ACLs that cause this.
>
> Funny you should mention that, Jason.
>
> There is some IPv6 strangeness that I can't quite put my finger on when
> this happens, and we are re-routing across the ring. The last time I saw
> something like this with egress IPv6 ACL's was when the ME3600X started
> shipping - 12.2EY days.
>
> At any rate, I was considering upgrading the unit to 15.5(3)S2 and
> monitor it for a couple of days.
>
> The box is running IPv6 and VPNv6. One customer is setup for IPv6, but
> that BGP session is down.
>
> IPv6 ACL's exit only on the core-facing interfaces.
>
> Do you have details on this issue you can share?

I worked with Cisco on it for months - this went past TAC and past the
BU.  I worked directly with the IOS-XE, ME3600 and Nile ASIC hardware
developers to identify the issue, and it took forever!  (Credit to
this team of developers in India - they were relentless and amazing!
Too bad it took months to get to this team)

Here’s the gist of it (but the issue was not about egress IPv6 ACLs,
it was with the combination of IPv6 being enabled and *any* egress ACL
being configured on any interface):

First identified in 15.3(3)S (but first seen in 15.2S) is an
IPv6/egress ACL resource collision issue: the two features share
memory, which causes memory corruption.  This can be seen
by ChCompactChecksumerrorCount incrementing. 'no ipv6 unicast-routing'
& reload to fix.  The other option is to set 'platform acl
egress-disable' to disable egress ACLs, but since there were egress
ACLs used on our boxes, we opted to disable IPv6. Reload is required
to implement either fix.

I dug back through my emails and can’t actually find the bug ID
that was provided for this issue, but I think CSCul27742 is it.  Cisco
seems to have redirected that BugID to CSCui23725, but you can still
sort of screen scrape it:

---

CSCul27742 Transit Packet Loss and Output Drops due to IPv6 Routing
Symptom: Transit traffic is randomly dropped. When traffic is lost the
number of Output Drops under the "show interface" command is seen
incrementing.

Conditions: me3600 or me36800 with "IPv6 unicast-routing" or "no ipv6
multicast-routing" configured. The traffic dropped does not have to be
IPv6 traffic, and the box does not need to be configured for any other
IPv6 services. This does not impact other platforms running this
software version.

Workaround:...more
Details
Known Affected Releases: (1)
15.3(3)S
Known Fixed Releases: 0
Release Pending
Product: Cisco ME 3600X Series Ethernet Access Switches

---

In 15.3(3)S and earlier, there was no way to disable egress ACLs from
the CLI, so the only way to do it was through sdcli:

service internal
exit (to return to enable mode)
sdcli
nile pp reg configegressouteracl configure 1 0 aclEnable 0
nile pp reg configegressouteracl configure 0 0 aclEnable 0
arsenic mmap i_write 0x40 0x00c24018 0x32
arsenic mmap i_write 0x40 0x00c2401c 0x30
arsenic mmap i_write 0x45 0x00c24018 0x32
arsenic mmap i_write 0x45 0x00c2401c 0x30
exit

NOTE:  These changes will *not* persist across reload.

'platform acl egress-disable' was introduced per CSCui23725 which made
it possible to disable egress ACLs from the CLI while running a
version of code that was affected by CSCul27742.

If you are running into odd issues with late 15.2 and 15.3, check here
to see if you are running up against CSCul27742.  If so, disable
egress ACLs or disable IPv6:

sdcli#nile debug stats 0 ChannelCompact
ChCompactReversalAbortCount            0 (0x0)
ChCompactDiscardCount            18362 (0x47BA)
ChCompactChecksumerrorCount            6046 (0x179E)
ChCompactLengthErrorCount            0 (0x0)
ChCompactSequenceErrorCount            0 (0x0)
ChCompactETxFifoFullDiscardCount            0 (0x0)
Ok

sdcli#nile debug stats 0 ChannelCompact
ChCompactReversalAbortCount            0 (0x0)
ChCompactDiscardCount            18381 (0x47CD)
ChCompactChecksumerrorCount            6921 (0x1B09)
ChCompactLengthErrorCount            0 (0x0)
ChCompactSequenceErrorCount            0 (0x0)
ChCompactETxFifoFullDiscardCount            0 (0x0)
Ok
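
The comparison between the two snapshots above can be done by hand, but
a small script makes it harder to misread. What follows is only a
minimal sketch, assuming the 'nile debug stats 0 ChannelCompact' output
has been pasted in as plain text; the function names (parse_counters,
counter_deltas) are hypothetical helpers, not part of any Cisco tooling.
A non-zero delta on ChCompactChecksumerrorCount is the symptom
described in this thread.

```python
import re

# Matches counter lines such as:
#   ChCompactChecksumerrorCount            6046 (0x179E)
COUNTER_RE = re.compile(r"^(\w+)\s+(\d+)\s+\(0x[0-9A-Fa-f]+\)$")


def parse_counters(text):
    """Return {counter_name: decimal_value} from one pasted snapshot."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:  # prompt lines and the trailing "Ok" are skipped
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before, after):
    """Return how much each counter grew between two snapshots."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {name: new[name] - old[name] for name in new if name in old}


# The two snapshots quoted in this message.
FIRST = """\
sdcli#nile debug stats 0 ChannelCompact
ChCompactReversalAbortCount            0 (0x0)
ChCompactDiscardCount            18362 (0x47BA)
ChCompactChecksumerrorCount            6046 (0x179E)
ChCompactLengthErrorCount            0 (0x0)
ChCompactSequenceErrorCount            0 (0x0)
ChCompactETxFifoFullDiscardCount            0 (0x0)
Ok
"""

SECOND = """\
sdcli#nile debug stats 0 ChannelCompact
ChCompactReversalAbortCount            0 (0x0)
ChCompactDiscardCount            18381 (0x47CD)
ChCompactChecksumerrorCount            6921 (0x1B09)
ChCompactLengthErrorCount            0 (0x0)
ChCompactSequenceErrorCount            0 (0x0)
ChCompactETxFifoFullDiscardCount            0 (0x0)
Ok
"""

deltas = counter_deltas(FIRST, SECOND)
for name, grew_by in deltas.items():
    if grew_by:
        print(f"{name} grew by {grew_by}")
```

With the two snapshots above, this reports growth on
ChCompactDiscardCount and ChCompactChecksumerrorCount only.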

There was supposed to be a feature introduced in later code to
auto-detect between egress ACLs or IPv6, depending on what the
configuration was.  Aside from that, I don’t honestly know if this
issue was ever actually fixed.  For us, once we got to 15.3(3)S2, we
disabled IPv6 and abandoned the platform.

Hope that helps.


_______________________________________________
cisco-nsp mailing list  cisco-...@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/


---------------------------
FRnOG mailing list
http://www.frnog.org/
