I originally posted this on the IPv6-Ops mailing list, but it now seems to be more of a switching issue than an IPv6 protocol issue.
Background: Our enterprise backbone network has two 6500s with Sup720XL supervisors that connect to our three major ISPs at 10 Gb/s. We call these the Internet Hubs. They run SXI5 IOS and are configured for BGP (full table), Internet IPv4 multicast routing, and EIGRP as the IGP. They have been running IPv4 and IPv6 dual stack with no problems for over a year.

These two routers connect to our Enterprise Edge routers (also 6500s, with Sup720XL-10G supervisors). The edge routers run SXJ1 IOS and house several VRFs, mostly for guest networks. One of the VRFs is used for "outside" traffic. A pair of Cisco ASAs connects the "outside" VRF to the "inside" global routing table; the ASAs run EIGRP with the router to learn the IPv4 "inside" networks. These routers also do MPLS VPNs to connect guest networks on different campuses, along with some other DMZ functions, and several outside partners connect to these routers as well. The edge routers connect to the Enterprise Core routers, which route to the various campuses over a large DWDM Ethernet MAN/WAN.

The Problem: It occurred when we tried to enable IPv6 routing on the edge routers. We have narrowed the scenario down to these conditions:

1) "mls ipv6 vrf" and "ipv6 address-family" added to one or more VRF definitions.
2) The "outside" VRF table holds the full Internet table plus EIGRP routes to local "outside" devices/subnets.
3) An IPv4 BGP session to a neighbor sharing the "outside" VRF is open and operational.
4) No other IPv6 configuration has been entered yet.

When "ipv6 unicast-routing" is entered, the following happens:

1) EIGRP and BGP neighbors drop on interfaces with BFD enabled (we took BFD out).
2) Traffic through the router drops to a crawl (0-2000 bps). ICMP doesn't seem affected, but I'm not pushing that much ICMP.
3) The SP CPU goes to nearly 100%.
4) Most of the interface traffic is punted to the RP (confirmed by ERSPAN).
5) Telnet connections to the router don't drop, and EIGRP neighbors stay connected.
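For reference, the change that triggers the problem is essentially the following sketch (the VRF name "outside" stands in for our real VRF names, and this shows only the commands relevant to the failure, not the full config):

```
! Enable hardware (CEF/TCAM) switching of IPv6 in VRFs on the Sup720
mls ipv6 vrf

! Add the IPv6 address family to an existing multiprotocol VRF
vrf definition outside
 address-family ipv6
 exit-address-family

! This is the command whose entry (or removal) triggers the meltdown
ipv6 unicast-routing
```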
This slowness isn't the same as when BGP is first enabled and is loading routes; it's much worse, and traffic throughput almost stops. Both times we tried enabling IPv6 during a change window, it brought all Internet connectivity to a halt. I think this is because the neighbor relationships stay up while the router acts as a black hole.

We have been able to duplicate the issue in a lab. At first we just duplicated the hardware and configuration and all seemed OK; that's why we made the second attempt with Cisco TAC and our senior engineers on hand. It turns out you need to be pushing data through the router to see the problem. In the lab I have three sessions pushing from the "outside" and three from the "inside": one session doing ICMP pings to a host beyond the router, a second doing TFTP GETs (UDP port 69), and a third doing HTTP GETs (TCP port 80) using "curl" scripts.

In the lab, the "slowness" lasts almost two minutes, during which there is no unusual traffic (i.e., BGP scanning or reloads), no CPU process rises to any noticeable level, and nothing gets logged. The only thing I noticed is that the SP CPU goes to 100% and the RP starts getting flooded with traffic from most interfaces. When we tried it in production it lasted over four minutes, so we pulled the plug and removed the changes. The "problem" happens each time the command is entered OR removed. Also it doesn't …

FIB TCAM maximum routes (BGP routes in table = 408K):

  Current:
    IPv4 + MPLS         - 512k (default)
    IPv6 + IP Multicast - 256k (default)

The line cards in the production routers have 1 GB of RAM and are XL versions. Cisco TAC hasn't been too helpful on this one. I'm looking for any ideas to determine the problem or its cause, or how to live with it.
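The TCAM figures above can be checked, and repartitioned if anyone thinks that's relevant, with something like the following (the 768k value is only an illustration, not a recommendation, and my understanding is that a repartition takes effect only after a reload):

```
! Display the current FIB TCAM partitioning
show mls cef maximum-routes

! Example: carve more TCAM for IPv4 at the expense of IPv6/multicast
! (requires a reload to take effect)
mls cef maximum-routes ip 768
```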
I figure we could disable IPv4 routing temporarily, enable IPv6 routing, and then restart IPv4 routing, or just reload the router with the IPv6 commands preloaded. But that seems like a hack to me, and I don't know if this problem will bite me in the ass later if we don't better understand why it is happening.

Any suggestions appreciated,
-Jim

_______________________________________________
cisco-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
