Excellent post, Jeff. Thanks for sharing.

On Sun, Sep 25, 2011 at 8:36 PM, Jeff Wheeler <[email protected]> wrote:
> A colleague pointed out recently that some of the gotchas and fixes we
> run into are interesting to others, so in that spirit, I have another
> one to share with you today.
>
> In this case, a malfunctioning EX4200 (10.4R4.5) appears to have valid
> ARP entries for a few boxes, but when you try to ping them, etc. the
> boxes do not get any traffic. In fact, they receive nothing from the
> switch except ARP who-has. They respond, and upon clearing the ARP
> entries from the EX4200, that process repeats.
>
> Upon investigating the PFE data, I found that the halp-nh arp-table
> was missing these ARP entries, even though they were present in the
> Junos CLI and indeed the correct MAC address is referenced in the PFE
> route table. See below:
>
> PFEM0(vty)# show route ip prefix 192.0.2.39 detail
>
> IPv4 Route Table 0, default.0, 0x0:
> Destination  NH IP Addr      Type     NH ID Interface
> ------------ --------------- -------- ----- ---------
> 192.0.2.39   192.0.2.39      Unicast   2933 RT-ifl 197 vlan.1122 ifl 197
>
> RT flags: 0x0000, Ignore: 0x00000000, COS index: 0
> DCU id: 0, SCU id: 0, RPF ifl list id: 0
>
>
>
> PFEM0(vty)# show nh id 2933 detail
> ID    Type     Interface     Next Hop Addr   Protocol   Encap        MTU  Flags      PFE internal Flags
> ----- -------- ------------- --------------- ---------- ------------ ---- ---------- --------------------
> 2933  Unicast  vlan.1122     192.0.2.39      IPv4       Ethernet     0    0x00000000 0x00000000
>
> Flags: 2                         nh_idx: 3
> CMD: Route                       Arp Idx: 1341
> MTU Idx: 2                       Num Tags: 0
> Upd Cnt: 1                       Tun Strt: False
> Chain_nh 3484:
> Hw install: 1
> Mac: 00 0e 0c a2 2d dc
>
>
>
> PFEM0(vty)# show halp-nh arp-table
> Device: 0
> ...hundreds and hundreds of lines...
> ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
> ArpEntry Idx 1342 : 00:25:90:2c:41:e5
> ...hundreds more, but where is Idx 1341?!
>
>
> Our "fix" is to remove 192.0.2.1/27 from the vlan.1122 configuration,
> commit, and then rollback. This is obviously not good.
> I would like to have tried installing a different ARP entry (by
> configuring this IP address on another machine), but I have not had an
> opportunity to test this yet.
>
> The reason this is happening is that the ASIC vendor format ARP table
> in the PFE memory is abstracted from the "Juniper ARP table," as I
> understand it. It appears that simply refreshing the Juniper ARP table
> with an identical entry does not cause a missing entry to be put into
> the forwarding table.
>
> I would love to be able to reproduce this, but with hundreds to a few
> thousand machines each on many EX4200 stacks, it happens very rarely.
> I only mention it because "clear arp" from the CLI does not work, so
> this problem gets escalated until it reaches someone brave enough to
> temporarily break some unaffected boxes to fix a broken one. It would
> be nice, though, if "clear arp" actually worked right.
>
> If you encounter this problem and do something different, I would be
> very interested to hear from you!
> --
> Jeff S Wheeler <[email protected]>
> Sr Network Operator / Innovative Network Concepts
>
> _______________________________________________
> juniper-nsp mailing list [email protected]
> https://puck.nether.net/mailman/listinfo/juniper-nsp
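For anyone who wants to automate the comparison Jeff did by hand (the next hop for 192.0.2.39 references Arp Idx 1341, which is absent from "show halp-nh arp-table"), here is a minimal Python sketch. It assumes you have already captured the text of "show nh id <N> detail" and "show halp-nh arp-table" from the PFE shell, and that the layout matches what is quoted above; other Junos releases may print these differently. The function names are hypothetical, and the script does not talk to the switch.

```python
import re
from typing import Dict, Optional

# Regexes assume the exact text layout quoted in this thread (EX4200, 10.4R4.5);
# other releases may format the PFE shell output differently.
ARP_IDX_RE = re.compile(r"Arp Idx:\s*(\d+)")
ARP_ENTRY_RE = re.compile(r"ArpEntry Idx\s+(\d+)\s*:\s*([0-9A-Fa-f:]+)")


def nh_arp_index(nh_detail: str) -> Optional[int]:
    """Return the 'Arp Idx' value from captured 'show nh id <N> detail' text."""
    match = ARP_IDX_RE.search(nh_detail)
    return int(match.group(1)) if match else None


def arp_table_entries(arp_table: str) -> Dict[int, str]:
    """Map ArpEntry index -> MAC from captured 'show halp-nh arp-table' text."""
    return {int(idx): mac.lower() for idx, mac in ARP_ENTRY_RE.findall(arp_table)}


def check_next_hop(nh_detail: str, arp_table: str) -> str:
    """Report whether the ARP index a next hop references exists in the PFE table.

    Only the referenced index is checked: gaps elsewhere in the table are not
    treated as errors, since unused indices may legitimately be absent.
    """
    idx = nh_arp_index(nh_detail)
    if idx is None:
        return "no Arp Idx found in next-hop output"
    entries = arp_table_entries(arp_table)
    if idx not in entries:
        return f"Arp Idx {idx} is MISSING from halp-nh arp-table"
    return f"Arp Idx {idx} present: {entries[idx]}"


if __name__ == "__main__":
    # Trimmed copies of the outputs quoted in the post.
    nh = "CMD: Route                       Arp Idx: 1341\nMac: 00 0e 0c a2 2d dc"
    table = (
        "Device: 0\n"
        "ArpEntry Idx 1340 : 00:15:17:6b:a9:7c\n"
        "ArpEntry Idx 1342 : 00:25:90:2c:41:e5\n"
    )
    print(check_next_hop(nh, table))  # Arp Idx 1341 is MISSING from halp-nh arp-table
```

Run against the trimmed sample, it reports index 1341 as missing. It only confirms the symptom; it does not repair the PFE table, so the remove/commit/rollback workaround described above still applies.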

