A colleague pointed out recently that some of the gotchas and fixes we run into are interesting to others, so in that spirit, I have another one to share with you today.
In this case, a malfunctioning EX4200 (10.4R4.5) appears to have valid ARP entries for a few boxes, but when you try to ping them (or send them anything else), the boxes do not get any traffic. In fact, they receive nothing from the switch except ARP who-has requests. They respond, and upon clearing the ARP entries from the EX4200, that process repeats.

Upon investigating the PFE data, I found that the halp-nh arp-table was missing these ARP entries, even though they were present in the Junos CLI and indeed the correct MAC address is referenced in the PFE route table. See below:

PFEM0(vty)# show route ip prefix 192.0.2.39 detail
IPv4 Route Table 0, default.0, 0x0:
Destination   NH IP Addr      Type     NH ID Interface
------------  --------------- -------- ----- ---------
192.0.2.39    192.0.2.39      Unicast  2933  RT-ifl 197 vlan.1122 ifl 197
  RT flags: 0x0000, Ignore: 0x00000000, COS index: 0
  DCU id: 0, SCU id: 0, RPF ifl list id: 0

PFEM0(vty)# show nh id 2933 detail
   ID      Type      Interface    Next Hop Addr    Protocol       Encap     MTU       Flags  PFE internal Flags
-----  --------  -------------  ---------------  ----------  ------------  ----  ----------  --------------------
 2933   Unicast  vlan.1122      192.0.2.39             IPv4      Ethernet     0  0x00000000  0x00000000

Flags: 2 nh_idx: 3 CMD: Route Arp Idx: 1341 MTU Idx: 2 Num Tags: 0 Upd Cnt: 1 Tun Strt: False
Chain_nh 3484: Hw install: 1 Mac: 00 0e 0c a2 2d dc

PFEM0(vty)# show halp-nh arp-table
Device: 0
...hundreds and hundreds of lines...
ArpEntry Idx 1340 : 00:15:17:6b:a9:7c
ArpEntry Idx 1342 : 00:25:90:2c:41:e5
...hundreds more, but where is Idx 1341?!

Our "fix" is to remove 192.0.2.1/27 from the vlan.1122 configuration, commit, and then roll back. This is obviously not good. I would like to have tried installing a different ARP entry (by configuring this IP address on another machine) but I have not had an opportunity to test this yet.

As I understand it, this happens because the ARP table in PFE memory, which is kept in the ASIC vendor's format, is abstracted from the "Juniper ARP table."
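
The comparison above (take the Arp Idx from the next-hop detail, then look for it in the halp-nh arp-table dump) is tedious by eye when the table runs to hundreds of lines. Below is a rough Python sketch of how it could be automated. It assumes the two vty outputs have been saved as text and that they look like the excerpts in this post; the function names and the "missing" / "mismatch" / "ok" labels are made up for illustration and are not part of Junos.

```python
import re

# Patterns are based only on the output excerpts quoted in this post;
# other Junos releases or platforms may print these lines differently.
ARP_ENTRY_RE = re.compile(
    r"ArpEntry Idx\s+(\d+)\s*:\s*((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)
NH_ARP_IDX_RE = re.compile(r"Arp Idx:\s*(\d+)")
NH_MAC_RE = re.compile(r"Mac:\s*((?:[0-9a-fA-F]{2} ){5}[0-9a-fA-F]{2})")


def parse_arp_table(text):
    """Map ArpEntry index -> MAC from 'show halp-nh arp-table' output."""
    return {int(idx): mac.lower() for idx, mac in ARP_ENTRY_RE.findall(text)}


def parse_next_hop(text):
    """Return (arp_idx, mac) from 'show nh id N detail' output, or None."""
    idx = NH_ARP_IDX_RE.search(text)
    mac = NH_MAC_RE.search(text)
    if idx is None or mac is None:
        return None
    # The next-hop output prints the MAC with spaces; normalise to colons.
    return int(idx.group(1)), mac.group(1).lower().replace(" ", ":")


def check_next_hop(nh_text, arp_text):
    """Classify a next hop against the hardware ARP table dump."""
    next_hop = parse_next_hop(nh_text)
    if next_hop is None:
        return "unparsed"
    idx, mac = next_hop
    table = parse_arp_table(arp_text)
    if idx not in table:
        return "missing"      # the symptom described in this post
    return "ok" if table[idx] == mac else "mismatch"


if __name__ == "__main__":
    nh_text = (
        "Flags: 2 nh_idx: 3 CMD: Route Arp Idx: 1341 MTU Idx: 2\n"
        "Chain_nh 3484: Hw install: 1 Mac: 00 0e 0c a2 2d dc\n"
    )
    arp_text = (
        "ArpEntry Idx 1340 : 00:15:17:6b:a9:7c\n"
        "ArpEntry Idx 1342 : 00:25:90:2c:41:e5\n"
    )
    print(check_next_hop(nh_text, arp_text))
```

The sketch looks up one specific index instead of reporting every gap in the numbering, since unused indices may be normal and only an index that a next hop actually references is known to be a problem.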
It appears that simply refreshing the Juniper ARP table with an identical entry does not cause a missing entry to be put into the forwarding table.

I would love to be able to reproduce this, but with hundreds to a few thousand machines each on many EX4200 stacks, it happens very rarely. I only mention it because "clear arp" from the CLI does not work, so this problem gets escalated until it reaches someone brave enough to temporarily break some unaffected boxes to fix a broken one. It would be nice, though, if "clear arp" actually worked right.

If you encounter this problem and do something different, I would be very interested to hear from you!

-- 
Jeff S Wheeler <[email protected]>
Sr Network Operator / Innovative Network Concepts
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp

