Hi, I built a setup where a vMX (local-as 64512) has the following 20 eBGP neighbors over point-to-point connections:
AS64513: vmx1[ge-0/0/6.13] <-> [ge-0.0.6.13]PC
AS64514: vmx1[ge-0/0/6.14] <-> [ge-0.0.6.14]PC
AS64515: vmx1[ge-0/0/6.15] <-> [ge-0.0.6.15]PC
AS64516: vmx1[ge-0/0/6.16] <-> [ge-0.0.6.16]PC
AS64517: vmx1[ge-0/0/6.17] <-> [ge-0.0.6.17]PC
AS64518: vmx1[ge-0/0/6.18] <-> [ge-0.0.6.18]PC
AS64519: vmx1[ge-0/0/6.19] <-> [ge-0.0.6.19]PC
AS64520: vmx1[ge-0/0/6.20] <-> [ge-0.0.6.20]PC
AS64521: vmx1[ge-0/0/6.21] <-> [ge-0.0.6.21]PC
AS64522: vmx1[ge-0/0/6.22] <-> [ge-0.0.6.22]PC
AS64523: vmx1[ge-0/0/7.23] <-> [ge-0.0.7.23]PC
AS64524: vmx1[ge-0/0/7.24] <-> [ge-0.0.7.24]PC
AS64525: vmx1[ge-0/0/7.25] <-> [ge-0.0.7.25]PC
AS64526: vmx1[ge-0/0/7.26] <-> [ge-0.0.7.26]PC
AS64527: vmx1[ge-0/0/7.27] <-> [ge-0.0.7.27]PC
AS64528: vmx1[ge-0/0/7.28] <-> [ge-0.0.7.28]PC
AS64529: vmx1[ge-0/0/7.29] <-> [ge-0.0.7.29]PC
AS64530: vmx1[ge-0/0/7.30] <-> [ge-0.0.7.30]PC
AS64531: vmx1[ge-0/0/7.31] <-> [ge-0.0.7.31]PC
AS64532: vmx1[ge-0/0/7.32] <-> [ge-0.0.7.32]PC

The vMX has an evaluation license installed in order to avoid the shallow max-prefix limitation.

On the PC there is a BGP daemon (BIRD) running in such a way that the first set of 4000 prefixes oscillates between the sessions with AS64513 and AS64523, i.e. between the vmx1[ge-0/0/6.13] <-> [ge-0.0.6.13]PC and vmx1[ge-0/0/7.23] <-> [ge-0.0.7.23]PC connections. The second set of 4000 prefixes oscillates between the vmx1[ge-0/0/6.14] <-> [ge-0.0.6.14]PC and vmx1[ge-0/0/7.24] <-> [ge-0.0.7.24]PC connections, the third set between the vmx1[ge-0/0/6.15] <-> [ge-0.0.6.15]PC and vmx1[ge-0/0/7.25] <-> [ge-0.0.7.25]PC connections, etc. CPU usage of rpd fluctuates between 20% and 90% under such conditions.
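For reference, one of the twenty sessions might look roughly like the sketch below on the vMX side. This is only an illustration of the per-unit point-to-point layout; the group name, VLAN ID and 192.0.2.0/30 addressing are my assumptions, as the actual configuration is not shown above:

```
/* hypothetical sketch of one eBGP session; names and addresses assumed */
interfaces {
    ge-0/0/6 {
        vlan-tagging;
        unit 13 {
            vlan-id 13;
            family inet {
                address 192.0.2.1/30;
            }
        }
    }
}
protocols {
    bgp {
        group ebgp-as64513 {
            type external;
            peer-as 64513;
            neighbor 192.0.2.2;
        }
    }
}
```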
Now, when I execute the "ping 10.55.55.1 source 10.55.55.2 count 1000" command, the summary of the results is the following:

--- 10.55.55.1 ping statistics ---
1000 packets transmitted, 992 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.189/19.572/315.915/36.297 ms

I also plotted the results: https://i.imgur.com/wWTgLfX.png

As expected, the RTT is very unstable and there is even some packet loss. When I repeat the "ping 10.55.55.1 source 10.55.55.2 count 1000" command without route churn, so that the CPU usage of rpd is <0.5%, the results are much more stable:

--- 10.55.55.1 ping statistics ---
1000 packets transmitted, 1000 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.193/2.259/85.518/9.258 ms

Plotted results are here: https://i.imgur.com/bFM5zHj.png

This might give the impression that the delay is directly related to the rpd process or to the CPU load caused by rpd, but this does not seem to be true. vMX provides an excellent way to prove this, as traffic between the vCP and vFP goes through a Linux bridge in the host machine and can be captured there. When I execute the "ping 10.55.55.1 source 10.55.55.2" command and measure the delay between the ICMP "echo request" packets on the forwarding-plane Ethernet interface facing the RE and the ICMP "echo request" packets on the server interface with the 10.55.55.1 address configured, the first packet capture does not show the occasional delay while the second one does.
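A side note on the first summary above: 992 of 1000 replies is actually a 0.8% loss rate, which the ping summary line apparently truncates to the reported "0% packet loss". A trivial check:

```python
# Sanity check of the churn-case summary: 1000 echo requests sent,
# 992 replies received, yet the summary line says "0% packet loss".
sent, received = 1000, 992
loss_pct = 100.0 * (sent - received) / sent

print(loss_pct)       # 0.8  -> the real loss rate
print(int(loss_pct))  # 0    -> what a whole-percent summary would show
```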
For example, see the following five ping results from the Junos ping utility:

64 bytes from 10.55.55.1: icmp_seq=1 ttl=64 time=0.459 ms
64 bytes from 10.55.55.1: icmp_seq=2 ttl=64 time=0.434 ms
64 bytes from 10.55.55.1: icmp_seq=3 ttl=64 time=51.631 ms
64 bytes from 10.55.55.1: icmp_seq=4 ttl=64 time=0.622 ms
64 bytes from 10.55.55.1: icmp_seq=5 ttl=64 time=0.294 ms

Delays measured on the forwarding-plane Ethernet port facing the RE are stable:

00:00:01.008455 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4 (0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.001053 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4 (0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.006539 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4 (0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.000228 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4 (0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.004250 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4 (0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124

I captured those with the "tcpdump -ttt -nei vfp-int-vmx1 'ip proto 84 && src host 128.0.0.1 && ip[80:2] = 0x0800'" command. IP protocol 84, used in the packet filter, is the Juniper proprietary Trivial Tunneling Protocol (TTP), which encapsulates external control traffic between the RE and the line-card microkernel. IPv4 address 128.0.0.1 is configured on em1.0 in the RE. "ip[80:2] = 0x0800" matches the type and code bytes of the ICMP "echo request" header inside the TTP packet, i.e. the type needs to be 0x08 (Echo Request) and the code needs to be 0x00.
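To make the BPF expression concrete: "ip[80:2]" selects the two bytes at offset 80 from the start of the (outer) IPv4 header, and for this encapsulation those two bytes land on the inner ICMP type and code fields. A minimal sketch of how the 16-bit comparison value decomposes:

```python
# "ip[80:2] = 0x0800": two consecutive bytes are compared as one 16-bit
# big-endian value, so the high byte is the ICMP type and the low byte
# is the ICMP code.
value = 0x0800
icmp_type = value >> 8    # high byte
icmp_code = value & 0xFF  # low byte

print(icmp_type, icmp_code)  # 8 0 -> Echo Request, code 0
```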
However, the same five packets captured on the ge-0.0.9-vmx1 interface (this one has 10.55.55.1 configured) can be seen below:

00:00:01.008451 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4 (0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id 55340, seq 1, length 64
00:00:01.001077 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4 (0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id 55340, seq 2, length 64
00:00:01.017782 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4 (0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id 55340, seq 3, length 64
00:00:00.989049 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4 (0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id 55340, seq 4, length 64
00:00:01.004109 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4 (0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id 55340, seq 5, length 64

The third packet arrives later than normal and thus the fourth packet arrives earlier than normal, i.e. exactly the same behavior demonstrated in my initial e-mail. The packet capture was done with the "tcpdump -ttt -nei ge-0.0.9-vmx1 'icmp[icmptype] == 8'" command.

In short, it seems certain that the ICMP "echo request" messages arrive at the line-card microkernel without the delay. As already said by Saku, one logical explanation for this delay is the line-card microkernel. As written by David Roy, the microkernel processes TTP packets in software (the "TTP Transmit" process seen in the output of "sh threads" on the PFE CLI), and if the microkernel is busy with, for example, FIB updates, then the decapsulation of TTP traffic and the forwarding of the payload to the MQ chip (or a similar ASIC in the PFE) could indeed be affected. On the other hand, the overall LC CPU usage according to "sh linux cpu usage" did not exceed 80% even during the route churn, but I do not actually know what exactly this utilization measures.
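The comparison between the two captures can be made quantitative by converting the "tcpdump -ttt" inter-packet gaps into deviations from the nominal 1 s ping interval. A small sketch with the gaps from the two captures above (the 10 ms threshold is an arbitrary choice of mine, not from any tool):

```python
def deviations_ms(deltas_s, interval_s=1.0):
    """Deviation of each inter-packet gap from the nominal interval, in ms."""
    return [(d - interval_s) * 1000 for d in deltas_s]

def flag_outliers(deltas_s, threshold_ms=10.0):
    """1-based gap indices deviating more than threshold_ms (arbitrary)."""
    return [i + 1 for i, d in enumerate(deviations_ms(deltas_s))
            if abs(d) > threshold_ms]

re_side  = [1.008455, 1.001053, 1.006539, 1.000228, 1.004250]  # vfp-int-vmx1
pfe_side = [1.008451, 1.001077, 1.017782, 0.989049, 1.004109]  # ge-0.0.9-vmx1

print(flag_outliers(re_side))   # []     -> RE-facing capture is steady
print(flag_outliers(pfe_side))  # [3, 4] -> late 3rd packet, early 4th
```

The flagged pair on the egress side is the late third packet followed by the compensating early fourth packet, matching the behavior described above, while the RE-facing capture shows no gap deviating by more than the threshold.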
Please let me know if anything was unclear or if someone has other ideas or theories.

Martin
_______________________________________________
juniper-nsp mailing list
[email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp

