Hi,

I have two OpenBSD 6.9 servers: fw-1 (10.0.0.58) and fw-2 (10.0.0.59).
In the last few days our monitoring has been reporting packet loss to both of them, so I tried to ping fw-2 from fw-1:

fw-1$ ping -c 10 fw-2
PING fw-2 (10.0.0.59): 56 data bytes
64 bytes from 10.0.0.59: icmp_seq=0 ttl=255 time=0.533 ms
64 bytes from 10.0.0.59: icmp_seq=1 ttl=255 time=0.735 ms
64 bytes from 10.0.0.59: icmp_seq=2 ttl=255 time=0.517 ms
64 bytes from 10.0.0.59: icmp_seq=3 ttl=255 time=0.506 ms
64 bytes from 10.0.0.59: icmp_seq=4 ttl=255 time=0.609 ms
64 bytes from 10.0.0.59: icmp_seq=6 ttl=255 time=0.503 ms
64 bytes from 10.0.0.59: icmp_seq=7 ttl=255 time=0.479 ms
64 bytes from 10.0.0.59: icmp_seq=8 ttl=255 time=0.523 ms
64 bytes from 10.0.0.59: icmp_seq=9 ttl=255 time=0.507 ms
--- fw-2.snet.verza.net ping statistics ---
10 packets transmitted, 9 packets received, 10.0% packet loss
round-trip min/avg/max/std-dev = 0.479/0.546/0.735/0.075 ms

tcpdump on fw-2 shows that it saw the icmp_seq=5 request but did not reply:

fw-2$ doas tcpdump -lnp -i trunk0 icmp and host 10.0.0.58
tcpdump: listening on trunk0, link-type EN10MB
11:56:13.087075 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:13.087094 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:14.092993 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:14.093005 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:15.092840 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:15.092851 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:16.092828 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:16.092839 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:17.092809 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:17.092822 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:18.092793 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:19.092776 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:19.092786 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:20.092726 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:20.092744 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:21.092756 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:21.092774 10.0.0.59 > 10.0.0.58: icmp: echo reply
11:56:22.092733 10.0.0.58 > 10.0.0.59: icmp: echo request
11:56:22.092743 10.0.0.59 > 10.0.0.58: icmp: echo reply

I can see that the echo reply is missing from the netstat statistics as well:

fw-2$ netstat -ss -p icmp
icmp:
        101 calls to icmp_error
        Output packet histogram:
                echo reply: 40626
                destination unreachable: 101
                time stamp reply: 1
        Input packet histogram:
                echo reply: 247
                destination unreachable: 1
                echo: 40626
                time stamp: 1
                address mask request: 3
                #37: 1
        40627 message responses generated

.. 10 ICMP requests ..

fw-2$ netstat -ss -p icmp
icmp:
        101 calls to icmp_error
        Output packet histogram:
                echo reply: 40635
                destination unreachable: 101
                time stamp reply: 1
        Input packet histogram:
                echo reply: 247
                destination unreachable: 1
                echo: 40635
                time stamp: 1
                address mask request: 3
                #37: 1
        40636 message responses generated

I tried disabling pf, but it had no effect.
The trunk0 device has two bnxt interfaces.
Both servers have been in place for years, and both of them started to lose packets in the last few days.

How can I debug such a problem, please?

Disclaimer: IP addresses might have been changed to prevent information leaks, as we are in an audited environment.

Thanks,
Pavel Mateja
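
P.S. In case it helps anyone reproduce this, here is a rough sketch of how unanswered requests could be picked out of the tcpdump text output automatically instead of by eye. The function name unanswered_echo is made up for illustration, and the script assumes a single ping stream in which each echo request is answered before the next one is sent, so it is only a starting point:

```sh
# Hypothetical helper (not part of any tool): reads tcpdump text output on
# stdin and prints every echo request that was not followed by an echo
# reply before the next echo request arrived.
# Assumes one ping stream, at most one request outstanding at a time.
unanswered_echo() {
    awk '
        /icmp: echo request/ {
            # A still-pending request at this point never got its reply.
            if (pending != "") print pending
            pending = $0
            next
        }
        /icmp: echo reply/ {
            # The reply clears the outstanding request.
            pending = ""
        }
        END {
            # Report a request left unanswered at the end of the capture.
            if (pending != "") print pending
        }
    '
}
```

It could be fed from a live capture, for example: doas tcpdump -lnp -i trunk0 icmp and host 10.0.0.58 | unanswered_echo. On the capture shown above it should print only the 11:56:18.092793 request.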