MICE (an IXP in Minneapolis, MN, USA) has had a couple of reports of participants seeing BFD issues.

In the most recent case, the participant's router was sending BFD control packets to the route server (confirmed with a packet capture on the route server), but BIRD simply did not respond. Both their IPv4 and IPv6 sessions to rs1 (the first route server) were behaving this way, while both of their sessions to rs2 were fine. When I compared the two route servers, I found two other inconsistencies: one network on IPv4 and one network on IPv6.

Here's what the packet capture shows:

   BFD Control message
      001. .... = Protocol Version: 1
      ...0 0011 = Diagnostic Code: Neighbor Signaled Session Down (0x03)
      01.. .... = Session State: Down (0x1)
      Message Flags: 0x48, Control Plane Independent: Set
        0... .. = Poll: Not set
        .0.. .. = Final: Not set
        ..1. .. = Control Plane Independent: Set
        ...0 .. = Authentication Present: Not set
        .... 0. = Demand: Not set
        .... .0 = Multipoint: Not set
      Detect Time Multiplier: 3 (= 6000 ms Detection time)
      Message Length: 24 bytes
      My Discriminator: 0x0000024d
      Your Discriminator: 0x0c764401
      Desired Min TX Interval: 2000 ms (2000000 us)
      Required Min RX Interval: 2000 ms (2000000 us)
      Required Min Echo Interval:    0 ms (0 us)
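
To make the fields above easier to check, here is a minimal Python sketch that decodes the 24-byte mandatory section of a BFD Control packet using the layout in RFC 5880 section 4.1. The input bytes are rebuilt from the decoded values in the capture, not taken from the raw capture, and the names (`parse_bfd_control`, `BfdControl`) are hypothetical. The detail worth noticing is that the participant is in state Down but still sends a nonzero Your Discriminator (0x0c764401).

```python
import struct
from dataclasses import dataclass

# RFC 5880 section 4.1: mandatory section of a BFD Control packet, 24 bytes,
# network byte order (four single-byte fields, then five 32-bit fields).
_HDR = struct.Struct("!BBBBIIIII")
_STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}


@dataclass
class BfdControl:
    version: int
    diag: int
    state: str
    poll: bool
    final: bool
    cpi: bool  # Control Plane Independent (C bit)
    auth: bool
    demand: bool
    multipoint: bool
    detect_mult: int
    length: int
    my_disc: int
    your_disc: int
    desired_min_tx_us: int
    required_min_rx_us: int
    required_min_echo_rx_us: int


def parse_bfd_control(data: bytes) -> BfdControl:
    if len(data) < _HDR.size:
        raise ValueError("BFD Control packet is shorter than 24 bytes")
    b0, b1, mult, length, my_d, your_d, tx, rx, echo = _HDR.unpack_from(data)
    return BfdControl(
        version=b0 >> 5,         # top 3 bits of byte 0
        diag=b0 & 0x1F,          # low 5 bits of byte 0
        state=_STATES[b1 >> 6],  # top 2 bits of byte 1
        poll=bool(b1 & 0x20),
        final=bool(b1 & 0x10),
        cpi=bool(b1 & 0x08),
        auth=bool(b1 & 0x04),
        demand=bool(b1 & 0x02),
        multipoint=bool(b1 & 0x01),
        detect_mult=mult,
        length=length,
        my_disc=my_d,
        your_disc=your_d,
        desired_min_tx_us=tx,
        required_min_rx_us=rx,
        required_min_echo_rx_us=echo,
    )


# Packet rebuilt from the decoded capture fields (not the raw bytes).
captured = _HDR.pack(
    (1 << 5) | 3,  # version 1, diag 3 (Neighbor Signaled Session Down)
    0x48,          # state Down (0b01), C bit set
    3, 24,         # Detect Mult, Length
    0x0000024D,    # My Discriminator (participant's own)
    0x0C764401,    # Your Discriminator (participant's stored value for rs1)
    2_000_000, 2_000_000, 0,  # intervals in microseconds
)

pkt = parse_bfd_control(captured)
print(pkt.state, f"{pkt.your_disc:#010x}")  # prints: Down 0x0c764401
```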


This may have started when rs1 was rebooted for patching, or it may have been caused by a fiber cut that disconnected this participant from the fabric. We are not sure of the exact timing.

The participant reset their BGP sessions, which brought the BFD session up.

We are running BIRD 2.0.8 from the Ubuntu 22.04 repository. The config is generated from the IXP Manager templates, so there are separate BIRD processes for IPv4 and IPv6.

Another participant mentioned that they had seen BFD issues in the past, in March. That would have been on the previous route servers, which (I believe) ran BIRD 1.x on FreeBSD.

Note that BFD is configured as passive (i.e., the participant's router has to send the first BFD packet). The relevant config bits:

   protocol bfd
   {
            accept ipv4 direct;
            interface "en*" {
                    passive on;
                    multiplier 3;
                    min rx interval 500ms;
                    min tx interval 500ms;
            };
   }

   ...

   protocol bgp pb_0138_as18451 from tb_rsclient {
            description "AS18451 - LES.NET";
            neighbor 206.108.255.175 as 18451;
            ipv4 {
                import limit 120 action restart;
                import filter f_import_as18451;
                table t_0138_as18451;
                export filter f_export_as18451;
            };
            bfd on;
   }
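
For reference, here is a minimal Python sketch of how the timers from this config and from the captured packet would combine under the asynchronous-mode rules in RFC 5880 (sections 6.8.4 and 6.8.7). It assumes the participant keeps advertising the 2000 ms values seen in the capture once the session is Up, and it ignores transmit jitter and the rule that the desired TX interval is at least 1 second while a session is not Up. The helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BfdTimers:
    desired_min_tx_ms: int
    required_min_rx_ms: int
    detect_mult: int


def tx_interval_ms(local: BfdTimers, remote: BfdTimers) -> int:
    # RFC 5880 6.8.7: never transmit faster than the peer is willing to receive.
    return max(local.desired_min_tx_ms, remote.required_min_rx_ms)


def detection_time_ms(local: BfdTimers, remote: BfdTimers) -> int:
    # RFC 5880 6.8.4 (asynchronous mode): the remote's Detect Mult times the
    # remote's agreed transmit interval.
    return remote.detect_mult * max(local.required_min_rx_ms, remote.desired_min_tx_ms)


rs1 = BfdTimers(desired_min_tx_ms=500, required_min_rx_ms=500, detect_mult=3)
participant = BfdTimers(desired_min_tx_ms=2000, required_min_rx_ms=2000, detect_mult=3)

print("rs1 transmits every", tx_interval_ms(rs1, participant), "ms")
print("rs1 declares the session down after", detection_time_ms(rs1, participant), "ms")
print("participant declares it down after", detection_time_ms(participant, rs1), "ms")
```

With these values the participant's slower 2000 ms timers win in both directions, which matches the 6000 ms detection time shown in the capture.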


Are there any relevant fixes since 2.0.8? Looking through the git commit log, I found this one, but it doesn't seem like it would apply:

   commit 99872676df45f1a490d3d63f43081afb41477040
   Author: Ondrej Zajicek
   Date:   Sun Jan 22 23:42:08 2023 +0100

        BFD: Improve incoming packet matching

        For active sessions, ignore received packets with zero local id and
        mismatched remote id. That forces a session timeout instead of an
        immediate session restart. It makes BFD sessions more resilient to
        packet spoofing.

        Thanks to André Grüneberg for the suggestion.
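
For comparison with the commit above, here is a minimal Python sketch of the session-selection rules in RFC 5880 section 6.8.6. It is not BIRD's code, and whether BIRD 2.0.8 applies exactly these rules is an assumption; the `select_session` helper, the address 192.0.2.10, and the local discriminator 0x11 are hypothetical. It illustrates one way a packet like the captured one could go unanswered: its Your Discriminator is nonzero, so if the route server no longer had a session with that local discriminator (for example, after a restart), the packet would be discarded instead of being matched by address, and a passive session would never start sending.

```python
from typing import Optional


def select_session(
    src_addr: str,
    state: str,
    your_disc: int,
    by_local_disc: dict[int, str],
    by_peer_addr: dict[str, int],
) -> Optional[int]:
    """Return the local discriminator of the matching session, or None to discard.

    Follows the session-selection steps of RFC 5880 section 6.8.6; the
    version, length, and authentication checks are omitted.
    """
    if your_disc != 0:
        # A nonzero Your Discriminator MUST select the session; if no such
        # session exists, the packet MUST be discarded.
        return your_disc if your_disc in by_local_disc else None
    if state not in ("Down", "AdminDown"):
        # A zero Your Discriminator is only acceptable in Down/AdminDown.
        return None
    # Otherwise select by other fields; this sketch uses the source address only.
    return by_peer_addr.get(src_addr)


# Hypothetical route-server state: one session for 192.0.2.10, local discriminator 0x11.
by_local_disc = {0x11: "192.0.2.10"}
by_peer_addr = {"192.0.2.10": 0x11}

# Like the captured packet: state Down, nonzero Your Discriminator that the
# route server does not know -> discarded (None).
print(select_session("192.0.2.10", "Down", 0x0C764401, by_local_disc, by_peer_addr))
# Same packet with Your Discriminator = 0 -> matched by source address.
print(select_session("192.0.2.10", "Down", 0, by_local_disc, by_peer_addr))
```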


The discussion was here:
https://lists.iphouse.net/cgi-bin/wa?A2=ind2306&L=MICE-DISCUSS&T=0&F=&S=&P=17401

--
Richard
