Package: bridge-utils
Version: 1.5-11
Severity: serious

TL;DR: If you're using bridges, bonds and VLANs together, set BRIDGE_HOTPLUG=no in /etc/default/bridge-utils.

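For anyone who wants to script the workaround above, here is a minimal sketch. It assumes GNU sed (for `-i`) and that the defaults file uses a plain `BRIDGE_HOTPLUG=...` assignment; `set_bridge_hotplug_no` is a hypothetical helper name, not something shipped by bridge-utils.

```shell
#!/bin/sh
# Sketch only: force BRIDGE_HOTPLUG=no in a defaults file.
# set_bridge_hotplug_no is a hypothetical helper, not part of bridge-utils.
# The optional first argument is the file to edit; it defaults to the
# path named in the TL;DR.
set_bridge_hotplug_no() {
    conf="${1:-/etc/default/bridge-utils}"
    if grep -q '^[[:space:]]*BRIDGE_HOTPLUG=' "$conf" 2>/dev/null; then
        # An active assignment exists: rewrite it in place (GNU sed -i).
        sed -i 's/^[[:space:]]*BRIDGE_HOTPLUG=.*/BRIDGE_HOTPLUG=no/' "$conf"
    else
        # No active assignment (or no file yet): append one.
        echo 'BRIDGE_HOTPLUG=no' >> "$conf"
    fi
}

# Usage (as root):
#   set_bridge_hotplug_no
```

Commented-out assignments are left untouched on purpose, so the original defaults remain visible in the file.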
Dear Maintainer,

There are some rather serious race conditions arising from the fact that
bridge-utils handles udev events triggered by ifupdown actions and
messes with the state of various interfaces while ifupdown is still
running. To illustrate why this is happening, take the following e/n/i
configuration as an example:

 auto bond0
 iface bond0 inet manual
     bond-slaves eth0 eth1
     bond-mode active-backup
     bond-miimon 100
     up ip link set $IFACE mtu 9000

 auto dmz
 iface dmz inet manual
     bridge_ports bond0.200
     bridge_fd 0
     bridge_stp off
     bridge_maxwait 0
     up ip link set $IFACE up

This straightforward configuration worked fine in Jessie, but produces
unexpected results on boot since Stretch, which - among others -
include:

 - not setting the bond mode to active-backup, but to round-robin
 - creating bond0.200 with MTU 1500 instead of 9000

We have been hit by the above issues on production systems
dist-upgraded to Stretch, and it all comes down to the races introduced
by the bridge-utils hotplug support (which is now enabled by default).

So, what is actually happening is the following:

 1. On boot, networking.service calls `ifup --allow=auto -a`. This
    starts off by creating bond0. As soon as the ifenslave hooks create
    the interface, a udev "add" event for bond0 is triggered, *while
    ifup is still configuring bond0*.

 2. /lib/udev/bridge-network-interface is called, with $INTERFACE set
    to bond0. The script will run `ifquery --list --allow auto` and
    will look for any interface containing bond0 or bond0.* in its
    bridge_ports, matching "dmz" in our case. It will then go on to:

    a) create_vlan_port: this will run `ip link set bond0 up` and then
       create the vlan sub-interface on bond0

    b) call `ifup dmz` once the vlan port has been created

All of the above - for all we know - happen while `ifup -a` is *still*
configuring bond0 on its own.

Step 2 is especially troublesome for the following reasons:

 i) create_vlan_port messes with the interface state while ifup is
    still configuring it.
    Bonding interfaces - for instance - need to be down to have their
    mode configured, and create_vlan_port explicitly sets the bond
    interface up. This causes the bond interface to potentially come up
    with the default mode (round-robin), making the system unreachable
    in case e.g. 802.3ad was requested.

 ii) create_vlan_port creates the VLAN sub-interface while the
     underlying device is still being configured. This means that the
     VLAN interface may inherit the wrong MTU value, if ifup has not
     yet set the parent interface's MTU to the desired value at the
     time the VLAN interface is created.

 iii) dmz is brought up whenever bond0 is brought up, although this has
      not necessarily been requested.

 iv) dmz is configured twice (once because of `ifup -a` and once
     because of bridge-utils setting it up).

Note that high-cpu-count SMP systems seem more prone to the races i)
and ii).

To be completely honest, I don't know what the hotplugging code is
trying to achieve here, especially when it comes to short-circuiting
ifupdown's internals. At a bare minimum, it should neither bring up
"auto" interfaces that happen to be down, nor touch any interface while
ifup might still be configuring it.

Regards,
Apollon

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'testing'), (500, 'stable'), (90, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.14.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=el_GR.UTF-8, LC_CTYPE=el_GR.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bridge-utils depends on:
ii  libc6  2.25-5

bridge-utils recommends no packages.

Versions of packages bridge-utils suggests:
ii  ifupdown  0.8.29

-- no debconf information