https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213410

            Bug ID: 213410
           Summary: [carp] service netif restart causes hang only when
                    carp is enabled
           Product: Base System
           Version: 11.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: d...@skunkwerks.at

Created attachment 175654
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=175654&action=edit
dmesg

# steps

FreeBSD 11.0-RELEASE-p1 amd64

- dmesg attached
- ifconfig (IPs masked)

```
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 78:45:c4:fa:d2:12
        inet 10.0.9.83 netmask 0xfffffff0 broadcast 10.0.9.95
        inet 10.0.9.84 netmask 0xffffffff broadcast 10.0.9.84 vhid 1
        inet 10.0.9.85 netmask 0xffffffff broadcast 10.0.9.85 vhid 3
        inet6 fe80::7a45:c4ff:fefa:d212%lagg0 prefixlen 64 scopeid 0x4
        inet6 3000:3050:3000:4::83 prefixlen 64
        inet6 3000:3050:3000:4::84 prefixlen 64 vhid 2
        inet6 3000:3050:3000:4::85 prefixlen 64 vhid 4
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        carp: BACKUP vhid 1 advbase 1 advskew 100
        carp: BACKUP vhid 3 advbase 1 advskew 0
        carp: BACKUP vhid 2 advbase 1 advskew 100
        carp: BACKUP vhid 4 advbase 1 advskew 0
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
```

Then issue `service netif restart`.

This was first done over a net/mosh connection with tmux running inside it,
and then repeated with direct console access (via the KVM remote management tool).

## actual results

The system hangs; this is 100% reproducible.

- no keyboard input is accepted
- Alt-F3 does not switch virtual consoles
- the host does not answer ping over the network
- a hard reboot is required to regain control
- the final message in the log appears to be:
    Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN


### console

```
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
```

### /var/log/messages

```
Oct 12 08:00:00 bridget newsyslog[1525]: logfile turned over due to size>100K
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 3 times
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 2 times
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode disabled
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: The following interfaces were not configured:
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: cloned_interfaces_sticky is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed clones: lagg0
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Cloned: lagg0
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: start_precmd: checkauto
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: doit: pccard_ether_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start lagg0
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Cloned:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to UP
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb0: link state changed to DOWN
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:24 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: rc_startmsgs is set to YES.
```
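
The "demoted by 240" lines at the end of the log follow simple arithmetic. Per carp(4), each vhid whose underlying interface goes down adds `net.inet.carp.ifdown_demotion_factor` (documented default 240) to the global `net.inet.carp.demotion` counter, so the four vhids on lagg0 accumulate to 960. A minimal sketch, assuming the default factor and a counter that starts at 0 (the first log line, "demoted by 240 to 240", is consistent with that):

```sh
#!/bin/sh
# Model of the global CARP demotion counter when igb1 lost link.
# Assumptions: net.inet.carp.ifdown_demotion_factor is at its default (240)
# and net.inet.carp.demotion was 0 beforehand.
factor=240
demotion=0

# vhids in the order they went BACKUP -> INIT in the log above
for vhid in 1 3 2 4; do
    demotion=$((demotion + factor))
    echo "vhid $vhid down: demotion now $demotion"
done
```

On a live system the current value can be read with `sysctl net.inet.carp.demotion`.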

# expected results

After a short period of downtime, the network is re-established.

# notes

If the carp configuration is disabled and the system is rebooted, `service netif restart` works as expected.
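
For reference, a sketch of the carp-disabled variant of the 2nd node's /etc/rc.conf. This assumes "carp configuration disabled" means commenting out the `# carp on` block and leaving everything else unchanged; the exact change is not shown in this report. The 2nd node is used because the ifconfig output above (10.0.9.83) comes from it.

```
# /etc/rc.conf on 2nd node, carp disabled (assumed variant)
# carp off: kld_list and the CARP aliases are commented out
#kld_list="carp"
#ifconfig_lagg0_aliases="\
#        inet  vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
#        inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
#        inet  vhid 3 advskew   0 pass pwd3 10.0.9.85/32 \
#        inet6 vhid 4 advskew   0 pass pwd4 3000:3050:3000:4::85/64"
```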

# config

```
# /etc/rc.conf on 1st node
hostname="one.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.82 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::82/64"
# ifconfig_lo1="inet 10.0.0.254 netmask 255.255.255.0"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
        inet  vhid 1 advskew   0 pass pwd1 10.0.9.84/32 \
        inet6 vhid 2 advskew   0 pass pwd2 3000:3050:3000:4::84/64 \
        inet  vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
        inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /etc/rc.conf on 2nd node
hostname="two.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.83 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::83/64"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
        inet  vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
        inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
        inet  vhid 3 advskew   0 pass pwd3 10.0.9.85/32 \
        inet6 vhid 4 advskew   0 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```
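
The two rc.conf files mirror each other's advskew values, so each node is meant to be the preferred MASTER for two of the four vhids. The sketch below extracts the per-vhid advskew from each node's `ifconfig_lagg0_aliases` string and reports the preferred node. It assumes equal advbase (1, as in the ifconfig output), no demotion, and `net.inet.carp.preempt=1` as set in /etc/sysctl.conf; under those assumptions the node with the lower advskew advertises more often and should win. The helper names `skew_of` and `preferred_master` are hypothetical.

```sh
#!/bin/sh
# Which node should be CARP MASTER for each vhid, judged by advskew alone?
# Assumes equal advbase, no demotion, and net.inet.carp.preempt=1.
# skew_of and preferred_master are hypothetical helper names.

# ifconfig_lagg0_aliases values copied from the two rc.conf files
node_one="\
        inet  vhid 1 advskew   0 pass pwd1 10.0.9.84/32 \
        inet6 vhid 2 advskew   0 pass pwd2 3000:3050:3000:4::84/64 \
        inet  vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
        inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"
node_two="\
        inet  vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
        inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
        inet  vhid 3 advskew   0 pass pwd3 10.0.9.85/32 \
        inet6 vhid 4 advskew   0 pass pwd4 3000:3050:3000:4::85/64"

# Print the advskew configured for vhid $2 in alias string $1.
skew_of() {
    printf '%s\n' "$1" | awk -v want="$2" '{
        for (i = 1; i < NF; i++) {
            if ($i == "vhid") v = $(i + 1)
            if ($i == "advskew" && v == want) print $(i + 1)
        }
    }'
}

# The lower advskew advertises more often and is preferred as MASTER.
preferred_master() {
    a=$(skew_of "$node_one" "$1")
    b=$(skew_of "$node_two" "$1")
    if [ "$a" -lt "$b" ]; then echo one
    elif [ "$b" -lt "$a" ]; then echo two
    else echo tie
    fi
}

for vhid in 1 2 3 4; do
    echo "vhid $vhid: preferred MASTER is node $(preferred_master "$vhid")"
done
```

This agrees with the log, presumably taken on the 2nd node (10.0.9.83), where vhids 3 and 4 were MASTER before the restart.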

```
# /boot/loader.conf
# storage
# zfs won't start mounting volumes without this
zfs_load="YES"
kern.geom.label.gptid.enable="0"

# hardware
coretemp_load="YES"

# console
# ensure console in IPMI mode remains accessible instead of going all white
hw.vga.textmode=1

# bhyve and jails
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
kern.racct.enable=1

# debug super powers
dtraceall_load="YES"

# runtime
# maxfiles
kern.maxfiles="25000"

# network
# fibs
# https://blog.feld.me/posts/2015/06/routing-a-freebsd-jail-through-openvpn/
# https://www.freebsd.org/cgi/man.cgi?query=setfib
net.fibs=2
# from https://calomel.org/freebsd_network_tuning.html
accf_data_load="YES"
accf_dns_load="YES"
autoboot_delay="3"
ahci_load="YES"
aio_load="YES"
cc_htcp_load="YES"
net.tcp.hostcache.cachelimit="0"
```


```
# /etc/sysctl.conf
# carp tweaks
net.inet.carp.preempt=1
```
