Hi, everyone,

one of the systems on which we run our jail-based "proServer" product has now failed in a very odd way for the second time, a couple of days after the first incident.
We run VIMAGE-based jails (a lot of them) and bridge them with the physical interface of the machine.

---------
cloned_interfaces="bridge0 bridge1"
ifconfig_bridge0_name="inet0"
ifconfig_inet0="addm ix0 up"
ifconfig_inet0_alias0="inet 217.29.41.2/24"
ifconfig_inet0_ipv6="inet6 2a00:b580:8000:11:44e8:ab80:816:7869/64 auto_linklocal"
ifconfig_bridge1_name="mgmt0"
ifconfig_mgmt0="addm ix1 up"
ifconfig_mgmt0_alias0="inet 10.5.105.7/16"
ifconfig_mgmt0_ipv6="inet6 auto_linklocal"
---------

The rest is managed by iocage, which creates the needed epair(4) interfaces, for some reason renames them to "vnetX", and adds them as members to the bridge. E.g.

---------
[ry93@ph002 ~]$ ifconfig inet0
inet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:50:51:fe:cc:00
        inet6 fe80::50:51ff:fefe:cc00%inet0 prefixlen 64 scopeid 0x4
        inet6 2a00:b580:8000:11:44e8:ab80:816:7869 prefixlen 64
        inet 217.29.41.2 netmask 0xffffff00 broadcast 217.29.41.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vnet0:69 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 76 priority 128 path cost 2000
        [... 50 vnet interfaces following ...]
        member: ix0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 2000
---------

When the system fails

- no jail is reachable from the outside via IP
- no jail is reachable from the host via IP
- the host itself is reachable just fine
- when we `iocage console` into a jail, it can reach its own IP addresses but nothing "outside"

I tried

- ifconfig ix0 down; ifconfig ix0 up
- ifconfig inet0 down; ifconfig inet0 up # aka bridge0
- iocage stop <jail>; iocage start <jail>

The latter deletes the epair instance connected to the jail and creates a fresh one, then adds it to the bridge. No change in connectivity ... the start of the jail takes "forever" because various processes hang waiting for DNS timeouts (no networking ;-)

There's nothing in /var/log/messages or the dmesg buffer that relates to networking!

Rebooting the host system "fixes" the situation.

Now I'm well aware that this is too little information to draw any definite conclusions. Hence my first question is: what should I do (commands) when the situation arises again to gather more evidence?

Or maybe we are just lucky and there is a known problem?

Yes, I know VIMAGE is still considered experimental. We have been running this in production for months, and it looks like the failure could be related to upgrading host and jails from 10.3 to 11.0 *or* to switching from the old shell-based iocage to Brandon's new Python-based version. I cannot rule out iocage, yet it's not very probable: it is not a Docker-like running service or network component, after all. Once the jails are up, iocage is done ...

And then there's the chance that it is something with the ix driver and the way we use the interface ... so for completeness:

---------
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff irq 26 at device 0.0 numa-domain 0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: 0c:c4:7a:34:ec:ba
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix0: promiscuous mode enabled
ix0: link state changed to UP
---------

As usual thanks for any hints,
Patrick
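
For the evidence-gathering question above, here is a minimal sketch of a capture script that could be run on the host before rebooting. It is not a known-good checklist. The interface names `inet0` and `ix0` are taken from this post, while the output directory, the file labels, the helper names `capture` and `collect_all`, and the choice of commands are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch: snapshot host-side network state while the jails are unreachable.
# BRIDGE and NIC match the names used in this post; OUTDIR, the file labels
# and the helper function names are hypothetical choices.

BRIDGE="${BRIDGE:-inet0}"
NIC="${NIC:-ix0}"
OUTDIR="${OUTDIR:-/var/tmp/netdebug-$(date +%Y%m%d-%H%M%S)}"

# capture LABEL CMD [ARGS...]: run CMD, store its stdout, stderr and exit
# status in $OUTDIR/LABEL.txt, and keep going even if CMD fails.
capture() {
    label=$1
    shift
    mkdir -p "$OUTDIR" || return 1
    {
        echo "### $*"
        "$@" 2>&1
        echo "### exit status: $?"
    } > "$OUTDIR/$label.txt"
}

collect_all() {
    capture ifconfig-all   ifconfig -a
    capture bridge-members ifconfig "$BRIDGE"
    capture bridge-addrs   ifconfig "$BRIDGE" addr    # learned MAC table
    capture bridge-sysctl  sysctl net.link.bridge
    capture nic-sysctl     sysctl "dev.ix.${NIC#ix}"  # driver counters
    capture if-counters    netstat -ind               # errors and drops
    capture mbufs          netstat -m
    capture routes         netstat -rn
    capture arp            arp -an
    capture ndp            ndp -an
    capture zones          vmstat -z
    capture jails          jls
    echo "state written to $OUTDIR"
}

# Run the full capture only when explicitly asked, so the file can also be
# sourced to use capture() on its own.
if [ "${1:-}" = "run" ]; then
    collect_all
fi
```

Saved under a name such as `netdebug.sh` (hypothetical), it would be run as root with `sh netdebug.sh run`. Comparing the resulting files with a capture taken while the system is healthy may show whether the bridge address table, the interface counters, or the mbuf statistics differ.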