https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280074
Zhenlei Huang <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #9 from Zhenlei Huang <[email protected]> ---
(In reply to Joshua Kinard from comment #8)
> I don't think the issue is in the epair(4) driver. I can clearly prove that
> the base system em(4) driver exhibits the behavior, while the variant
> provided by net/intel-em-kmod does not. That's a sign that it's more likely
> an issue with the base system em(4) driver, not epair(4).

I think epair(4) is innocent, but it is involved.

> Nor do I think it is bridge(4) that is at fault, as I have other FreeBSD
> systems using re(4), igb(4), and lagg(4) that also use jails with epair(4),
> and starting/stopping their jails does not cause a temporary loss of network
> connectivity, like I am seeing with em(4) (and as others in the referenced
> Reddit thread also noticed).

Let me try to work out how the issue arises.

By design, bridge(4) tries to enable as many of the supported capabilities as it can on all of its members [1] [2]. When a new member, say epair0a, is added, bridge(4) computes the supported capabilities as the intersection over all members [3]. Before 14.4 and 15.0, epair(4) does not support TXCSUM, so TXCSUM is disabled on every other member.

When a member is removed from the bridge(4), say with `ifconfig bridge0 deletem epair0a`, bridge(4) re-checks the supported capabilities [4]. This time `TXCSUM` can be re-enabled if all the remaining members support it. So you see

> TX checksum offloading gets turned back on

When a capability is toggled, the driver may or may not do a full restart of the interface (port). That depends on the driver. A full restart typically takes more time.

So I would conclude that this issue is caused by a combination of factors:

1. bridge(4) re-checks the supported capabilities of all members whenever a member is added or removed.
2. epair(4) does not support TX checksum offloading before 14.4 (currently in beta) and 15.0.
3. em(4) takes a long time to toggle the TX checksum capability, or it has a flaw that makes it drop too much traffic while toggling the capability.

For the last factor, I think you can test a setup without bridge(4). SSH into the machine via the em(4) interface, disable TX checksum offloading and re-enable it, then see whether em(4) hangs for a while.

You can also upgrade one of your boxes to 14.4 (BETA1 or BETA2), or use a kernel built from stable/14, to see whether the issue persists.

Good luck!

[1] https://cgit.freebsd.org/src/tree/sys/net/if_bridge.c?h=stable/14#n1054
[2] https://cgit.freebsd.org/src/tree/sys/net/if_bridge.c?h=stable/14#n195
[3] https://cgit.freebsd.org/src/tree/sys/net/if_bridge.c?h=stable/14#n1403
[4] https://cgit.freebsd.org/src/tree/sys/net/if_bridge.c?h=stable/14#n1198
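
For illustration only, the capability handling described in this comment can be modelled in a few lines of Python. This is a simplified sketch, not the kernel code: the names `Cap`, `Member`, `Bridge` and `BRIDGE_MANAGED` are hypothetical, only two capabilities are modelled, and it assumes that the bridge manages TXCSUM alone and that a member counts one "toggle" each time its enabled set changes (the point at which a driver might reinitialize the port).

```python
from dataclasses import dataclass
from enum import Flag


class Cap(Flag):
    """Hypothetical capability bits; the values are arbitrary."""
    RXCSUM = 1
    TXCSUM = 2


# Assumption: the only capability the bridge tries to manage in this model.
BRIDGE_MANAGED = Cap.TXCSUM


@dataclass
class Member:
    name: str
    enabled: Cap            # capabilities currently enabled on the interface
    saved: Cap = Cap(0)     # capabilities the member had when it joined
    toggles: int = 0        # how often the enabled set was changed

    def apply(self, caps: Cap) -> None:
        # A real driver may reinitialize the port here; we only count it.
        if caps != self.enabled:
            self.enabled = caps
            self.toggles += 1


class Bridge:
    def __init__(self) -> None:
        self.members: list[Member] = []

    def _recompute(self) -> None:
        # Intersection of the managed capabilities over all members.
        mask = BRIDGE_MANAGED
        for m in self.members:
            mask &= m.saved
        # Unmanaged capabilities are kept; managed ones follow the mask.
        for m in self.members:
            m.apply((m.enabled & ~BRIDGE_MANAGED) | mask)

    def add(self, m: Member) -> None:
        m.saved = m.enabled
        self.members.append(m)
        self._recompute()

    def remove(self, m: Member) -> None:
        self.members.remove(m)
        m.apply(m.saved)    # give the leaving member its own settings back
        self._recompute()


if __name__ == "__main__":
    em0 = Member("em0", Cap.RXCSUM | Cap.TXCSUM)
    epair0a = Member("epair0a", Cap.RXCSUM)   # no TXCSUM, as before 14.4
    bridge0 = Bridge()
    bridge0.add(em0)
    bridge0.add(epair0a)      # TXCSUM is switched off on em0
    bridge0.remove(epair0a)   # TXCSUM is switched back on for em0
    print(em0.name, em0.enabled, "toggles:", em0.toggles)
```

In this model em0 changes its enabled set twice, once when epair0a joins and once when it leaves, which matches factors 1 and 2 above. Whether each change is disruptive is entirely up to the driver (factor 3) and is not modelled here.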
