While digging into this, I found the following commit, which might contain a
fix:
https://github.com/torvalds/linux/commit/ebaf39e6032faf77218220707fc3fa22487784e0
** Changed in: linux (Ubuntu Bionic)
Status: Expired => Confirmed
--
https://bugs.launchpad.net/bugs/1765980
Title:
IPv6 fragments with nf_conntrack_reasm loaded cause net_mutex deadlock
upon LXD container shutdown
Status in linux package in Ubuntu:
Expired
Status in linux source package in Bionic:
Confirmed
Bug description:
I've spent the last few days tracking down an issue where an attempt
to shut down an LXD container after several hours of host uptime on
Ubuntu Bionic (4.15.0-15.16-generic) would cause a kworker thread to
start spinning on one CPU core and all subsequent container start/stop
operations to fail.
The underlying issue is that a kworker thread (executing cleanup_net)
spins in inet_frags_exit_net, waiting for sum_frag_mem_limit(nf) to
become zero, which never happens because it has underflowed to some
negative multiple of 64. That kworker thread keeps holding net_mutex
and therefore blocks any further container starts and stops. In my
instance the problem is triggered by receiving a fragmented IPv6 MDNS
packet, but it could probably be triggered by any fragmented IPv6
traffic.
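The spin described above can be sketched in user space. This is only a
simplified model, not the kernel code: struct model_frags,
model_evict_all and model_exit_net are hypothetical names, and the
max_passes bound exists only so the example terminates.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-namespace fragment accounting. */
struct model_frags {
    long mem;          /* counter that the exit path waits on */
    long queued_bytes; /* bytes still held by queues that can be evicted */
};

/* One eviction pass: release whatever is still queued. */
static void model_evict_all(struct model_frags *nf)
{
    nf->mem -= nf->queued_bytes;
    nf->queued_bytes = 0;
}

/*
 * Repeat eviction until the counter reads zero.
 * Returns the number of passes needed, or -1 if max_passes is used up.
 * The bound is an assumption of this model; without one, a counter
 * that has underflowed keeps the loop running forever.
 */
int model_exit_net(struct model_frags *nf, int max_passes)
{
    assert(max_passes > 0);
    for (int pass = 1; pass <= max_passes; pass++) {
        model_evict_all(nf);
        if (nf->mem == 0)
            return pass;
    }
    return -1;
}
```

With a consistent counter (mem equal to queued_bytes) the loop ends
after one pass. With mem at -64 and nothing left to evict, no pass can
ever bring the counter back to zero.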
The frag mem limit counter underflows because nf_ct_frag6_reasm
deducts more from it than the sum of all previous nf_ct_frag6_queue
calls added. The difference comes from pskb_expand_head (called through
skb_unclone), which adds a multiple of 64 to the SKB's truesize because
kmalloc_reserve allocates some additional slack space for the buffer.
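The accounting mismatch can be illustrated with a small user-space
model. It is a sketch under stated assumptions: the model_* names are
hypothetical, the sizes are made-up illustrative values, and the slack
is simply passed in rather than derived from a real allocator.

```c
#include <assert.h>

#define MODEL_MAX_FRAGS 8

/* Hypothetical stand-in for an SKB; only truesize matters here. */
struct model_skb {
    long truesize;
};

/* Stand-in for the fragment memory counter. */
static long model_frag_mem;

/* Queueing a fragment charges its truesize at that moment. */
static void model_queue(const struct model_skb *skb)
{
    model_frag_mem += skb->truesize;
}

/* Uncloning the head grows its truesize by the allocator slack. */
static void model_unclone(struct model_skb *head, long slack)
{
    head->truesize += slack;
}

/* Reassembly deducts the sum of the fragments' current truesize. */
static void model_reasm(const struct model_skb *frags, int n)
{
    long total = 0;

    for (int i = 0; i < n; i++)
        total += frags[i].truesize;
    model_frag_mem -= total;
}

/* Queue nfrags fragments, unclone the head, reassemble; return the counter. */
long model_run(int nfrags, long truesize, long slack)
{
    struct model_skb frags[MODEL_MAX_FRAGS];

    assert(nfrags > 0 && nfrags <= MODEL_MAX_FRAGS);
    model_frag_mem = 0;
    for (int i = 0; i < nfrags; i++) {
        frags[i].truesize = truesize;
        model_queue(&frags[i]);
    }
    model_unclone(&frags[0], slack);
    model_reasm(frags, nfrags);
    return model_frag_mem;
}
```

With no slack the counter returns to zero. With a slack of 64 it ends
at -64, and every further reassembly of this kind pushes it down by
another multiple of 64.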
Removing this line:
size = SKB_WITH_OVERHEAD(ksize(data));
or making it conditional on nhead or ntail being nonzero works around the
issue, but a proper fix for this seems complicated.
There is already a comment saying "It is not generally safe to change
skb->truesize." right above the offending modification of truesize, but the if
statement guarding it apparently doesn't keep out all problematic cases.
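The conditional variant of the workaround could look roughly like the
following user-space model. It is not a kernel patch and not
necessarily what upstream did: model_truesize_delta and its parameters
are hypothetical, "usable" stands in for SKB_WITH_OVERHEAD(ksize(data)),
the alignment step is left out, and the numbers used with it are
illustrative only.

```c
#include <assert.h>

/*
 * Returns the amount that would be added to truesize.
 *   osize:       usable size of the old buffer
 *   nhead/ntail: extra headroom/tailroom requested by the caller
 *   usable:      usable size the allocator really handed back
 *   conditional: nonzero to apply the workaround
 */
long model_truesize_delta(long osize, int nhead, int ntail,
                          long usable, int conditional)
{
    long size = osize + nhead + ntail; /* requested size */

    assert(nhead >= 0 && ntail >= 0);
    assert(usable >= size);

    /* Only adopt the allocator's slack when the buffer really grows. */
    if (!conditional || nhead || ntail)
        size = usable;

    return size - osize;
}
```

skb_unclone calls pskb_expand_head with nhead and ntail both zero, so
in this model the conditional form leaves truesize unchanged on that
path, while the unconditional form adds the slack (64 for an osize of
704 and a usable size of 768).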
I'll leave figuring out the proper way to fix this to the maintainers of this
area... ;)