On 08/06/2015 02:30 AM, Herbert Xu wrote:
On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:

Here's a theory and patch below. Herbert, Thomas, does this make sense
to you, or at least sound plausible? ;)

It's certainly possible.  Whether it's plausible I'm not so sure.
The netlink hashtable is unlimited in size.  So it should always
be expanding, not rehashing.  The bug you found should only affect
rehashing.
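
To make that distinction explicit, here is a rough stand-alone sketch
(toy types and names only, not the actual rhashtable code) of the two
cases the deferred worker conceptually has to tell apart: growing the
bucket table once the load factor is exceeded, versus a same-size rehash
when a bucket chain grows too long:

enum resize_action { DO_NOTHING, DO_EXPAND, DO_REHASH };

struct toy_table {
        unsigned int size;          /* number of buckets              */
        unsigned int nelems;        /* entries currently in the table */
        unsigned int longest_chain; /* longest bucket chain seen      */
        unsigned int elasticity;    /* max tolerated chain length     */
};

enum resize_action toy_worker_decision(const struct toy_table *t)
{
        if (t->nelems > t->size / 4 * 3)      /* >75% full: grow the table  */
                return DO_EXPAND;
        if (t->longest_chain > t->elasticity) /* overly long chain: reseed  */
                return DO_REHASH;             /* and rehash at the same size */
        return DO_NOTHING;
}

With an unbounded table such as netlink's, it should indeed be the first
branch that fires; a same-size rehash should be the exception.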

I'm not quite sure what's best to return from here, i.e. whether we
propagate -ENOMEM or instead retry over and over again, hoping that the
rehashing has completed (and no new rehashing has started in the
meantime) ...

Please use something other than ENOMEM as it is already heavily
used in this context.  Perhaps EOVERFLOW?

Okay, I'll do that.

We should probably add a WARN_ON_ONCE in rhashtable_insert_rehash,
since two concurrent rehashings indicate that something is going
seriously wrong.
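
Makes sense. As a compile-able toy model of both suggestions combined
(toy_insert_rehash() and its rehash_pending flag are stand-ins, not the
real rhashtable internals), the idea would be to warn once when a nested
rehash is detected and to hand back -EOVERFLOW instead of the overloaded
-ENOMEM:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool warned_once;

/* Stand-in for rhashtable_insert_rehash(); rehash_pending models the
 * "another rehash is still in flight" condition. */
int toy_insert_rehash(bool rehash_pending)
{
        if (rehash_pending) {
                /* WARN_ON_ONCE() in the real code: two concurrent
                 * rehashings mean something is going seriously wrong. */
                if (!warned_once) {
                        warned_once = true;
                        fprintf(stderr, "warning: nested rehash detected\n");
                }
                return -EOVERFLOW; /* rather than the overloaded -ENOMEM */
        }

        /* ... otherwise allocate the new bucket table and schedule the
         * actual rehash ... */
        return 0;
}

Returning a distinct error code would also make it easier for the
netlink side to tell this corner case apart from a genuine allocation
failure.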

So, unless I missed something, the following could have happened: the
worker thread, i.e. rht_deferred_worker(), could itself trigger the
first rehashing, e.g. after shrinking or expanding (or even if neither
of the two happens).

Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY if I'm
really unlucky and exceed the ht->elasticity limit of 16. I would then
end up in rhashtable_insert_rehash(), only to find that a rehash is
already in progress, and thus get -EBUSY back via __netlink_insert().
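
To spell that suspicion out, here is a small stand-alone model of the
interleaving (all names such as toy_insert_fast() are made up for
illustration; only the elasticity limit of 16 and the -EBUSY propagation
come from the discussion above):

#include <errno.h>

#define TOY_ELASTICITY 16

struct toy_ht {
        int chain_len;      /* length of the bucket chain we walked   */
        int rehash_pending; /* deferred worker already rehashing?     */
};

int toy_insert_rehash(struct toy_ht *ht)
{
        if (ht->rehash_pending) /* step 3: a rehash is already in flight */
                return -EBUSY;  /* ... which bubbles up to the netlink caller */
        return 0;
}

int toy_insert_fast(struct toy_ht *ht)
{
        /* step 2: the chain we hashed into is longer than elasticity (16) */
        if (ht->chain_len > TOY_ELASTICITY)
                return toy_insert_rehash(ht);
        return 0; /* normal insert path */
}

/* step 1 happens elsewhere: the deferred worker sets rehash_pending */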

Perhaps that is what happened? It seems like a rare case, but then the
issue has also only been seen rarely so far ...