On 07.03.2018 12:22, Kirill Tkhai wrote:
> On 06.03.2018 19:50, Eric Dumazet wrote:
>> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
>>> After unshare test kworker hangs for ages:
>>> $ while :; do unshare -n true; done &
>>> $ perf report <kworker>
>>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
>>> - ops_exit_list.isra.9
>>> - 85,68% igmp6_net_exit
>>> - 53,31% sock_release
>>> - inet_release
>>> - 25,33% rawv6_close
>>> - ip6mr_sk_done
>>> + 23,38% synchronize_rcu
>>> Keep in mind, this perf report shows the time a function was
>>> executing, and
>>> it does not show the time, it was sleeping. But it's easy to imagine,
>>> much it is sleeping, if synchronize_rcu() execution takes the most
>>> Top shows the kworker R time is less then 1%.
>>> This happen, because of in ip6mr_sk_done() we do too many
>>> even for the sockets, that are not referenced in mr_table, and which
>>> are not
>>> need it. So, the progress of kworker becomes very slow.
>>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
>>> to skip
>>> synchronize_rcu() for sockets, that are not need that. After the
>>> kworker becomes able to warm the cpu up as expected.
>>> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
>>> net/ipv6/ip6mr.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>>> index 2a38f9de45d3..290a8d0d5eac 100644
>>> --- a/net/ipv6/ip6mr.c
>>> +++ b/net/ipv6/ip6mr.c
>>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
>>> - synchronize_rcu();
>>> + if (!err)
>>> + synchronize_rcu();
>> But... what is this synchronize_rcu() doing exactly ?
>> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
>> ("ip6mr: Make mroute_sk rcu-based")
>> Typically on a delete, the synchronize_rcu() would be needed before
>> freeing the deleted object.
>> But nowadays we have better way : SOCK_RCU_FREE
> Hm. I'm agree with you. This is hot path, and mroute sockets created from
> will delay userspace tasks close() and exit(). Since there may be many such
> we may get a zombie task, which can't be reaped for ages. This slows down the
> Fix for pernet_operations works, but we need generic solution instead.
> The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
> ip6mr: Make mroute_sk rcu-based
> In ipmr the mr_table socket is handled under RCU. Introduce the same
> for ip6mr.
> There is no pointing to improvements it invents, or to the problem it solves.
> The description
> looks like a cleanup. It's too expensive cleanup, if it worsens the
> performance a hundred
> Can't we simply revert it?!
> Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by
> We actually use rcu_dereference() in ip6mr_cache_report() only. The only user
> of dereference
> is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in
+ So this change should be safe.