On Wed, Dec 28, 2016 at 01:58:06PM +0800, Ding Tianhong wrote:
> Hi, Paul:
> 
> I tried to debug this problem and found that this solution works well for
> both problem scenarios.
> 
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 85c5a88..dbc14a7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
>                         if (__rcu_reclaim(rdp->rsp->name, list))
>                                 cl++;
>                         c++;
> -                   local_bh_enable();
> +                 _local_bh_enable();
>                         cond_resched_rcu_qs();
>                         list = next;
>                 }
> 
> 
> cond_resched_rcu_qs() would process the softirq if one is pending, so there
> is no need to use local_bh_enable() to process the softirq a second time
> here, and this avoids the OOM when huge packets arrive.
> What do you think about it? Please give me some suggestions.

From what I can see, there is absolutely no guarantee that
cond_resched_rcu_qs() will do local_bh_enable(), and thus no guarantee
that it will process any pending softirqs -- and that is not part of
its job in any case.  So I cannot recommend the above patch.
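
To make the distinction concrete, here is a toy user-space model of the
three calls.  It is only a sketch and not kernel code: every name ending in
_model is hypothetical, and the assumption built into it is that
local_bh_enable() runs pending softirqs once bh is fully re-enabled, that
_local_bh_enable() only drops the disable count, and that
cond_resched_rcu_qs() promises nothing about softirqs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-CPU softirq state. NOT kernel code. */
struct cpu_model {
	int bh_disable_depth;	/* nesting depth of bh-disable sections */
	int pending_softirqs;	/* softirqs raised but not yet handled */
	int softirqs_run;	/* softirqs handled so far */
};

static void raise_softirq_model(struct cpu_model *cpu)
{
	cpu->pending_softirqs++;
}

static void do_softirq_model(struct cpu_model *cpu)
{
	cpu->softirqs_run += cpu->pending_softirqs;
	cpu->pending_softirqs = 0;
}

static void local_bh_disable_model(struct cpu_model *cpu)
{
	cpu->bh_disable_depth++;
}

/* Models local_bh_enable(): drop one level, then run pending softirqs. */
static void local_bh_enable_model(struct cpu_model *cpu)
{
	assert(cpu->bh_disable_depth > 0);
	if (--cpu->bh_disable_depth == 0 && cpu->pending_softirqs > 0)
		do_softirq_model(cpu);
}

/* Models _local_bh_enable(): drop one level, never run softirqs. */
static void raw_local_bh_enable_model(struct cpu_model *cpu)
{
	assert(cpu->bh_disable_depth > 0);
	cpu->bh_disable_depth--;
}

/* Models cond_resched_rcu_qs(): softirq state is left untouched. */
static void cond_resched_rcu_qs_model(struct cpu_model *cpu)
{
	(void)cpu;
}

/* One callback iteration; returns the softirqs still pending afterwards. */
static int pending_after_iteration(bool process_softirqs)
{
	struct cpu_model cpu = { 0, 0, 0 };

	local_bh_disable_model(&cpu);
	raise_softirq_model(&cpu);	/* e.g. a packet arrives meanwhile */
	if (process_softirqs)
		local_bh_enable_model(&cpu);
	else
		raw_local_bh_enable_model(&cpu);
	cond_resched_rcu_qs_model(&cpu);
	return cpu.pending_softirqs;
}
```

In this model, pending_after_iteration(true) leaves nothing pending, while
pending_after_iteration(false) leaves the raised softirq unprocessed, because
nothing after the raw enable is obliged to run it.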

On efficient handling of large invalid packets (that is still the issue,
right?), I must defer to Dave and Eric.

                                                                Thanx, Paul

> Thanks.
> Ding
> 
> On 2016/11/21 9:28, Ding Tianhong wrote:
> > 
> > 
> > On 2016/11/21 8:13, Paul E. McKenney wrote:
> >> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
> >>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
> >>>>
> >>>>
> >>>> On 2016/11/18 21:01, Paul E. McKenney wrote:
> >>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> >>>>>> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >>>>>> introduces a new problem: when huge abnormal IP packets arrive,
> >>>>>> it may cause OOM and break the kernel, like this:
> >>>>>>
> >>>>>> [   79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> >>>>>> [  100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> >>>>>> [  100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G           OE  ----V-------   3.10.0-327.28.3.28.x86_64 #1
> >>>>>> [  100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> >>>>>> [  100.067041]  0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> >>>>>> [  100.067045]  ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> >>>>>> [  100.067048]  ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> >>>>>> [  100.067050] Call Trace:
> >>>>>> [  100.067057]  [<ffffffff81638cb9>] dump_stack+0x19/0x1b
> >>>>>> [  100.067062]  [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
> >>>>>> [  100.067066]  [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
> >>>>>> [  100.067070]  [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
> >>>>>> [  100.067075]  [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
> >>>>>> [  100.067080]  [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> >>>>>> [  100.067083]  [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> >>>>>> [  100.067086]  [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
> >>>>>> [  100.067088]  [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
> >>>>>> [  100.067092]  [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> >>>>>> [  100.067095]  [<ffffffff8131027d>] ? list_del+0xd/0x30
> >>>>>> [  100.067098]  [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
> >>>>>> [  100.067101]  [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> >>>>>> [  100.067103]  [<ffffffff8152f372>] net_rx_action+0x152/0x240
> >>>>>> [  100.067107]  [<ffffffff81084d1f>] __do_softirq+0xef/0x280
> >>>>>> [  100.067109]  [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
> >>>>>> [  100.067114]  [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
> >>>>>> [  100.067117]  [<ffffffff8163e269>] ? schedule+0x29/0x70
> >>>>>> [  100.067120]  [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
> >>>>>> [  100.067122]  [<ffffffff810a5d4f>] kthread+0xcf/0xe0
> >>>>>> [  100.067124]  [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> >>>>>> [  100.067127]  [<ffffffff81649198>] ret_from_fork+0x58/0x90
> >>>>>> [  100.067129]  [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> >>>>>>
> >>>>>> ================================cut here=====================================
> >>>>>>
> >>>>>> The reason is that the huge abnormal IP packets are received into the
> >>>>>> net stack and finally dropped via dst_release(), which relies on the
> >>>>>> rcuos callback-offload kthread to free them. But cond_resched_rcu_qs()
> >>>>>> calls do_softirq(), which receives more and more abnormal IP packets
> >>>>>> that are later thrown onto the RCU callback list as well. The number
> >>>>>> of packets received is much greater than the number of packets freed,
> >>>>>> so memory is exhausted and the system goes OOM. Not processing any
> >>>>>> pending softirqs in the rcuos callback-offload kthread is therefore a
> >>>>>> more effective solution.
> >>>>>
> >>>>> OK, but we could still have softirqs processed by the grace-period
> >>>>> kthread as a result of any number of other events.  So this change
> >>>>> might reduce the probability of this problem, but it doesn't
> >>>>> eliminate it.
> >>>>>
> >>>>> How huge are these huge IP packets?  Is the underlying problem that they
> >>>>> are too large to use the memory-allocator fastpaths?
> >>>>>
> >>>>>                                                         Thanx, Paul
> >>>>>
> >>>>
> >>>> I use a 40G Mellanox NIC to receive packets, and the testgine can send
> >>>> MAC abnormal packets and IP abnormal packets at full speed.
> >>>>
> >>>> The MAC abnormal packets are dropped at a low level and never reach the
> >>>> net stack, but the IP abnormal packets cause this problem: every packet
> >>>> first looks like a new dst and is released later by dst_release()
> >>>> because it is meaningless.
> >>>>
> >>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
> >>>>
> >>>> So no packet is freed until the rcuos callback-offload kthread
> >>>> processes it. This becomes an infinite loop when huge packets keep
> >>>> coming, because do_softirq() loads more and more packets onto the rcuos
> >>>> processing kthread, and I still could not find a better way to fix
> >>>> this. BTW, it is really hard to say that the driver's allocations are
> >>>> too large for the memory-allocator fastpaths; there is no memory leak,
> >>>> and ixgbe may hit the same problem too.
> >>
> >> And following up on my fastpath point -- from what I can see, one
> >> big effect of the large invalid packets is that they push processing
> >> off of a number of fastpaths.  If these packets could be rejected with
> >> less per-packet processing, I bet that things would work much better.
> >>
> >>                                            Thanx, Paul
> > 
> > Yes, and I found that WARN_ON_ONCE(!irqs_disabled()) is triggered if
> > _local_bh_enable() is used here, so I think we could ask Eric and David
> > for help on how to reject the huge number of packets.
> > 
> > Thanks
> > Ding
> > 
> >>
> >>> The overall effect of these two patches is to move from enabling bh
> >>> (and processing recent softirqs) to enabling bh without processing
> >>> recent softirqs.  Is this really the correct way to solve this problem?
> >>> What about this solution is avoiding re-introducing the original
> >>> softlockups?  Have you talked to the networking guys about this issue?
> >>>
> >>>                                                   Thanx, Paul
> >>>
> >>>> Thanks.
> >>>> Ding
> >>>>
> >>>>
> >>>>>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >>>>>> Signed-off-by: Ding Tianhong <[email protected]>
> >>>>>>
> >>>>>> Signed-off-by: Ding Tianhong <[email protected]>
> >>>>>> ---
> >>>>>>  kernel/rcu/tree_plugin.h | 3 +--
> >>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>>>>> index 85c5a88..760c3b5 100644
> >>>>>> --- a/kernel/rcu/tree_plugin.h
> >>>>>> +++ b/kernel/rcu/tree_plugin.h
> >>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> >>>>>>                        if (__rcu_reclaim(rdp->rsp->name, list))
> >>>>>>                                cl++;
> >>>>>>                        c++;
> >>>>>> -                      local_bh_enable();
> >>>>>> -                      cond_resched_rcu_qs();
> >>>>>> +                      _local_bh_enable();
> >>>>>>                        list = next;
> >>>>>>                }
> >>>>>>                trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> >>>>>> -- 
> >>>>>> 1.9.0
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>
> >>
> >> .
> >>
> 
