Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-11-18 Thread Ding Tianhong


On 2016/11/18 20:56, Paul E. McKenney wrote:
> On Fri, Nov 18, 2016 at 08:37:28PM +0800, Ding Tianhong wrote:
>>
>>
>> On 2016/8/10 9:59, Paul E. McKenney wrote:
>>> On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
>>>> On 2016/6/16 22:19, Paul E. McKenney wrote:
>>>>> On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
>>>>>> On 2016/6/15 23:49, Paul E. McKenney wrote:
>>>>>>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
>>>>>>>> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
>>>>>>>> with these steps:
>>>>>>>> 1. Connect to ixgbevf and set the speed to 10Gb/s; it works fine.
>>>>>>>> 2. Then use ifconfig to bring the NIC down and up again, repeating several
>>>>>>>>    times.
>>>>>>>> 3. The system panics with a soft lockup.
>>>>>>>
>>>>>>> Good catch, queued for review and testing.  But what .config was your
>>>>>>> kernel built with?
>>>>>>>
>>>>>>
>>>>>> I used the redhat7.1 defconfig to build my kernel, and the RCU config is
>>>>>> this:
>>>>>> #
>>>>>> # RCU Subsystem
>>>>>> #
>>>>>> CONFIG_TREE_RCU=y
>>>>>> # CONFIG_PREEMPT_RCU is not set
>>>>>> CONFIG_RCU_STALL_COMMON=y
>>>>>> CONFIG_CONTEXT_TRACKING=y
>>>>>> CONFIG_RCU_USER_QS=y
>>>>>> # CONFIG_CONTEXT_TRACKING_FORCE is not set
>>>>>> CONFIG_RCU_FANOUT=64
>>>>>> CONFIG_RCU_FANOUT_LEAF=16
>>>>>> # CONFIG_RCU_FANOUT_EXACT is not set
>>>>>> # CONFIG_RCU_FAST_NO_HZ is not set
>>>>>> # CONFIG_TREE_RCU_TRACE is not set
>>>>>> CONFIG_RCU_NOCB_CPU=y
>>>>>> CONFIG_RCU_NOCB_CPU_ALL=y
>>>>>> CONFIG_BUILD_BIN2C=y
>>>>>
>>>>> Thank you!  You were running with preemption disabled, so your system
>>>>> would indeed be very susceptible to this problem.
>>>>>
>>>>>>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
>>>>>>> would prevent soft lockups, but not RCU stalls, so I substituted
>>>>>>> cond_resched_rcu_qs().  Please let me know if either of those changes
>>>>>>> causes problems at your end.
>>>>>>
>>>>>> Looks fine to me, I will apply this to my branch and test it, thanks.
>>>>>
>>>>> Please let me know how it goes!
>>>>>
>>>>>   Thanx, Paul
>>>>>
>>>>
>>>> Hi Paul:
>>>>
>>>> It has been a long time since I applied this patch, and I have not found any
>>>> problems. I believe this patch is fine, thanks.
>>>
>>> Very good!  I will push this one upstream during the next merge window.
>>>
>>> Thanx, Paul
>>>
>>
>> Hi Paul:
>>
>> Sorry to say that I have found that this patch introduces an OOM problem,
>> which is triggered when a huge number of abnormal IP packets arrive. It looks
>> like avoiding processing any pending softirqs in the rcuos kthread is the best
>> way to fix this problem, so I will send a new patch to revert this and fix the
>> problem.
> 
> Interesting...
> 
> Could you please let me know exactly how the added cond_resched_rcu_qs()
> leads to an OOM?  Is it that the softirqs prevent the grace-period kthread
> from making progress?
> 

OK, I will reply and discuss this on the other patch, thanks.

Ding

>   Thanx, Paul
> 
> 
> .
> 



Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-11-18 Thread Paul E. McKenney
On Fri, Nov 18, 2016 at 08:37:28PM +0800, Ding Tianhong wrote:
> 
> 
> On 2016/8/10 9:59, Paul E. McKenney wrote:
> > On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
> >> On 2016/6/16 22:19, Paul E. McKenney wrote:
> >>> On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
> >>>> On 2016/6/15 23:49, Paul E. McKenney wrote:
> >>>>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
> >>>>>> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
> >>>>>> with these steps:
> >>>>>> 1. Connect to ixgbevf and set the speed to 10Gb/s; it works fine.
> >>>>>> 2. Then use ifconfig to bring the NIC down and up again, repeating several
> >>>>>>    times.
> >>>>>> 3. The system panics with a soft lockup.
> >>>>>
> >>>>> Good catch, queued for review and testing.  But what .config was your
> >>>>> kernel built with?
> >>>>>
> >>>>
> >>>> I used the redhat7.1 defconfig to build my kernel, and the RCU config is
> >>>> this:
> >>>> #
> >>>> # RCU Subsystem
> >>>> #
> >>>> CONFIG_TREE_RCU=y
> >>>> # CONFIG_PREEMPT_RCU is not set
> >>>> CONFIG_RCU_STALL_COMMON=y
> >>>> CONFIG_CONTEXT_TRACKING=y
> >>>> CONFIG_RCU_USER_QS=y
> >>>> # CONFIG_CONTEXT_TRACKING_FORCE is not set
> >>>> CONFIG_RCU_FANOUT=64
> >>>> CONFIG_RCU_FANOUT_LEAF=16
> >>>> # CONFIG_RCU_FANOUT_EXACT is not set
> >>>> # CONFIG_RCU_FAST_NO_HZ is not set
> >>>> # CONFIG_TREE_RCU_TRACE is not set
> >>>> CONFIG_RCU_NOCB_CPU=y
> >>>> CONFIG_RCU_NOCB_CPU_ALL=y
> >>>> CONFIG_BUILD_BIN2C=y
> >>>
> >>> Thank you!  You were running with preemption disabled, so your system
> >>> would indeed be very susceptible to this problem.
> >>>
> >>>>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
> >>>>> would prevent soft lockups, but not RCU stalls, so I substituted
> >>>>> cond_resched_rcu_qs().  Please let me know if either of those changes
> >>>>> causes problems at your end.
> >>>>
> >>>> Looks fine to me, I will apply this to my branch and test it, thanks.
> >>>
> >>> Please let me know how it goes!
> >>>
> >>>   Thanx, Paul
> >>>
> >>
> >> Hi Paul:
> >>
> >> It has been a long time since I applied this patch, and I have not found any
> >> problems. I believe this patch is fine, thanks.
> > 
> > Very good!  I will push this one upstream during the next merge window.
> > 
> > Thanx, Paul
> > 
> 
> Hi Paul:
> 
> Sorry to say that I have found that this patch introduces an OOM problem,
> which is triggered when a huge number of abnormal IP packets arrive. It looks
> like avoiding processing any pending softirqs in the rcuos kthread is the best
> way to fix this problem, so I will send a new patch to revert this and fix the
> problem.

Interesting...

Could you please let me know exactly how the added cond_resched_rcu_qs()
leads to an OOM?  Is it that the softirqs prevent the grace-period kthread
from making progress?

Thanx, Paul
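
Note on the distinction quoted above (cond_resched() would prevent soft lockups but not RCU stalls, which is why cond_resched_rcu_qs() was substituted): the following is a toy Python model of that statement, offered as a sketch only. It is not kernel code. The Watchdogs class, run_callbacks(), the tick-based limits, and the batch size are all hypothetical, and the model assumes nothing else is runnable on the CPU, so a plain cond_resched() never context-switches and therefore never reports a quiescent state.

```python
from dataclasses import dataclass


@dataclass
class Watchdogs:
    """Toy stand-ins for the soft-lockup detector and the RCU stall detector."""
    softlockup_limit: int       # ticks allowed without a scheduling point
    stall_limit: int            # ticks allowed without a quiescent state
    since_resched: int = 0
    since_qs: int = 0
    soft_lockup: bool = False
    rcu_stall: bool = False

    def tick(self):
        # One unit of time passes with the CPU still in the callback loop.
        self.since_resched += 1
        self.since_qs += 1
        if self.since_resched > self.softlockup_limit:
            self.soft_lockup = True
        if self.since_qs > self.stall_limit:
            self.rcu_stall = True


def run_callbacks(n_callbacks, yield_kind,
                  softlockup_limit=22, stall_limit=60, batch=10):
    """Invoke n_callbacks back to back, yielding every `batch` callbacks.

    yield_kind is "none", "cond_resched", or "cond_resched_rcu_qs".
    Returns (soft_lockup_seen, rcu_stall_seen). The limits are
    illustrative numbers, not kernel defaults.
    """
    wd = Watchdogs(softlockup_limit, stall_limit)
    for i in range(1, n_callbacks + 1):
        wd.tick()  # each callback costs one tick in this model
        if i % batch == 0:
            if yield_kind in ("cond_resched", "cond_resched_rcu_qs"):
                wd.since_resched = 0   # a scheduling point was offered
            if yield_kind == "cond_resched_rcu_qs":
                wd.since_qs = 0        # a quiescent state was also reported
    return wd.soft_lockup, wd.rcu_stall


if __name__ == "__main__":
    for kind in ("none", "cond_resched", "cond_resched_rcu_qs"):
        print(kind, run_callbacks(100, kind))
```

In this model, 100 callbacks with no yield trip both detectors, cond_resched alone avoids the soft lockup but still trips the stall detector, and cond_resched_rcu_qs avoids both. The model says nothing about the OOM question raised in this message.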



Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-11-18 Thread Ding Tianhong


On 2016/8/10 9:59, Paul E. McKenney wrote:
> On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
>> On 2016/6/16 22:19, Paul E. McKenney wrote:
>>> On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
>>>> On 2016/6/15 23:49, Paul E. McKenney wrote:
>>>>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
>>>>>> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
>>>>>> with these steps:
>>>>>> 1. Connect to ixgbevf and set the speed to 10Gb/s; it works fine.
>>>>>> 2. Then use ifconfig to bring the NIC down and up again, repeating several
>>>>>>    times.
>>>>>> 3. The system panics with a soft lockup.
>>>>>
>>>>> Good catch, queued for review and testing.  But what .config was your
>>>>> kernel built with?
>>>>>
>>>>
>>>> I used the redhat7.1 defconfig to build my kernel, and the RCU config is
>>>> this:
>>>> #
>>>> # RCU Subsystem
>>>> #
>>>> CONFIG_TREE_RCU=y
>>>> # CONFIG_PREEMPT_RCU is not set
>>>> CONFIG_RCU_STALL_COMMON=y
>>>> CONFIG_CONTEXT_TRACKING=y
>>>> CONFIG_RCU_USER_QS=y
>>>> # CONFIG_CONTEXT_TRACKING_FORCE is not set
>>>> CONFIG_RCU_FANOUT=64
>>>> CONFIG_RCU_FANOUT_LEAF=16
>>>> # CONFIG_RCU_FANOUT_EXACT is not set
>>>> # CONFIG_RCU_FAST_NO_HZ is not set
>>>> # CONFIG_TREE_RCU_TRACE is not set
>>>> CONFIG_RCU_NOCB_CPU=y
>>>> CONFIG_RCU_NOCB_CPU_ALL=y
>>>> CONFIG_BUILD_BIN2C=y
>>>
>>> Thank you!  You were running with preemption disabled, so your system
>>> would indeed be very susceptible to this problem.
>>>
>>>>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
>>>>> would prevent soft lockups, but not RCU stalls, so I substituted
>>>>> cond_resched_rcu_qs().  Please let me know if either of those changes
>>>>> causes problems at your end.
>>>>
>>>> Looks fine to me, I will apply this to my branch and test it, thanks.
>>>
>>> Please let me know how it goes!
>>>
>>> Thanx, Paul
>>>
>>
>> Hi Paul:
>>
>> It has been a long time since I applied this patch, and I have not found any
>> problems. I believe this patch is fine, thanks.
> 
> Very good!  I will push this one upstream during the next merge window.
> 
>   Thanx, Paul
> 

Hi Paul:

Sorry to say that I have found that this patch introduces an OOM problem,
which is triggered when a huge number of abnormal IP packets arrive. It looks
like avoiding processing any pending softirqs in the rcuos kthread is the best
way to fix this problem, so I will send a new patch to revert this and fix the
problem.

Thanks.
Ding


>> Ding
>>
>>>> Ding
>>>>
>>>>>
>>>>>   Thanx, Paul
>>>>>
>>>>>
>>>>>
>>>>> commit c317cf19b34c0d2787b787c38bd2c8fe433215da
>>>>> Author: Ding Tianhong 
>>>>> Date:   Wed Jun 15 15:27:36 2016 +0800
>>>>>
>>>>> rcu: Fix soft lockup for rcu_nocb_kthread
>>>>>
>>>>> Carrying out the following steps results in a softlockup in the
>>>>> RCU callback-offload (rcuo) kthreads:
>>>>>
>>>>> 1. Connect to ixgbevf, and set the speed to 10Gb/s.
>>>>> 2. Use ifconfig to bring the nic up and down repeatedly.
>>>>>
>>>>> [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>>>>> [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
>>>>> [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>>> [  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 
>>>>> 88057dd9c000
>>>>> [  368.106005] RIP: 0010:[]  [] 
>>>>> fib_table_lookup+0x14/0x390
>>>>> [  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
>>>>> [  368.106005] RAX: 0001 RBX: 020155c0 RCX: 
>>>>> 0001
>>>>> [  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 
>>>>> 880036d11a00
>>>>> [  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
>>>>> 
>>>>> [  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 
>>>>> 88061fc83c58
>>>>> [  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 
>>>>> 020155c0
>>>>> [  368.106005] FS:  () GS:88061fc8() 
>>>>> knlGS:
>>>>> [  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
>>>>> [  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 
>>>>> 000407e0
>>>>> [  368.106005] DR0:  DR1:  DR2: 
>>>>> 
>>>>> [  368.106005] DR3:  DR6: 0ff0 DR7: 
>>>>> 0400
>>>>> [  368.106005] Stack:
>>>>> [  368.106005]  01c0 88057b766000 8802e380b000 
>>>>> 88057af03e00
>>>>> [  368.106005]  88061fc83dc0 815349a6 

Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-08-09 Thread Paul E. McKenney
On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
> On 2016/6/16 22:19, Paul E. McKenney wrote:
> > On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
> >> On 2016/6/15 23:49, Paul E. McKenney wrote:
> >>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
> >>>> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
> >>>> with these steps:
> >>>> 1. Connect to ixgbevf and set the speed to 10Gb/s; it works fine.
> >>>> 2. Then use ifconfig to bring the NIC down and up again, repeating several
> >>>>    times.
> >>>> 3. The system panics with a soft lockup.
> >>>
> >>> Good catch, queued for review and testing.  But what .config was your
> >>> kernel built with?
> >>>
> >>
> >> I used the redhat7.1 defconfig to build my kernel, and the RCU config is
> >> this:
> >> #
> >> # RCU Subsystem
> >> #
> >> CONFIG_TREE_RCU=y
> >> # CONFIG_PREEMPT_RCU is not set
> >> CONFIG_RCU_STALL_COMMON=y
> >> CONFIG_CONTEXT_TRACKING=y
> >> CONFIG_RCU_USER_QS=y
> >> # CONFIG_CONTEXT_TRACKING_FORCE is not set
> >> CONFIG_RCU_FANOUT=64
> >> CONFIG_RCU_FANOUT_LEAF=16
> >> # CONFIG_RCU_FANOUT_EXACT is not set
> >> # CONFIG_RCU_FAST_NO_HZ is not set
> >> # CONFIG_TREE_RCU_TRACE is not set
> >> CONFIG_RCU_NOCB_CPU=y
> >> CONFIG_RCU_NOCB_CPU_ALL=y
> >> CONFIG_BUILD_BIN2C=y
> > 
> > Thank you!  You were running with preemption disabled, so your system
> > would indeed be very susceptible to this problem.
> > 
> >>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
> >>> would prevent soft lockups, but not RCU stalls, so I substituted
> >>> cond_resched_rcu_qs().  Please let me know if either of those changes
> >>> causes problems at your end.
> >>
> >> Looks fine to me, I will apply this to my branch and test it, thanks.
> > 
> > Please let me know how it goes!
> > 
> > Thanx, Paul
> > 
> 
> Hi Paul:
> 
> It has been a long time since I applied this patch, and I have not found any
> problems. I believe this patch is fine, thanks.

Very good!  I will push this one upstream during the next merge window.

Thanx, Paul

> Ding
> 
> >> Ding
> >>
> >>>
> >>>   Thanx, Paul
> >>>
> >>> 
> >>>
> >>> commit c317cf19b34c0d2787b787c38bd2c8fe433215da
> >>> Author: Ding Tianhong 
> >>> Date:   Wed Jun 15 15:27:36 2016 +0800
> >>>
> >>> rcu: Fix soft lockup for rcu_nocb_kthread
> >>> 
> >>> Carrying out the following steps results in a softlockup in the
> >>> RCU callback-offload (rcuo) kthreads:
> >>> 
> >>> 1. Connect to ixgbevf, and set the speed to 10Gb/s.
> >>> 2. Use ifconfig to bring the nic up and down repeatedly.
> >>> 
> >>> [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> >>> [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
> >>> [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >>> [  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 
> >>> 88057dd9c000
> >>> [  368.106005] RIP: 0010:[]  [] 
> >>> fib_table_lookup+0x14/0x390
> >>> [  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
> >>> [  368.106005] RAX: 0001 RBX: 020155c0 RCX: 
> >>> 0001
> >>> [  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 
> >>> 880036d11a00
> >>> [  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
> >>> 
> >>> [  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 
> >>> 88061fc83c58
> >>> [  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 
> >>> 020155c0
> >>> [  368.106005] FS:  () GS:88061fc8() 
> >>> knlGS:
> >>> [  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
> >>> [  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 
> >>> 000407e0
> >>> [  368.106005] DR0:  DR1:  DR2: 
> >>> 
> >>> [  368.106005] DR3:  DR6: 0ff0 DR7: 
> >>> 0400
> >>> [  368.106005] Stack:
> >>> [  368.106005]  01c0 88057b766000 8802e380b000 
> >>> 88057af03e00
> >>> [  368.106005]  88061fc83dc0 815349a6 88061fc83d40 
> >>> 814ee146
> >>> [  368.106005]  8802e380af00 e380af00 819e0900 
> >>> 020155c001c0
> >>> [  368.106005] Call Trace:
> >>> [  368.106005]  
> >>> [  368.106005]
> >>> [  368.106005]  [] ip_route_input_noref+0x516/0xbd0
> >>> [  368.106005]  [] ? skb_release_data+0xd6/0x110
> >>> [  368.106005]  [] ? 

Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-08-09 Thread Ding Tianhong
On 2016/6/16 22:19, Paul E. McKenney wrote:
> On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
>> On 2016/6/15 23:49, Paul E. McKenney wrote:
>>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
>>>> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
>>>> with these steps:
>>>> 1. Connect to ixgbevf and set the speed to 10Gb/s; it works fine.
>>>> 2. Then use ifconfig to bring the NIC down and up again, repeating several
>>>>    times.
>>>> 3. The system panics with a soft lockup.
>>>
>>> Good catch, queued for review and testing.  But what .config was your
>>> kernel built with?
>>>
>>
>> I used the redhat7.1 defconfig to build my kernel, and the RCU config is this:
>> #
>> # RCU Subsystem
>> #
>> CONFIG_TREE_RCU=y
>> # CONFIG_PREEMPT_RCU is not set
>> CONFIG_RCU_STALL_COMMON=y
>> CONFIG_CONTEXT_TRACKING=y
>> CONFIG_RCU_USER_QS=y
>> # CONFIG_CONTEXT_TRACKING_FORCE is not set
>> CONFIG_RCU_FANOUT=64
>> CONFIG_RCU_FANOUT_LEAF=16
>> # CONFIG_RCU_FANOUT_EXACT is not set
>> # CONFIG_RCU_FAST_NO_HZ is not set
>> # CONFIG_TREE_RCU_TRACE is not set
>> CONFIG_RCU_NOCB_CPU=y
>> CONFIG_RCU_NOCB_CPU_ALL=y
>> CONFIG_BUILD_BIN2C=y
> 
> Thank you!  You were running with preemption disabled, so your system
> would indeed be very susceptible to this problem.
> 
>>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
>>> would prevent soft lockups, but not RCU stalls, so I substituted
>>> cond_resched_rcu_qs().  Please let me know if either of those changes
>>> causes problems at your end.
>>
>> Looks fine to me, I will apply this to my branch and test it, thanks.
> 
> Please let me know how it goes!
> 
>   Thanx, Paul
> 

Hi Paul:

It has been a long time since I applied this patch, and I have not found any
problems. I believe this patch is fine, thanks.

Ding

>> Ding
>>
>>>
>>> Thanx, Paul
>>>
>>> 
>>>
>>> commit c317cf19b34c0d2787b787c38bd2c8fe433215da
>>> Author: Ding Tianhong 
>>> Date:   Wed Jun 15 15:27:36 2016 +0800
>>>
>>> rcu: Fix soft lockup for rcu_nocb_kthread
>>> 
>>> Carrying out the following steps results in a softlockup in the
>>> RCU callback-offload (rcuo) kthreads:
>>> 
>>> 1. Connect to ixgbevf, and set the speed to 10Gb/s.
>>> 2. Use ifconfig to bring the nic up and down repeatedly.
>>> 
>>> [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>>> [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
>>> [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>> [  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 
>>> 88057dd9c000
>>> [  368.106005] RIP: 0010:[]  [] 
>>> fib_table_lookup+0x14/0x390
>>> [  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
>>> [  368.106005] RAX: 0001 RBX: 020155c0 RCX: 
>>> 0001
>>> [  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 
>>> 880036d11a00
>>> [  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
>>> 
>>> [  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 
>>> 88061fc83c58
>>> [  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 
>>> 020155c0
>>> [  368.106005] FS:  () GS:88061fc8() 
>>> knlGS:
>>> [  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
>>> [  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 
>>> 000407e0
>>> [  368.106005] DR0:  DR1:  DR2: 
>>> 
>>> [  368.106005] DR3:  DR6: 0ff0 DR7: 
>>> 0400
>>> [  368.106005] Stack:
>>> [  368.106005]  01c0 88057b766000 8802e380b000 
>>> 88057af03e00
>>> [  368.106005]  88061fc83dc0 815349a6 88061fc83d40 
>>> 814ee146
>>> [  368.106005]  8802e380af00 e380af00 819e0900 
>>> 020155c001c0
>>> [  368.106005] Call Trace:
>>> [  368.106005]  
>>> [  368.106005]
>>> [  368.106005]  [] ip_route_input_noref+0x516/0xbd0
>>> [  368.106005]  [] ? skb_release_data+0xd6/0x110
>>> [  368.106005]  [] ? kfree_skb+0x3a/0xa0
>>> [  368.106005]  [] ip_rcv_finish+0x29f/0x350
>>> [  368.106005]  [] ip_rcv+0x234/0x380
>>> [  368.106005]  [] 
>>> __netif_receive_skb_core+0x676/0x870
>>> [  368.106005]  [] __netif_receive_skb+0x18/0x60
>>> [  368.106005]  [] process_backlog+0xae/0x180
>>> [  368.106005]  [] net_rx_action+0x152/0x240
>>> [  368.106005]  [] __do_softirq+0xef/0x280
>>> [  368.106005]  [] 

Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-06-16 Thread Paul E. McKenney
On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
> On 2016/6/15 23:49, Paul E. McKenney wrote:
> > On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
> >> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
> >> with these steps:
> >> 1. Connect to ixgbevf and set the speed to 10Gb/s; this works fine.
> >> 2. Use ifconfig to bring the NIC down and up again, repeating several times.
> >> 3. The system panics with a soft lockup.
> > 
> > Good catch, queued for review and testing.  But what .config was your
> > kernel built with?
> > 
> 
> I used the redhat7.1 defconfig to build my kernel, and the RCU config is this:
> #
> # RCU Subsystem
> #
> CONFIG_TREE_RCU=y
> # CONFIG_PREEMPT_RCU is not set
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_CONTEXT_TRACKING=y
> CONFIG_RCU_USER_QS=y
> # CONFIG_CONTEXT_TRACKING_FORCE is not set
> CONFIG_RCU_FANOUT=64
> CONFIG_RCU_FANOUT_LEAF=16
> # CONFIG_RCU_FANOUT_EXACT is not set
> # CONFIG_RCU_FAST_NO_HZ is not set
> # CONFIG_TREE_RCU_TRACE is not set
> CONFIG_RCU_NOCB_CPU=y
> CONFIG_RCU_NOCB_CPU_ALL=y
> CONFIG_BUILD_BIN2C=y

Thank you!  You were running with preemption disabled, so your system
would indeed be very susceptible to this problem.

> > Also, I did tweak both the commit log and the patch.  Your cond_resched()
> > would prevent soft lockups, but not RCU stalls, so I substituted
> > cond_resched_rcu_qs().  Please let me know if either of those changes
> > causes problems at your end.
> 
> Looks fine to me, I will apply this to my branch and test it, thanks.

Please let me know how it goes!

Thanx, Paul

> Ding
> 
> > 
> > Thanx, Paul
> > 
> > 
> > 
> > commit c317cf19b34c0d2787b787c38bd2c8fe433215da
> > Author: Ding Tianhong 
> > Date:   Wed Jun 15 15:27:36 2016 +0800
> > 
> > rcu: Fix soft lockup for rcu_nocb_kthread
> > 
> > Carrying out the following steps results in a softlockup in the
> > RCU callback-offload (rcuo) kthreads:
> > 
> > 1. Connect to ixgbevf, and set the speed to 10Gb/s.
> > 2. Use ifconfig to bring the nic up and down repeatedly.
> > 
> > [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> > [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
> > [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 
> > 88057dd9c000
> > [  368.106005] RIP: 0010:[]  [] 
> > fib_table_lookup+0x14/0x390
> > [  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
> > [  368.106005] RAX: 0001 RBX: 020155c0 RCX: 
> > 0001
> > [  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 
> > 880036d11a00
> > [  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
> > 
> > [  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 
> > 88061fc83c58
> > [  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 
> > 020155c0
> > [  368.106005] FS:  () GS:88061fc8() 
> > knlGS:
> > [  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
> > [  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 
> > 000407e0
> > [  368.106005] DR0:  DR1:  DR2: 
> > 
> > [  368.106005] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > [  368.106005] Stack:
> > [  368.106005]  01c0 88057b766000 8802e380b000 
> > 88057af03e00
> > [  368.106005]  88061fc83dc0 815349a6 88061fc83d40 
> > 814ee146
> > [  368.106005]  8802e380af00 e380af00 819e0900 
> > 020155c001c0
> > [  368.106005] Call Trace:
> > [  368.106005]  
> > [  368.106005]
> > [  368.106005]  [] ip_route_input_noref+0x516/0xbd0
> > [  368.106005]  [] ? skb_release_data+0xd6/0x110
> > [  368.106005]  [] ? kfree_skb+0x3a/0xa0
> > [  368.106005]  [] ip_rcv_finish+0x29f/0x350
> > [  368.106005]  [] ip_rcv+0x234/0x380
> > [  368.106005]  [] 
> > __netif_receive_skb_core+0x676/0x870
> > [  368.106005]  [] __netif_receive_skb+0x18/0x60
> > [  368.106005]  [] process_backlog+0xae/0x180
> > [  368.106005]  [] net_rx_action+0x152/0x240
> > [  368.106005]  [] __do_softirq+0xef/0x280
> > [  368.106005]  [] call_softirq+0x1c/0x30
> > [  368.106005]  
> > [  368.106005]
> > [  368.106005]  [] do_softirq+0x65/0xa0
> > [  368.106005]  [] local_bh_enable+0x94/0xa0
> > [  368.106005]  [] rcu_nocb_kthread+0x232/0x370
> > [  368.106005]  [] ? 

Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-06-16 Thread Ding Tianhong
On 2016/6/15 23:49, Paul E. McKenney wrote:
> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
>> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
>> with these steps:
>> 1. Connect to ixgbevf and set the speed to 10Gb/s; this works fine.
>> 2. Use ifconfig to bring the NIC down and up again, repeating several times.
>> 3. The system panics with a soft lockup.
> 
> Good catch, queued for review and testing.  But what .config was your
> kernel built with?
> 

I used the redhat7.1 defconfig to build my kernel, and the RCU config is this:
#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_RCU_USER_QS=y
# CONFIG_CONTEXT_TRACKING_FORCE is not set
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
CONFIG_BUILD_BIN2C=y


> Also, I did tweak both the commit log and the patch.  Your cond_resched()
> would prevent soft lockups, but not RCU stalls, so I substituted
> cond_resched_rcu_qs().  Please let me know if either of those changes
> causes problems at your end.

Looks fine to me, I will apply this to my branch and test it, thanks.

Ding

> 
>   Thanx, Paul
> 
> 
> 
> commit c317cf19b34c0d2787b787c38bd2c8fe433215da
> Author: Ding Tianhong 
> Date:   Wed Jun 15 15:27:36 2016 +0800
> 
> rcu: Fix soft lockup for rcu_nocb_kthread
> 
> Carrying out the following steps results in a softlockup in the
> RCU callback-offload (rcuo) kthreads:
> 
> 1. Connect to ixgbevf, and set the speed to 10Gb/s.
> 2. Use ifconfig to bring the nic up and down repeatedly.
> 
> [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
> [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 
> 88057dd9c000
> [  368.106005] RIP: 0010:[]  [] 
> fib_table_lookup+0x14/0x390
> [  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
> [  368.106005] RAX: 0001 RBX: 020155c0 RCX: 
> 0001
> [  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 
> 880036d11a00
> [  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
> 
> [  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 
> 88061fc83c58
> [  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 
> 020155c0
> [  368.106005] FS:  () GS:88061fc8() 
> knlGS:
> [  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
> [  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 
> 000407e0
> [  368.106005] DR0:  DR1:  DR2: 
> 
> [  368.106005] DR3:  DR6: 0ff0 DR7: 
> 0400
> [  368.106005] Stack:
> [  368.106005]  01c0 88057b766000 8802e380b000 
> 88057af03e00
> [  368.106005]  88061fc83dc0 815349a6 88061fc83d40 
> 814ee146
> [  368.106005]  8802e380af00 e380af00 819e0900 
> 020155c001c0
> [  368.106005] Call Trace:
> [  368.106005]  
> [  368.106005]
> [  368.106005]  [] ip_route_input_noref+0x516/0xbd0
> [  368.106005]  [] ? skb_release_data+0xd6/0x110
> [  368.106005]  [] ? kfree_skb+0x3a/0xa0
> [  368.106005]  [] ip_rcv_finish+0x29f/0x350
> [  368.106005]  [] ip_rcv+0x234/0x380
> [  368.106005]  [] __netif_receive_skb_core+0x676/0x870
> [  368.106005]  [] __netif_receive_skb+0x18/0x60
> [  368.106005]  [] process_backlog+0xae/0x180
> [  368.106005]  [] net_rx_action+0x152/0x240
> [  368.106005]  [] __do_softirq+0xef/0x280
> [  368.106005]  [] call_softirq+0x1c/0x30
> [  368.106005]  
> [  368.106005]
> [  368.106005]  [] do_softirq+0x65/0xa0
> [  368.106005]  [] local_bh_enable+0x94/0xa0
> [  368.106005]  [] rcu_nocb_kthread+0x232/0x370
> [  368.106005]  [] ? wake_up_bit+0x30/0x30
> [  368.106005]  [] ? rcu_start_gp+0x40/0x40
> [  368.106005]  [] kthread+0xcf/0xe0
> [  368.106005]  [] ? kthread_create_on_node+0x140/0x140
> [  368.106005]  [] ret_from_fork+0x58/0x90
> [  368.106005]  [] ? kthread_create_on_node+0x140/0x140
> 
> ==cut here==
> 
> It turns out that the rcuos callback-offload kthread is busy processing
> a very large quantity of RCU callbacks, and 

Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-06-15 Thread Paul E. McKenney
On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
> I hit this problem when using Testgine to send packets to an ixgbevf NIC,
> with these steps:
> 1. Connect to ixgbevf and set the speed to 10Gb/s; this works fine.
> 2. Use ifconfig to bring the NIC down and up again, repeating several times.
> 3. The system panics with a soft lockup.

Good catch, queued for review and testing.  But what .config was your
kernel built with?

Also, I did tweak both the commit log and the patch.  Your cond_resched()
would prevent soft lockups, but not RCU stalls, so I substituted
cond_resched_rcu_qs().  Please let me know if either of those changes
causes problems at your end.

Thanx, Paul



commit c317cf19b34c0d2787b787c38bd2c8fe433215da
Author: Ding Tianhong 
Date:   Wed Jun 15 15:27:36 2016 +0800

rcu: Fix soft lockup for rcu_nocb_kthread

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 88057dd9c000
[  368.106005] RIP: 0010:[]  [] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
[  368.106005] RAX: 0001 RBX: 020155c0 RCX: 0001
[  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 880036d11a00
[  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
[  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 88061fc83c58
[  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 020155c0
[  368.106005] FS:  () GS:88061fc8() knlGS:
[  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
[  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 000407e0
[  368.106005] DR0:  DR1:  DR2: 
[  368.106005] DR3:  DR6: 0ff0 DR7: 0400
[  368.106005] Stack:
[  368.106005]  01c0 88057b766000 8802e380b000 88057af03e00
[  368.106005]  88061fc83dc0 815349a6 88061fc83d40 814ee146
[  368.106005]  8802e380af00 e380af00 819e0900 020155c001c0
[  368.106005] Call Trace:
[  368.106005]  
[  368.106005]
[  368.106005]  [] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [] ? skb_release_data+0xd6/0x110
[  368.106005]  [] ? kfree_skb+0x3a/0xa0
[  368.106005]  [] ip_rcv_finish+0x29f/0x350
[  368.106005]  [] ip_rcv+0x234/0x380
[  368.106005]  [] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [] __netif_receive_skb+0x18/0x60
[  368.106005]  [] process_backlog+0xae/0x180
[  368.106005]  [] net_rx_action+0x152/0x240
[  368.106005]  [] __do_softirq+0xef/0x280
[  368.106005]  [] call_softirq+0x1c/0x30
[  368.106005]  
[  368.106005]
[  368.106005]  [] do_softirq+0x65/0xa0
[  368.106005]  [] local_bh_enable+0x94/0xa0
[  368.106005]  [] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [] ? wake_up_bit+0x30/0x30
[  368.106005]  [] ? rcu_start_gp+0x40/0x40
[  368.106005]  [] kthread+0xcf/0xe0
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [] ret_from_fork+0x58/0x90
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140

==cut here==

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.

Signed-off-by: Ding Tianhong 
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0082fce402a0..85c5a883c6e3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
 				cl++;
 			c++;
 			local_bh_enable();
+			cond_resched_rcu_qs();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
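
The difference between the two yield points discussed above can be sketched
in plain userspace C. This is only a model under stated assumptions, not
kernel code: struct model_cpu and every model_* function are hypothetical
names invented for illustration. It assumes that cond_resched() does nothing
unless another task wants the CPU, and that cond_resched_rcu_qs() also
reports a quiescent state to RCU when no reschedule took place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-CPU state; not a kernel structure. */
struct model_cpu {
	bool need_resched;     /* another task is waiting for this CPU */
	unsigned long resched; /* times the loop actually gave up the CPU */
	unsigned long qs;      /* quiescent states RCU has observed */
};

/* Models cond_resched(): yields only if someone else wants the CPU. */
static bool model_cond_resched(struct model_cpu *cpu)
{
	if (!cpu->need_resched)
		return false;
	cpu->need_resched = false;
	cpu->resched++;
	cpu->qs++; /* assumption: a context switch counts as a quiescent state */
	return true;
}

/* Models cond_resched_rcu_qs(): reports a quiescent state even when
 * no reschedule was needed. */
static void model_cond_resched_rcu_qs(struct model_cpu *cpu)
{
	if (!model_cond_resched(cpu))
		cpu->qs++;
}

/* Invoke n callbacks, calling the chosen yield point after each one. */
static void model_invoke_callbacks(struct model_cpu *cpu, size_t n,
				   bool use_rcu_qs)
{
	for (size_t i = 0; i < n; i++) {
		/* the callback body would run here */
		if (use_rcu_qs)
			model_cond_resched_rcu_qs(cpu);
		else
			(void)model_cond_resched(cpu);
	}
}
```

In this model, a CPU with nothing else to run never reports a quiescent state
when the loop uses the plain yield point, which is the situation in which an
RCU stall could still be reported even though the soft lockup is gone.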



[PATCH] rcu: Fix soft lockup for rcu_nocb_kthread

2016-06-15 Thread Ding Tianhong
I hit this problem when using Testgine to send packets to an ixgbevf NIC,
with these steps:
1. Connect to ixgbevf and set the speed to 10Gb/s; this works fine.
2. Use ifconfig to bring the NIC down and up again, repeating several times.
3. The system panics with a soft lockup.
[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 88057dd9c000
[  368.106005] RIP: 0010:[]  [] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
[  368.106005] RAX: 0001 RBX: 020155c0 RCX: 0001
[  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 880036d11a00
[  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
[  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 88061fc83c58
[  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 020155c0
[  368.106005] FS:  () GS:88061fc8() knlGS:
[  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
[  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 000407e0
[  368.106005] DR0:  DR1:  DR2: 
[  368.106005] DR3:  DR6: 0ff0 DR7: 0400
[  368.106005] Stack:
[  368.106005]  01c0 88057b766000 8802e380b000 88057af03e00
[  368.106005]  88061fc83dc0 815349a6 88061fc83d40 814ee146
[  368.106005]  8802e380af00 e380af00 819e0900 020155c001c0
[  368.106005] Call Trace:
[  368.106005]  
[  368.106005]
[  368.106005]  [] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [] ? skb_release_data+0xd6/0x110
[  368.106005]  [] ? kfree_skb+0x3a/0xa0
[  368.106005]  [] ip_rcv_finish+0x29f/0x350
[  368.106005]  [] ip_rcv+0x234/0x380
[  368.106005]  [] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [] __netif_receive_skb+0x18/0x60
[  368.106005]  [] process_backlog+0xae/0x180
[  368.106005]  [] net_rx_action+0x152/0x240
[  368.106005]  [] __do_softirq+0xef/0x280
[  368.106005]  [] call_softirq+0x1c/0x30
[  368.106005]  
[  368.106005]
[  368.106005]  [] do_softirq+0x65/0xa0
[  368.106005]  [] local_bh_enable+0x94/0xa0
[  368.106005]  [] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [] ? wake_up_bit+0x30/0x30
[  368.106005]  [] ? rcu_start_gp+0x40/0x40
[  368.106005]  [] kthread+0xcf/0xe0
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [] ret_from_fork+0x58/0x90
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140

==cut here==

I then checked the rcuos thread, rcu_nocb_kthread(). It invokes the callbacks
queued by the corresponding no-CBs CPU. For each callback in the loop it
disables bottom halves and enables them again, and local_bh_enable() calls
do_softirq() to process the packets in the receive queue. This can take a long
time, so add cond_resched() to the loop so that the soft-lockup watchdog gets
a chance to run, which fixes the problem.

Signed-off-by: Ding Tianhong 
---
 kernel/rcu/tree_plugin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index ff1cd4e..1bc729a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2262,6 +2262,7 @@ static int rcu_nocb_kthread(void *arg)
 			cl++;
 			c++;
 			local_bh_enable();
+			cond_resched();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
-- 
1.9.0



