On Fri, Jun 28, 2019 at 03:25:47PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 28, 2019 at 05:40:18PM -0400, Joel Fernandes wrote:
> > Hi Paul,
> >
> > On Fri, Jun 28, 2019 at 01:04:23PM -0700, Paul E. McKenney wrote:
> > [snip]
> > > > > > Commit
> > > > > > - 23634ebc1d946 ("rcu: Check for wakeup-safe conditions in
> > > > > >   rcu_read_unlock_special()") does not trigger the bug within 94
> > > > > >   attempts.
> > > > > >
> > > > > > - 48d07c04b4cc1 ("rcu: Enable elimination of Tree-RCU softirq
> > > > > >   processing") needed 12 attempts to trigger the bug.
> > > > >
> > > > > That matches my belief that 23634ebc1d946 ("rcu: Check for wakeup-safe
> > > > > conditions in rcu_read_unlock_special()") will at least greatly decrease
> > > > > the probability of this bug occurring.
> > > >
> > > > I was just typing a reply that I can't reproduce it with:
> > > > rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()
> > > >
> > > > I am trying to revert enough of this patch to see what would break things,
> > > > however I think a better exercise might be to understand more what the
> > > > patch does and why it fixes things in the first place ;-) It is probably
> > > > the deferred_qs thing.
> > >
> > > The deferred_qs flag is part of it! Looking forward to hearing what
> > > you come up with as being the critical piece of this commit.
> >
> > The new deferred_qs flag indeed saves the machine from the deadlock.
> >
> > If we don't want the deferred_qs, then the below patch also fixes the issue.
> > However, I am more sure than not that it does not handle all cases (such as:
> > what if we previously had an expedited grace period IPI in a previous reader
> > section and had to defer processing? Then it seems a similar deadlock would
> > present.) But anyway, the below patch does fix it for me!
> > It is based on your -rcu tree commit 23634ebc1d946f19eb112d4455c1d84948875e31
> > ("rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()").
>
> The point here being that you rely on .b.blocked rather than
> .b.deferred_qs. Hmmm... There are a number of places that check all
> the bits via the .s leg of the rcu_special union. The .s check in
> rcu_preempt_need_deferred_qs() should be OK because it is conditioned
> on t->rcu_read_lock_nesting being zero or negative.
>
> Do the rest of those also work out OK?
>
> It would be nice to remove the flag, but doing so clearly needs careful
> review and testing.
Agreed. I am planning to do an audit of this code within the next couple of
weeks, so I will be on the lookout for any optimization opportunities related
to this. Will let you know if this can work.

For now, I like your patch better because it is more conservative and doesn't
cause any space overhead.

If you'd like, please feel free to include my Tested-by on it:

Tested-by: Joel Fernandes (Google) <j...@joelfernandes.org>

If you had a chance, could you also point me to any tests that show a
performance improvement with the irqwork patch, on the expedited GP use case?
I'd like to try it out as well. I guess rcuperf should have some?

thanks!

- Joel