* Lai Jiangshan ([email protected]) wrote: > On 10/11/2012 03:50 AM, Paul E. McKenney wrote: > > On Wed, Oct 10, 2012 at 01:53:04PM -0400, Mathieu Desnoyers wrote: > >> * Mathieu Desnoyers ([email protected]) wrote: > >>> * Paul E. McKenney ([email protected]) wrote: > >>>> On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: > >>>>> * Lai Jiangshan ([email protected]) wrote: > >>>>>> test code: > >>>>>> ./tests/test_urcu_lfs 100 10 10 > >>>>>> > >>>>>> bug produce rate > 60% > >>>>>> > >>>>>> {{{ > >>>>>> I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or > >>>>>> "./tests/test_urcu_lfs 100 100 10" > >>>>>> But I just test it about 5 times > >>>>>> }}} > >>>>>> > >>>>>> 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 > >>>>>> RCU_MB (no time to test for other rcu type) > >>>>>> test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a > >>>>>> > >>>>>> I didn't see any bug when "./tests/test_urcu_mb 10 100 10" > >>>>>> > >>>>>> Sorry, I tried, but I failed to find out the root cause currently. > >>>>> > >>>>> I think I managed to narrow down the issue: > >>>>> > >>>>> 1) the master branch does not reproduce it, but commit > >>>>> 768fba83676f49eb73fd1d8ad452016a84c5ec2a repdroduces it about 50% of > >>>>> the > >>>>> time. > >>>>> > >>>>> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and > >>>>> current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu > >>>>> moving to wfcqueue. > >>>>> > >>>>> 3) the bug always arise, for me, at the end of the 10 seconds. > >>>>> However, it might be simply due to the fact that most of the memory > >>>>> get freed at the end of program execution. > >>>>> > >>>>> 4) I've been able to get a backtrace, and it looks like we have some > >>>>> call_rcu callback-invokation threads still working while > >>>>> call_rcu_data_free() is invoked. In the backtrace, > >>>>> call_rcu_data_free() > >>>>> is nicely waiting for the next thread to stop, and during that time, > >>>>> two callback-invokation threads are invoking callbacks (and one of > >>>>> them triggers the segfault). > >>>> > >>>> Do any of the callbacks reference __thread variables from some other > >>>> thread? If so, those threads must refrain from exiting until after > >>>> such callbacks complete. > >>> > >>> The callback is a simple caa_container_of + free, usual stuff, nothing > >>> fancy. > >> > >> Here is the fix: the bug was in call rcu. It is not required for master, > >> because we fixed it while moving to wfcqueue. > >> > >> We were erroneously writing to the head field of the default > >> call_rcu_data rather than tail. > > > > Ouch!!! I have no idea why that would have passed my testing. :-( > > It's one of the reasons that I rewrite wfqueue and introduce delete_all() > (Mathieu uses splice instead) to replace open code of wfqueue in > urcu-call-rcu-impl.h.
Yes, you did an excellent call on this one. We had the bug fixed in the master branch before it was discovered, which is always nice :) > > > > >> I wonder if we should simply do a new release with call_rcu using > >> wfcqueue and tell people to upgrade, or if we should somehow create a > >> stable branch with this fix. > >> > >> Thoughts ? > > > > Under what conditions does this bug appear? It is necessary to not just > > use call_rcu(), but also to explicitly call call_rcu_data_free(), right? > > > > My guess is that a stable branch would be good -- there will be other > > bugs, after all. :-/ > > > > Thanx, Paul > > > >> Thanks, > >> > >> Mathieu > >> > >> --- > >> diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h > >> index 13b24ff..b205229 100644 > >> --- a/urcu-call-rcu-impl.h > >> +++ b/urcu-call-rcu-impl.h > >> @@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp) > >> /* Create default call rcu data if need be */ > >> (void) get_default_call_rcu_data(); > >> cbs_endprev = (struct cds_wfq_node **) > >> - uatomic_xchg(&default_call_rcu_data, cbs_tail); > >> - *cbs_endprev = cbs; > >> + uatomic_xchg(&default_call_rcu_data->cbs.tail, > >> + cbs_tail); > >> + _CMM_STORE_SHARED(*cbs_endprev, cbs); > >> uatomic_add(&default_call_rcu_data->qlen, > >> uatomic_read(&crdp->qlen)); > >> wake_call_rcu_thread(default_call_rcu_data); > >> > >> > >>> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>>> > >>>> Thanx, Paul > >>>> > >>>>> So I expect that commit > >>>>> > >>>>> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe > >>>>> Author: Mathieu Desnoyers <[email protected]> > >>>>> Date: Tue Sep 25 10:50:49 2012 -0500 > >>>>> > >>>>> call_rcu: use wfcqueue, eliminate false-sharing > >>>>> > >>>>> Eliminate false-sharing between call_rcu (enqueuer) and worker > >>>>> threads > >>>>> on the queue head and tail. > >>>>> > > I think the changelog of this commit is too short. Yes, unfortunately, it's already in the master branch, and rewriting history is something mere mortals are not allowed to do. ;-) There are a couple of patches that I have pending that did not receive any negative feedback at this point. I will move things forwards cautiously to ensure we have a consistent master branch, and create stable branches for 0.6 and 0.7. Review of these commits will be welcome before we do a 0.8 release (with the new wfcqueue). We'll be able to proceed to further changes as separate commits. Thanks! Mathieu > > Thanks, > Lai -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
