On Mon, Jun 6, 2011 at 12:41 PM, Paul E. McKenney <[email protected]> wrote:
> On Mon, Jun 06, 2011 at 03:21:07PM -0400, Mathieu Desnoyers wrote:
>> * Mathieu Desnoyers ([email protected]) wrote:
>> > I notice that the "poll(NULL, 0, 10);" delay is executed both for the RT
>> > and non-RT code. So given that my goal is to get the call_rcu thread to
>> > GC memory as quickly as possible to diminish the overhead of cache
>> > misses, I decided to try removing this delay for !RT: the call_rcu
>> > thread then wakes up ASAP when the thread invoking call_rcu wakes it. My
>> > updates jump to 76349/s (getting there!) ;).
>> >
>> > This improvement can be explained by a lower delay between call_rcu and
>> > the execution of its callback, which decreases the amount of cache used
>> > and therefore provides better cache locality.
>>
>> I just wonder if it's worth it: removing this delay from the !RT
>> call_rcu thread can cause a high rate of synchronize_rcu() calls. So
>> although there might be an advantage in terms of update rate, it will
>> likely cause extra cache-line bounces between the call_rcu threads and
>> the reader threads.
>>
>> test_urcu_rbtree 7 1 20 -g 1000000
>>
>> With the delay in the call_rcu thread:
>> search: 1842857 items/reader thread/s (7 reader threads)
>> updates: 21066 items/s (1 update thread)
>> ratio: 87 search/update
>>
>> Without the delay in the call_rcu thread:
>> search: 3064285 items/reader thread/s (7 reader threads)
>> updates: 45096 items/s (1 update thread)
>> ratio: 68 search/update
>>
>> So basically, removing the delay doubles the update performance, at the
>> cost of being 33% slower for reads. My first thought is that if an
>> application has very frequent updates, then maybe it wants to have fast
>> updates, because the update throughput is then important.
>> If the application has infrequent updates, then the reads will be fast
>> anyway, because rare call_rcu invocations will trigger fewer cache-line
>> bounces between readers and writers. Any other thoughts on this
>> trade-off and how to deal with it?
>
> One approach would be to let the user handle it using real-time
> priority adjustment. Another approach would be to let the user
> specify the wait time in milliseconds, and skip the poll() system
> call if the specified wait time is zero.
>
> The latter seems more sane to me. It also allows the user to
> specify (say) 10000 milliseconds for cases where there is a
> lot of memory and where amortizing synchronize_rcu() overhead
> across a large number of updates is important.
>
> Other thoughts?
>
> 							Thanx, Paul
If synchronize_rcu is used to time memory reclamation, then trading
memory for overhead is a valid way to think about this timing. But if
synchronize_rcu is required inside an update for other purposes (e.g. my
RBTree algorithm or Josh's hash table resize), then the trade-off needs
to include synchronize_rcu overhead vs. update throughput.

-phil

>> Thanks,
>>
>> Mathieu
>>
>>
>> > Signed-off-by: Mathieu Desnoyers <[email protected]>
>> > ---
>> >  urcu-call-rcu-impl.h |    3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > Index: userspace-rcu/urcu-call-rcu-impl.h
>> > ===================================================================
>> > --- userspace-rcu.orig/urcu-call-rcu-impl.h
>> > +++ userspace-rcu/urcu-call-rcu-impl.h
>> > @@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
>> >  		else {
>> >  			if (&crdp->cbs.head ==
>> >  			    _CMM_LOAD_SHARED(crdp->cbs.tail))
>> >  				call_rcu_wait(crdp);
>> > -			poll(NULL, 0, 10);
>> > +			else
>> > +				poll(NULL, 0, 10);
>> >  		}
>> >  	}
>> >  	call_rcu_lock(&crdp->mtx);
>> >
>>
>> --
>> Mathieu Desnoyers
>> Operating System Efficiency R&D Consultant
>> EfficiOS Inc.
>> http://www.efficios.com
>
> _______________________________________________
> rp mailing list
> [email protected]
> http://svcs.cs.pdx.edu/mailman/listinfo/rp

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
