On Mon, Jun 6, 2011 at 12:21 PM, Mathieu Desnoyers <[email protected]> wrote:
> * Mathieu Desnoyers ([email protected]) wrote:
>> I notice that the "poll(NULL, 0, 10);" delay is executed both for the RT
>> and non-RT code. So given that my goal is to get the call_rcu thread to
>> GC memory as quickly as possible to diminish the overhead of cache
>> misses, I decided to try removing this delay for !RT: the call_rcu
>> thread then wakes up ASAP when the thread invoking call_rcu wakes it. My
>> updates jump to 76349/s (getting there!) ;).
>>
>> This improvement can be explained by a lower delay between call_rcu and
>> execution of its callback, which decreases the amount of cache used, and
>> therefore provides better cache locality.
>
> I just wonder if it's worth it: removing this delay from the !RT
> call_rcu thread can cause a high rate of synchronize_rcu() calls. So
> although there might be an advantage in terms of update rate, it will
> likely cause extra cache-line bounces between the call_rcu threads and
> the reader threads.
>
> test_urcu_rbtree 7 1 20 -g 1000000
>
> With the delay in the call_rcu thread:
> search: 1842857 items/reader thread/s (7 reader threads)
> updates: 21066 items/s (1 update thread)
> ratio: 87 search/update
>
> Without the delay in the call_rcu thread:
> search: 3064285 items/reader thread/s (7 reader threads)
> updates: 45096 items/s (1 update thread)
> ratio: 68 search/update
>
> So basically, adding the delay doubles the update performance, at the
> cost of being 33% slower for reads. My first thought is that if an
> application has very frequent updates, then maybe it wants to have fast
> updates because the update throughput is then important. If the
> application has infrequent updates, then the reads will be fast anyway,
> because rare call_rcu invocation will trigger less cache-line bounce
> between readers and writers. Any other thoughts on this trade-off and
> how to deal with it?
>
Did I miss something here? It looks like you more than doubled the
update rate and almost doubled the lookup rate. The search/update
ratio is lower, but if both the raw rates improved so much, how is
this a bad thing?

-phil

> Thanks,
>
> Mathieu
>
>>
>> Signed-off-by: Mathieu Desnoyers <[email protected]>
>> ---
>>  urcu-call-rcu-impl.h |    3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> Index: userspace-rcu/urcu-call-rcu-impl.h
>> ===================================================================
>> --- userspace-rcu.orig/urcu-call-rcu-impl.h
>> +++ userspace-rcu/urcu-call-rcu-impl.h
>> @@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
>>  		else {
>>  			if (&crdp->cbs.head ==
>>  			    _CMM_LOAD_SHARED(crdp->cbs.tail))
>>  				call_rcu_wait(crdp);
>> -			poll(NULL, 0, 10);
>> +			else
>> +				poll(NULL, 0, 10);
>>  		}
>>  	}
>>  	call_rcu_lock(&crdp->mtx);
>>
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
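
For readers without the liburcu tree at hand, here is a minimal, self-contained
sketch of the pattern the patch changes. It is not the actual
urcu-call-rcu-impl.h code: the queue, enqueue_cb(), worker() and the pthread
condition variable are made-up stand-ins for crdp->cbs, call_rcu() and
call_rcu_wait(), and the grace-period step is only marked by a comment. It
illustrates the behaviour under discussion: the dispatcher blocks when its
queue is empty, and only applies the 10 ms poll(NULL, 0, 10) throttle when
callbacks were already pending, so an enqueue into an empty queue is serviced
as soon as the wakeup arrives.

/* Sketch only: names and structure are illustrative, not liburcu's. */
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct cb {
	struct cb *next;
	void (*func)(struct cb *cb);
};

static struct cb *queue_head;		/* LIFO callback queue (demo only) */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* Stand-in for call_rcu(): enqueue a callback and wake the worker. */
static void enqueue_cb(struct cb *cb)
{
	pthread_mutex_lock(&queue_lock);
	cb->next = queue_head;
	queue_head = cb;
	pthread_cond_signal(&queue_cond);
	pthread_mutex_unlock(&queue_lock);
}

/* Stand-in for call_rcu_thread(): drain the queue, then either block
 * (queue empty) or throttle with poll(NULL, 0, 10) (work pending),
 * mirroring the patched !RT branch quoted above. */
static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		struct cb *batch, *cb;

		pthread_mutex_lock(&queue_lock);
		batch = queue_head;
		queue_head = NULL;
		pthread_mutex_unlock(&queue_lock);

		/* In liburcu, a grace period (synchronize_rcu()) would
		 * elapse here before the callbacks are invoked. */
		for (cb = batch; cb;) {
			struct cb *next = cb->next;
			cb->func(cb);
			cb = next;
		}

		pthread_mutex_lock(&queue_lock);
		if (!queue_head) {
			/* Queue empty: block until an enqueue signals us.
			 * Before the patch, the thread also slept 10 ms
			 * after waking up here, delaying the freshly
			 * enqueued callback. */
			pthread_cond_wait(&queue_cond, &queue_lock);
			pthread_mutex_unlock(&queue_lock);
		} else {
			pthread_mutex_unlock(&queue_lock);
			/* Work already pending: sleep 10 ms so callbacks
			 * batch up instead of bouncing cache lines on
			 * every update. */
			poll(NULL, 0, 10);
		}
	}
}

static void print_cb(struct cb *cb)
{
	(void)cb;
	printf("callback invoked\n");
}

int main(void)
{
	pthread_t tid;
	struct cb cbs[3] = {
		{ NULL, print_cb }, { NULL, print_cb }, { NULL, print_cb },
	};

	pthread_create(&tid, NULL, worker, NULL);
	for (int i = 0; i < 3; i++)
		enqueue_cb(&cbs[i]);
	usleep(100 * 1000);	/* let the worker drain the queue */
	return 0;		/* demo: exit without joining the worker */
}

Built with "cc -pthread", the demo enqueues three callbacks and lets the
worker drain them. Dropping the unconditional sleep after the wait, which is
what the quoted patch does for the real call_rcu thread, is what removes the
extra 10 ms of latency between call_rcu() and callback execution in the
empty-queue case.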
