* Lai Jiangshan ([email protected]) wrote:
> We need the smallest patch at first, all other things left in discussion.
> no free_each(), simple changes.

I guess you mean for_each().

If we limit ourselves to the versions where the user is doing the locking, I don't think there were any issues left. In the version I provided (in the last series of 2 patches), I took care of all issues that were raised in our email discussion.

I prefer to introduce these new API members all in one go, mainly because I really don't want to add new API members (exposed through the public API) and then remove them afterward when we notice that they expose too many details.

I think the current splice, dequeue, next, and first API members allow any user to do the kind of use-case that call_rcu is doing: this lets us achieve your original goal of not duplicating the code everywhere.

If you still notice issues with __cds_wfq_for_each_blocking() and __cds_wfq_for_each_blocking_safe() in the last patch, please let me know,

Thanks!

Mathieu

>
> thanks,
> Lai
>
> On Thu, Aug 16, 2012 at 10:08 AM, Lai Jiangshan <[email protected]> wrote:
> >>>
> >>> Is it false sharing?
> >>> Access to q->head.next and access to q->tail have the same performance
> >>> because they are in the same cache line.
> >>
> >> Yes! you are right!
> >> And a quick benchmark confirms it:
> >>
> >> with head and tail on same cache line:
> >>
> >> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq
> >> testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0
> >> nr_enqueues 100833595 nr_dequeues 88647134 successful enqueues
> >> 100833595 successful dequeues 88646898 end_dequeues 12186697 nr_ops
> >> 189480729
> >>
> >> with a 256 bytes padding between head and tail, keeping the mutex on the
> >> "head" cache line:
> >>
> >> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq
> >> testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0
> >> nr_enqueues 228992829 nr_dequeues 228921791 successful enqueues
> >> 228992829 successful dequeues 228921367 end_dequeues 71462 nr_ops
> >> 457914620
> >>
> >> enqueue: 127% speedup
> >> dequeue: 158% speedup
> >>
> >> That is indeed a _really_ huge difference. However, to get this, we
> >> would have to increase the size of struct cds_wfq_queue beyond its
> >> current size, which would break API compatibility. Any idea on how to
> >> best do this without causing incompatibility would be welcome.
> >>
> >
> > choice 1) two set of APIs?(cache-line-opt and none-cache-line-opt),
> > many users don't need the cache-line-opt.
> > choice 2) Just break the compatibility for NONE-LGPL. I think
> > NONE-LGPL-user of it is rare. And current version of urcu <1.0, I
> > don't like too much burden when <1.0.
> >
> >
> > thanks,
> > Lai

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
lttng-dev mailing list
[email protected]
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
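
For reference, here is a minimal sketch of the kind of layout the quoted benchmark describes: 256 bytes of padding between head and tail, with the mutex kept on the "head" side. This is not the actual struct cds_wfq_queue definition. Every name below (the sketch_ prefix, the node type, the 64-byte cache line size) is a hypothetical chosen for illustration. The last helper only redoes the speedup arithmetic from the quoted numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Assumed values for illustration; not taken from the urcu headers. */
#define SKETCH_CACHE_LINE_SIZE	64
#define SKETCH_PAD_SIZE		256

/* Hypothetical node type standing in for the real queue node. */
struct sketch_node {
	struct sketch_node *next;
};

/*
 * Hypothetical padded layout: dequeuer-side fields (head, lock) first,
 * then padding, then the enqueuer-side tail pointer.
 */
struct sketch_padded_queue {
	struct sketch_node head;
	pthread_mutex_t lock;
	char pad[SKETCH_PAD_SIZE];
	struct sketch_node *tail;
};

/* Distance in bytes between head and tail inside the structure. */
static inline size_t sketch_head_tail_distance(void)
{
	return offsetof(struct sketch_padded_queue, tail)
		- offsetof(struct sketch_padded_queue, head);
}

/*
 * If head and tail are at least one cache line apart, they cannot share
 * a cache line, whatever the alignment of the structure itself.
 */
static_assert(offsetof(struct sketch_padded_queue, tail)
		- offsetof(struct sketch_padded_queue, head)
		>= SKETCH_CACHE_LINE_SIZE,
	"head and tail may share a cache line");

/* Speedup in percent, rounded down: (after - before) / before * 100. */
static inline unsigned long long
sketch_speedup_percent(unsigned long long before, unsigned long long after)
{
	return (after - before) * 100ULL / before;
}
```

The padding makes the structure larger than an unpadded one, which is the size increase (and the compatibility concern) raised in the quoted text.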
