When I ported the old lock API to the new one, I carried over the code
for handling nested locks. While profiling the routines, I noticed that
the nested-lock handling adds a large overhead for something that is
rarely used, needed, or even desired. Here are the numbers I'm seeing
from the testlockperf test:

With (now):  1252330 usec
Without:      595473 usec   (#ifdef'd out)

Granted, this is an artificial test, but it does give a good measurement
of the amount of _overhead_ in the mutex calls: the run without the
nested-lock code takes about 52% less time. If we can remove more than
half of the overhead for the thread_mutex calls, we can potentially
reduce lock contention dramatically for heavily loaded servers and hot
critical paths in the code. I suspect that this will have a huge impact
on multiprocessor servers, where lock contention can effectively starve
other CPUs.

In many cases, the underlying library can already do nested locking if
requested. I propose we simply require that this capability be passed
through an attribute flag in the lock initialization routine, so that
code not requiring the nested capability can benefit from a faster
lock/unlock cycle.
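
To make the proposal concrete, here is a rough sketch of what a
flag-driven init routine could look like on top of POSIX threads. The
names my_mutex_create, MY_MUTEX_DEFAULT and MY_MUTEX_NESTED are made up
for illustration and are not the actual API; only the pthread calls are
real. The point is that nesting is delegated to the underlying library
and only when the caller asks for it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_RECURSIVE on strict builds */
#endif
#include <pthread.h>

/* Hypothetical flag names, for illustration only. */
#define MY_MUTEX_DEFAULT 0x0u  /* plain mutex, no nesting support */
#define MY_MUTEX_NESTED  0x1u  /* owning thread may re-lock */

/*
 * Hypothetical init routine: create a mutex and request nesting from
 * the underlying library only if the caller passed MY_MUTEX_NESTED.
 * Returns 0 on success, or the error code from the failing pthread call.
 */
int my_mutex_create(pthread_mutex_t *mutex, unsigned int flags)
{
    pthread_mutexattr_t attr;
    int rv;

    rv = pthread_mutexattr_init(&attr);
    if (rv != 0)
        return rv;

    rv = pthread_mutexattr_settype(&attr,
             (flags & MY_MUTEX_NESTED) ? PTHREAD_MUTEX_RECURSIVE
                                       : PTHREAD_MUTEX_NORMAL);
    if (rv == 0)
        rv = pthread_mutex_init(mutex, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rv;
}
```

With something like this, callers that never nest would pass
MY_MUTEX_DEFAULT and pay nothing for per-call owner bookkeeping in our
wrapper, while the few callers that do nest would opt in explicitly.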

-aaron
