RE: [PATCH] librdmacm/rsockets: Optimize synchronization to improve performance

Hefty, Sean Thu, 10 May 2012 09:58:20 -0700

> > A test that acquired and released a lock 2 billion times reported that
> > the custom lock was roughly 20% faster than using the mutex.
> > 26.6 seconds versus 33.0 seconds.
> 
> I think you are measuring the fact your call is inlined and pthreads
> has an indirect jump - because internally pthreads implements the same
> thing using a futex instead of a sem_t.


This is what I suspect as well.
 
> > in releasing a lock.  However, we keep the custom lock based on
> > the results of the direct lock tests that were done.
> 
> This does hurt portability though, the GCC extension
> __sync_fetch_and_add is not supported on all targets..

I'll fix that by falling back to a mutex when __sync_fetch_and_add is not
available.
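
Roughly, the fallback could look like the sketch below.  This is only an
illustration, not the patch itself: HAVE_SYNC_BUILTINS is a hypothetical
configure-time macro, and the fastlock_* names and struct layout are
assumed for the example.  The fast path is a counter plus a sem_t that is
only touched under contention; the slow path is a plain pthread mutex.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* HAVE_SYNC_BUILTINS is a hypothetical macro, assumed to be set by the
 * build system when the compiler provides the __sync_* builtins. */
#ifdef HAVE_SYNC_BUILTINS

typedef struct {
	sem_t sem;		/* waiters sleep here under contention */
	volatile int cnt;	/* holders plus waiters */
} fastlock_t;

static inline void fastlock_init(fastlock_t *lock)
{
	sem_init(&lock->sem, 0, 0);
	lock->cnt = 0;
}

static inline void fastlock_destroy(fastlock_t *lock)
{
	sem_destroy(&lock->sem);
}

static inline void fastlock_acquire(fastlock_t *lock)
{
	/* Old value > 0 means the lock is held: wait for a post. */
	if (__sync_fetch_and_add(&lock->cnt, 1) > 0) {
		while (sem_wait(&lock->sem) == -1 && errno == EINTR)
			;	/* retry if interrupted by a signal */
	}
}

static inline void fastlock_release(fastlock_t *lock)
{
	/* Old value > 1 means at least one waiter: wake exactly one. */
	if (__sync_fetch_and_sub(&lock->cnt, 1) > 1)
		sem_post(&lock->sem);
}

#else /* no atomic builtins: use a mutex behind the same interface */

typedef pthread_mutex_t fastlock_t;

static inline void fastlock_init(fastlock_t *lock)
{
	pthread_mutex_init(lock, NULL);
}

static inline void fastlock_destroy(fastlock_t *lock)
{
	pthread_mutex_destroy(lock);
}

static inline void fastlock_acquire(fastlock_t *lock)
{
	pthread_mutex_lock(lock);
}

static inline void fastlock_release(fastlock_t *lock)
{
	pthread_mutex_unlock(lock);
}

#endif
```

Callers would use fastlock_acquire()/fastlock_release() either way, so the
choice stays a build-time detail.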

> > As to the hotspot, the unlock in question occurs during rsend().  The
> > hotspot may simply be the result of processing the send completion.
> 
> Are you using a stochastic profiler? It may show as a hot spot simply
> because the unlock is a context switch point.

I'm using Intel's VTune Amplifier XE 2011 with the hotspot analysis settings.
I had already gone through the trouble of creating the custom lock before
realizing that the hotspot was likely the result of some other interaction.

- Sean
