Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

Thomas Gleixner Thu, 22 Sep 2016 13:42:56 -0700

On Thu, 22 Sep 2016, Waiman Long wrote:
> BTW, my initial attempt for the new futex was to use the same workflow as the
> PI futexes, but with a mutex, which has optimistic spinning, instead of an
> rt_mutex. That version can double the throughput compared with PI futexes but
> is still far short of what can be achieved with wait-wake futexes. Looking at
> the performance figures from the patch:
> 
>                 wait-wake futex     PI futex        TO futex
>                 ---------------     --------        --------
> max time            3.49s            50.91s          2.65s
> min time            3.24s            50.84s          0.07s
> average time        3.41s            50.90s          1.84s
> sys time          7m22.4s            55.73s        2m32.9s


That's really interesting. Do you have any explanation for these massive
system time differences?

> lock count       3,090,294          9,999,813       698,318
> unlock count     3,268,896          9,999,814           134
> 
> The problem with a PI-futex-like version is that almost all the lock/unlock
> operations were done in the kernel, which added overhead and latency. Now
> looking at the numbers for the TO futexes, less than 1/10 of the lock
> operations were done in the kernel, and the number of unlocks was
> insignificant. Locking was done mostly by lock stealing. This is where most
> of the performance benefit comes from, not optimistic spinning.

What does the lock latency distribution of all this look like, and how fair
is the whole thing?
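
For readers following the thread, the lock stealing described in the quoted
text can be illustrated with a minimal userspace sketch. It is not the code
from the patch: the type to_lock_sketch, the functions sketch_lock(),
sketch_unlock() and slow_path_lock(), and the two counters are hypothetical
names chosen for illustration, and the kernel slow path is replaced by a plain
retry loop so the example runs without any futex support. It only shows the
general idea that a free lock word is taken with a single compare-and-swap in
userspace, so only contended attempts would reach the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 means free, otherwise the owner's thread id. */
typedef struct {
    _Atomic uint32_t word;
    atomic_ulong fast_locks;  /* acquisitions completed in userspace */
    atomic_ulong slow_locks;  /* acquisitions that needed the slow path */
} to_lock_sketch;

/*
 * Stand-in for the kernel slow path (assumption: the real code would make
 * a futex system call here). It counts the entry and retries until the
 * word is free.
 */
static void slow_path_lock(to_lock_sketch *l, uint32_t tid)
{
    uint32_t expected;

    atomic_fetch_add(&l->slow_locks, 1);
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->word, &expected, tid,
                 memory_order_acquire, memory_order_relaxed));
}

/*
 * Fast path: take the lock whenever the word is free, without checking
 * whether other tasks are already waiting. This is the "stealing" part.
 */
static void sketch_lock(to_lock_sketch *l, uint32_t tid)
{
    uint32_t expected = 0;

    if (atomic_compare_exchange_strong_explicit(
            &l->word, &expected, tid,
            memory_order_acquire, memory_order_relaxed)) {
        atomic_fetch_add(&l->fast_locks, 1);
        return;
    }
    slow_path_lock(l, tid);
}

/* Userspace unlock: no waiter tracking is modeled in this sketch. */
static void sketch_unlock(to_lock_sketch *l, uint32_t tid)
{
    uint32_t expected = tid;
    bool ok = atomic_compare_exchange_strong_explicit(
        &l->word, &expected, 0,
        memory_order_release, memory_order_relaxed);

    assert(ok);  /* only the owner may unlock */
    (void)ok;
}

/*
 * Single-threaded demo: n uncontended lock/unlock pairs. Returns the number
 * of slow-path entries and stores the fast-path count in *fast.
 */
static unsigned long demo_uncontended(unsigned long n, unsigned long *fast)
{
    to_lock_sketch l = {0};

    for (unsigned long i = 0; i < n; i++) {
        sketch_lock(&l, 1);
        sketch_unlock(&l, 1);
    }
    *fast = atomic_load(&l.fast_locks);
    return atomic_load(&l.slow_locks);
}
```

In the uncontended demo every acquisition takes the fast path. Because the
fast path ignores waiters, a sketch like this has no fairness guarantee at
all, which is why the latency distribution and the handoff mechanism matter.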

> This is also the reason that a lock handoff mechanism is implemented to
> prevent lock starvation, which is likely to happen without one.

Where is that lock handoff mechanism?

Thanks,

        tglx
