[lock-free] Re: Transactional memory mutex

'Dmitry Vyukov' via Scalable Synchronization Algorithms Mon, 15 Feb 2016 13:11:02 -0800

On Wednesday, February 10, 2016 at 10:12:30 AM UTC+1, Graham Miller wrote:
>
> I am interested in exploring the transactional memory features of the new 
> Intel processors (using the GCC transactional features described in [1]).  
> To do so I created a simple implementation of a mutex using this post about 
> futexes [2] as a guide. 
>
> I'm looking for feedback on the implementation, if anyone has a moment.  
> It is just the mutex implementation, so comprises only about 100 lines of 
> code:
>
> https://github.com/laslowh/tmutex/blob/master/mutex.c
>
> I would appreciate any comments you have, though I am particularly focused 
> on correctness at this point, not performance.
>



Hi Graham,


The transactions are meant to simplify concurrent programming, but I am 
actually not sure that they succeed in this case. This seems to be more 
difficult to read and understand and plain old atomics-based code. It would 
be simpler if you surround whole functions with a single transaction block, 
but then I guess you would have issues with blocking on futex...

You use __transaction_atomic, atomic transactions can't interfere with 
memory accesses in other threads. So can't the following loop actually 
abort concurrent attempts to lock the mutex, defeating its whole purpose?

  /* Spin and hope someone takes the lock */
  for (i = 0; i < 200; i++) {
    if (m->b.locked) return 0;
    cpu_relax();
  }

Also I guess the read inside of FUTEX_WAIT_PRIVATE can abort a concurrent 
attempt to unlock the mutex.

Also I've noticed that it's trickier than it appears on first sight because 
futexes are subject to spurious wakeups, so "m->b.contended = 0;" can 
actually reset contended as set by the _next_ thread (the one that was 
unblocked by a spurious wakeup). I am not sure whether it can lead to a 
deadlock or not.

The /* Spin and hope someone takes the lock */ optimization is usually 
useful only for synthetic benchmarks and harmful for real workloads (that 
loop wastes CPU time). What's usually useful in real program is futile 
wakeup throttling optimization.

But in general I am not sure that using transnational memory for mutex 
implementation is a good idea at all. TM is meant for situations when 
transactions can proceed in parallel. Here is no chance that two 
transactions don't conflict. An implementation based on CAS-loop that 
atomically moves the mutex state through well defined states should be 
cleaner and faster.

What could make sense is transaction lock _elision_ (when you turn the 
whole critical section into a transaction). Such critical sections can 
proceed in parallel for e.g. hash tables. See e.g. 
https://lwn.net/Articles/534758/

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/34f85f22-bbef-4ac3-9244-1d247ee62aef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[lock-free] Re: Transactional memory mutex

Reply via email to