On Wednesday, February 10, 2016 at 10:12:30 AM UTC+1, Graham Miller wrote:
>
> I am interested in exploring the transactional memory features of the new
> Intel processors (using the GCC transactional features described in [1]).
> To do so I created a simple implementation of a mutex using this post about
> futexes [2] as a guide.
>
> I'm looking for feedback on the implementation, if anyone has a moment.
> It is just the mutex implementation, so comprises only about 100 lines of
> code:
>
> https://github.com/laslowh/tmutex/blob/master/mutex.c
>
> I would appreciate any comments you have, though I am particularly focused
> on correctness at this point, not performance.
>
Hi Graham,
The transactions are meant to simplify concurrent programming, but I am
actually not sure that they succeed in this case. This seems to be more
difficult to read and understand and plain old atomics-based code. It would
be simpler if you surround whole functions with a single transaction block,
but then I guess you would have issues with blocking on futex...
You use __transaction_atomic, atomic transactions can't interfere with
memory accesses in other threads. So can't the following loop actually
abort concurrent attempts to lock the mutex, defeating its whole purpose?
/* Spin and hope someone takes the lock */
for (i = 0; i < 200; i++) {
if (m->b.locked) return 0;
cpu_relax();
}
Also I guess the read inside of FUTEX_WAIT_PRIVATE can abort a concurrent
attempt to unlock the mutex.
Also I've noticed that it's trickier than it appears on first sight because
futexes are subject to spurious wakeups, so "m->b.contended = 0;" can
actually reset contended as set by the _next_ thread (the one that was
unblocked by a spurious wakeup). I am not sure whether it can lead to a
deadlock or not.
The /* Spin and hope someone takes the lock */ optimization is usually
useful only for synthetic benchmarks and harmful for real workloads (that
loop wastes CPU time). What's usually useful in real program is futile
wakeup throttling optimization.
But in general I am not sure that using transnational memory for mutex
implementation is a good idea at all. TM is meant for situations when
transactions can proceed in parallel. Here is no chance that two
transactions don't conflict. An implementation based on CAS-loop that
atomically moves the mutex state through well defined states should be
cleaner and faster.
What could make sense is transaction lock _elision_ (when you turn the
whole critical section into a transaction). Such critical sections can
proceed in parallel for e.g. hash tables. See e.g.
https://lwn.net/Articles/534758/
--
---
You received this message because you are subscribed to the Google Groups
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/lock-free/34f85f22-bbef-4ac3-9244-1d247ee62aef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.