http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #6 from js-gcc at webkeks dot org <js-gcc at webkeks dot org> 
2011-01-07 18:43:15 UTC ---
> This means that, assuming that spinlocks are infinitely faster than mutexes, 
> in
> the best possible conditions they would speed up the accessors (or, at least,
> the setter, but the getter should be the same) by a maximum of about 2x. 
> That's a great speedup by the way, but it's important to keep in mind that the
> maximum speedup we can theoretically ever get for accessors is a 2x, not a 10x
> or 100x. ;-)

That's interesting. Which compiler flags did you use? Did you try
-fomit-frame-pointer? Did you cache the IMP or did you dispatch the method
again every time? And with which flags did you build the rest of the runtime?

Remember: The dispatch in the current implementation is not very efficient, and
the significance of this could change a lot once the dispatch is optimized.

> Anyway, the key problem is that spinlocks are more wasteful (and unpredictable
> ?) than mutexes if there is contention and if the ratio of active threads vs.
> active cores is not favourable. 

Well, not really:

Usually, the lock is not held. If it is, you do a little trick: you spin 10
times, and if you still could not get the lock, it's likely the current thread
is keeping the holding thread from running and thus from releasing the
spinlock. Again, quite unlikely, as the spinlock is only held for an extremely
short amount of time. However, if it happens that after 10 spins you still
could not get the lock, you call sched_yield() so as NOT to waste resources.

So, in the worst case, you waste 10 spins. That's basically 10 compares, which
is nothing compared to a user/kernel-space switch, which often costs 10 times
as much. Especially on architectures like SPARC64, such a switch is extremely
expensive. And it's extremely unlikely that the OS gives control to another
thread before the lock is released again, so this almost never happens. We can
therefore assume that the spinlock is almost always faster here.

> It is hard to see how you
> can guarantee anything about contention or ratio of active threads vs. active
> cores in that context. :-(

See the explanation above and in the comment before. I listed all the
situations that can happen there, including multicore and single-core machines.

> I guess I'd feel a bit more confident if we were to experiment and benchmark
> the worst-case scenario of spinlocks ... ie, how bad can they really go
> compared to mutexes and how easy/hard it is for them to go bad in the case of
> accessors.

If you give control to another thread after 10 spins, I doubt it will behave
much worse than mutexes. And again, this is very unlikely to happen. But feel
free to benchmark :). I'd be especially interested in benchmarks on platforms
where context switches are cheap and benchmarks on platforms where context
switches are extremely expensive.

> Anyway, I understand you think spinlocks will always be faster than mutexes in
> this case, because "My guess is that usually the spinlock is not held", but 
> the
> problem is that that is a guess, and it depends on how the accessors are used.
> ;-)

Feel free to print "Conflict!" or something when that happens and try it in a
real-world scenario ;).

As for the large list you created, I'm not sure what to reply, because I'm not
sure whether those are questions or whether you regard them as facts, and
because I think I already commented on most of it before.

One thing though: "(having 100s of threads is pretty common in server software
where you are processing large numbers of independent requests simultaneously;"
This is considered bad design. Stuff like select(), epoll() etc. exists ;). You
should usually have num_cores threads for best performance.

> But if the second thread is running on the same
> core, it will never make any progress during the spinning, and the first 
> thread
> will waste CPU time before being suspended - in this case it will be slower 
> and
> more CPU-wasteful than a mutex.

See above about the maximum spin count and sched_yield().

> * I also wonder, with all the CPUs that GCC support, if the GCC atomic
> builtins are efficient on all platforms; if on a platform the builtins require
> function calls they may actually make the spinlocks less attractive compared 
> to
> mutexes (we can obviously work around this by using spinlocks on the two or
> three major platforms where we know they work well, and using mutexes on the
> other ones).

Do you know of any architectures where GCC needs to call a function? I've seen
GCC emit calls for atomic ops - but that was always on architectures which
didn't support atomic operations at all. What I do is check for atomic ops,
then for spinlocks in pthread, and if all that fails, fall back to mutexes.

> I would have no problem changing my mind and supporting the use of spinlocks 
> if 
> there is some evidence that they can't go *too* wrong ;-)

I guess you should just test "hybrid spinlocks", where you give control back to
the kernel after a certain number of spins ;).

> At the moment, I feel the performance trade-offs are unclear and we'd be using
> spinlocks only because Apple is using them in their runtime.

This would also mean better compatibility :P. Everybody assumes it's spinlocks!
:P
