> 
> My earlier question stands: why does boehm-gc use spinlocks and not 
> pthread mutexes, or condition variables? Was it a deliberate decision, or
> ignorance on the part of the Linux porters?  That choice surprised me a
> little, since garbage collection can run for long durations and cause
> extensive spinning on waiting processors.

The version of boehm I downloaded a while ago says this:

    /* Reasonably fast spin locks.  Basically the same implementation */
    /* as STL alloc.h.  This isn't really the right way to do this.   */
    /* but until the POSIX scheduling mess gets straightened out ...  */

Secondly, it tries to find out whether it's on a uniprocessor or an SMP
machine by spinning a number of times and then adjusting accordingly.

If it does suspect that it's being scheduled against the other
process on the same processor, it backs off and uses nanosleep
to give up the CPU.
They're careful to make sure that nanosleep doesn't spin:

                /* nanosleep(<= 2ms) just spins under Linux.  We        */
                /* want to be careful to avoid that behavior.           */

Furthermore, note that this lock protects both allocation and collection.
Allocation is non-blocking and quick.
If a collection is in progress ("GC_collecting" is set), the code jumps right
to the "yield:" label, where it does not spin.  So the long durations of
gc are not an issue.  (At least in the version I'm looking at - 4.13alpha3)
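
To make the pattern concrete, here is a rough sketch of a spin-then-sleep
lock in C11.  This is not the collector's code: demo_lock, demo_collecting
(standing in for GC_collecting) and DEMO_SPIN_MAX are made-up names, and the
spin budget and the 3 ms sleep are my assumptions.  It also leaves out the
adaptive uni/SMP adjustment described above.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <time.h>

/* Hypothetical names; an illustration only, not boehm-gc's lock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static atomic_int demo_collecting;   /* stands in for GC_collecting */

enum { DEMO_SPIN_MAX = 128 };        /* assumed spin budget */

static void demo_lock_acquire(void)
{
    /* Sleep for more than 2 ms, since the quoted comment says that
     * shorter nanosleep calls just spin on that era's Linux. */
    const struct timespec pause = { 0, 3000000L };

    for (;;) {
        /* While a collection is running, try once and do not spin. */
        int spins = atomic_load(&demo_collecting) ? 1 : DEMO_SPIN_MAX;

        for (int i = 0; i < spins; i++) {
            if (!atomic_flag_test_and_set_explicit(&demo_lock,
                                                   memory_order_acquire))
                return;              /* got the lock */
        }
        nanosleep(&pause, NULL);     /* give up the CPU, then retry */
    }
}

static void demo_lock_release(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}
```

A short allocation path would wrap its critical section in
demo_lock_acquire()/demo_lock_release(); a collector would set
demo_collecting before taking the lock, so waiters go to sleep instead of
burning cycles.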

> For that matter, Linux pthreads is optimized for uniprocessors.  It could
> be improved by reading /proc/cpuinfo on startup to determine if it is
> running on a SMP kernel.  I hacked libpthread.so once for MP use, and
> observed up to a 30% speedup on some of my real-world Java code.


It would be nice to see these hacks.
What exactly did you change to optimize for MP use?

> 
> What exactly do you wish to improve in pthread_mutex_lock?  There are so
> many variables, such as average latency with or without lock contention,
> lock concurrency, resource utilization, predictability (ratio between
> average and worst case), etc.  Clearly the technique you favor depends on
> whether you are designing for maximum throughput, real-time, or other
> needs.
> 

From what I can see, pthreads already does a good job for latency
in the uncontended case.  I'd like to see latency in the contended case improved.
I don't care about lock concurrency or any measure of fairness (for now).
I'm not sure what you mean by resource utilization (do you mean CPU resource
utilization, i.e., not wasting cycles due to spinning?)  I think I'd care
about that too.

Basically, all I want is to not have applications with multiple threads and
highly contended locks (java runtime systems) run slower on two processors 
than on one processor.  If they don't run twice as fast, I can live with that.

        - Godmar

