Re: std.parallelism: Request for Review

dsimcha Sun, 27 Feb 2011 08:41:08 -0800

On 2/27/2011 9:48 AM, dsimcha wrote:

On 2/27/2011 8:03 AM, Russel Winder wrote:

32-bit mode on an 8-core (twin Xeon) Linux box. That core.cpuid bug
really, really sucks.


I see matrix inversion takes longer with 4 cores than with 1!

Actually, I am able to reproduce this, but only on Linux, and I think I've figured out why. It's probably related to my Posix workaround for Bug 3753 (http://d.puremagic.com/issues/show_bug.cgi?id=3753). This workaround causes GC heap allocations to occur in a loop inside the matrix inversion routine (one for each call to parallel(), so 256 over the course of the benchmark). It was intended to be a very quick and dirty workaround for a DMD bug that I thought would get fixed a long time ago. It also seemed good enough at the time because I was using this lib for very coarse-grained parallelism, where the effect is negligible.

Originally, I was using alloca() all over the place to deal efficiently with memory management. However, under Posix, I ran into Bug 3753 a long time ago and put in the following workaround, which simply forwards alloca() calls to the GC. From near the top of parallelism.d:


// Workaround for bug 3753.
version(Posix) {
    import core.memory : GC;  // needed for GC.malloc below
    // Can't use alloca() because it can't be used with exception
    // handling.
    // Use the GC instead even though it's slightly less efficient.
    void* alloca(size_t nBytes) {
        return GC.malloc(nBytes);
    }
} else {
    // Can really use alloca().
    import core.stdc.stdlib : alloca;
}

In this particular use case the performance hit is probably substantial. There are ways to mitigate it (maybe having TaskPool maintain a free list, etc.), but I can't bring myself to put a lot of effort into optimizing a workaround for a compiler bug.
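For illustration only, the free-list idea could look roughly like the sketch below. It assumes fixed-size buffers and a simple lock around the list; `BufferFreeList`, `acquire` and `release` are hypothetical names, not anything TaskPool actually provides. Once a buffer has been released, later acquire() calls reuse it instead of calling GC.malloc again, so only the first calls (or calls made while the list is empty) still hit the GC.

```d
// Hypothetical sketch (not std.parallelism API): a free list that recycles
// fixed-size GC buffers so a hot loop stops calling GC.malloc every time.
import core.memory : GC;

final class BufferFreeList
{
    private void*[] freeBufs;         // buffers handed back via release()
    private immutable size_t bufSize; // every buffer has this size

    this(size_t bufSize)
    {
        this.bufSize = bufSize;
    }

    // Pop a recycled buffer if one is available; otherwise fall back to
    // GC.malloc, which is what the Posix workaround does on every call.
    void* acquire()
    {
        synchronized (this)
        {
            if (freeBufs.length > 0)
            {
                void* p = freeBufs[$ - 1];
                freeBufs = freeBufs[0 .. $ - 1];
                freeBufs.assumeSafeAppend(); // let release() reuse the slot
                return p;
            }
        }
        return GC.malloc(bufSize);
    }

    // Hand a buffer back for reuse instead of leaving it to the collector.
    void release(void* p)
    {
        synchronized (this)
        {
            freeBufs ~= p;
        }
    }
}

void main()
{
    auto pool = new BufferFreeList(256 * double.sizeof);

    void* a = pool.acquire(); // list is empty: allocated with GC.malloc
    pool.release(a);
    void* b = pool.acquire(); // recycled: the same buffer as before
    assert(a == b);
    pool.release(b);
}
```

The trade-off is that buffers parked on the list are never given back to the GC, and every caller has to remember to call release(); putting a scope(exit) right after acquire() is the natural way to guarantee that.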
