On Monday, 21 April 2014 at 00:11:14 UTC, Jay Norwood wrote:
So this printDiamonde2b example had the fastest time of the solutions, and had similar times on all three builds. The ldc2 compiler build is performing best in most examples on ubuntu.

void printDiamonde2b(in uint N)
{
    uint N2 = N/2;
    char[] pSpace = uninitializedArray!(char[])(N2);
    pSpace[] = ' ';

    char[] pStars = uninitializedArray!(char[])(N+1);
    pStars[] = '*';

    pStars[$-1] = '\n';

    auto w = appender!(char[])();
    w.reserve(N*3);

    foreach (n ; 0 .. N2 + 1){
        w.put(pSpace[0 .. N2 - n]);
        w.put(pStars[$-2*n-2 .. $]);
    }

    foreach_reverse (n ; 0 .. N2){
        w.put(pSpace[0 .. N2 - n]);
        w.put(pStars[$-2*n-2 .. $]);
    }

    write(w.data);
}

With this slightly tweaked solution, I get times roughly 50% to 100% faster on my dmd-linux box:

//----
import std.array : uninitializedArray;
import std.stdio : write;

void printDiamonde2monarch(in uint N)
{
    uint N2 = N/2;

    char[] pBuf = uninitializedArray!(char[])(N + N2);
    pBuf[ 0 .. N2] = ' ';
    pBuf[N2 ..  $] = '*';

    auto slice = uninitializedArray!(char[])(3*N2*N2 + 4*N);

    size_t i;
    foreach (n ; 0 .. N2 + 1){
        auto w = 1 + N2 + n;
        slice[i .. i + w] = pBuf[n .. w + n];
        i += w;
        slice[i++] = '\n';
    }

    foreach_reverse (n ; 0 .. N2){
        auto w = 1 + N2 + n;
        slice[i .. i + w] = pBuf[n .. w + n];
        i += w;
        slice[i++] = '\n';
    }

    write(slice[0 .. i]);
}
//----

There are two "key" points here. The first is to avoid using appender. The second is that, instead of having two buffers (" " and "******\n") and doing two "slice copies" per line, there is only 1 buffer (" *****"), so each line takes 1 slice copy and a single '\n' write. At this point, I'm not sure how we could go any faster, short of using alloca...
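
To make the single-buffer idea easier to check, here is a minimal sketch of the same technique written as a function that returns the text instead of writing it. The names buildDiamond and emitRow are hypothetical, not part of the code above, and the output buffer is sized with a simple upper bound (N rows of at most N + 1 chars) rather than the formula used earlier. Like both functions above, it assumes an odd N.

```d
import std.array : uninitializedArray;
import std.stdio : write;

// Hypothetical helper, not from the code above: returns the diamond as text.
// Like the functions above, it assumes an odd width n.
char[] buildDiamond(in uint n)
{
    assert(n % 2 == 1, "this sketch assumes an odd width");
    immutable size_t width = n;
    immutable size_t half = width / 2;

    // Single source buffer: `half` spaces followed by `width` stars.
    char[] src = uninitializedArray!(char[])(half + width);
    src[0 .. half] = ' ';
    src[half .. $] = '*';

    // Upper bound on output size: `width` rows of at most width + 1 chars.
    char[] dst = uninitializedArray!(char[])(width * (width + 1));
    size_t pos = 0;

    // Row k has (half - k) spaces and (2k + 1) stars, which is exactly the
    // window of length half + k + 1 starting at src[k].
    void emitRow(size_t k)
    {
        immutable size_t w = half + k + 1;
        dst[pos .. pos + w] = src[k .. k + w];
        pos += w;
        dst[pos++] = '\n';
    }

    foreach (size_t k; 0 .. half + 1)
        emitRow(k);
    foreach_reverse (size_t k; 0 .. half)
        emitRow(k);

    return dst[0 .. pos];
}

void main()
{
    write(buildDiamond(5));
}
```

Each row is one slice copy out of the sliding window plus one '\n' store, which is the whole trick. Returning the slice rather than writing it also makes it easy to compare the output against the appender version before timing anything.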

How does this hold up on your environment?
