Josh wrote:

How can I make my parallel code more efficient. Currently, I am getting destroyed by the serial merge sort.

http://pastebin.com/M0GKfTTX

That serial merge sort I've written is little more than a toy. I suggest you to compare your parallel sort with a serial sort that allocates better. Perhaps later I'll add it.


Here I have done a quick translation of some C code from Wikipedia, this is wasteful for memory (not tested much!):


import std.stdio, std.algorithm;

/// A has the items to sort, array B is a work array.
void mergeSort(T)(T[] A) pure nothrow @safe
out {
    assert(A.isSorted);
} body {
static void bottomUpMerge(T)(in T[] A, in size_t iLeft, in size_t iRight, in size_t iEnd, T[] B)
    pure nothrow @safe @nogc {
        size_t i0 = iLeft;
        size_t i1 = iRight;

        // While there are elements in the left or right runs.
        for (size_t j = iLeft; j < iEnd; j++) {
// If left run head exists and is <= existing right run head.
            if (i0 < iRight && (i1 >= iEnd || A[i0] <= A[i1])) {
                B[j] = A[i0];
                i0++;
            } else {
                B[j] = A[i1];
                i1++;
            }
        }
    }

    immutable n = A.length;
    auto B = new T[n];

    // Each 1-element run in A is already "sorted".
// Make successively longer sorted runs of length 2, 4, 8, 16... until whole array is sorted.
    for (size_t width = 1; width < n; width = 2 * width) {
        // Array A is full of runs of length width.
        for (size_t i = 0; i < n; i = i + 2 * width) {
// Merge two runs: A[i:i+width-1] and A[i+width:i+2*width-1] to B[]
            // or copy A[i:n-1] to B[] ( if(i+width >= n) ).
bottomUpMerge(A, i, min(i + width, n), min(i + 2 * width, n), B);
        }

        // Now work array B is full of runs of length 2*width.
        swap(A, B);
    }
}

void main() {
    auto a = [3,1,2,5,4,8,6,7,2,9,1,4,3];
    a.mergeSort;
    a.writeln;
}

Bye,
bearophile

Reply via email to