On Sunday, 21 April 2013 at 12:08:54 UTC, bearophile wrote:
Arrays#parallelSort uses Fork/Join framework introduced in Java
7 to assign the sorting tasks to multiple threads available in
the thread pool. This is called eating your own dog food.
Fork/Join implements a work stealing algorithm where in a idle
thread can steal tasks queued up in another thread.
An overview of Arrays#parallelSort:
The method uses a threshold value and any array of size lesser
than the threshold value is sorted using the Arrays#sort() API
(i.e sequential sorting). And the threshold is calculated
considering the parallelism of the machine, size of the array
and is calculated as:
private static final int getSplitThreshold(int n) {
int p = ForkJoinPool.getCommonPoolParallelism();
int t = (p > 1) ? (1 + n / (p << 3)) : n;
return t < MIN_ARRAY_SORT_GRAN ? MIN_ARRAY_SORT_GRAN : t;
}
Once its decided whether to sort the array in parallel or in
serial, its now to decide how to divide the array in to
multiple parts and then assign each part to a Fork/Join task
which will take care of sorting it and then another Fork/Join
task which will take care of merging the sorted arrays. The
implementation in JDK 8 uses this approach:
- Divide the array into 4 parts.
- Sort the first two parts and then merge them.
- Sort the next two parts and then merge them.
And the above steps are repeated recursively with each part
until the size of the part to sort is not lesser than the
threshold value calculated above.
I think it's worth adding something similar as strategy of
std.algorithm.sort.
FWIW, I created a parallel sort in D a while back using
std.parallelism. It was part of std.parallel_algorithm, a
half-finished project that I abandoned because I was disappointed
at how poorly most of it was scaling in practice, probably due to
memory bandwidth. If you have some expensive-to-compare types,
though, it may be worthwhile.
https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d