I can't offer definite answers to your questions, but I can suggest a few issues you should consider:

1. Merge sort doesn't parallelize all that well--when the blocks are small, the parallelization overhead is large compared with the productive work to be done, and when the blocks are large there are only a few of them and the final merges are inherently sequential, so the amount of parallelism available is limited. Quicksort and quickersort, of course, suffer from the same issue. The end result is that your timings will depend heavily on your hardware, your software, and the properties of the particular data set you use for testing. (See the first sketch after this list.)

2. You need to account for I/O buffering (not only by your operating system in RAM, but also by your disk controller)--after the first set of I/O operations, your data may already be sitting in buffers, so subsequent uses may retrieve it from those buffers rather than from the disk itself. Similarly, you also have to take paging and cache effects into account, which could make the first run much slower than immediately subsequent runs.

3. A better benchmark would be provided by a counting sort (sometimes called a rank or enumeration sort), which does parallelize well: it is O(n * (n/k)), where k is the number of processors and n is the number of elements to be sorted. A major advantage of using a counting sort for benchmarking is that it runs slowly enough to make it relatively easy to compare sequential and parallel timings. (See the second sketch after this list.)

4. Depending on your system defaults, there may also be memory allocation issues that need to be taken into account (which could also easily cause the first run to be considerably slower than subsequent runs made immediately after the first).
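
To make point 1 concrete, here is a rough, untested sketch of a depth-limited parallel merge sort using par and pseq from Control.Parallel; the depth threshold and the spine-forcing helper are my own additions, not anything taken from your code:

    import Control.Parallel (par, pseq)

    -- Depth-limited parallel merge sort: spark one half, evaluate the
    -- other in the current thread, then merge.  Below the threshold we
    -- fall back to the purely sequential version, so the sparks stay
    -- coarse-grained and the per-spark overhead is amortised.
    parMergeSort :: Ord a => Int -> [a] -> [a]
    parMergeSort _ []  = []
    parMergeSort _ [x] = [x]
    parMergeSort depth xs
      | depth <= 0 = merge (parMergeSort 0 front) (parMergeSort 0 back)
      | otherwise  =
          let l = parMergeSort (depth - 1) front
              r = parMergeSort (depth - 1) back
          in  forceList l `par` (forceList r `pseq` merge l r)
      where
        (front, back) = splitAt (length xs `div` 2) xs

    -- Walk the spine so the spark does real work; on its own, par only
    -- evaluates to weak head normal form (the first cons cell).
    forceList :: [a] -> ()
    forceList []     = ()
    forceList (_:xs) = forceList xs

    -- Plain sequential merge of two sorted lists.
    merge :: Ord a => [a] -> [a] -> [a]
    merge [] ys = ys
    merge xs [] = xs
    merge xxs@(x:xs) yys@(y:ys)
      | x <= y    = x : merge xs yys
      | otherwise = y : merge xxs ys

Only the top few levels of the recursion create sparks, so each spark represents a large block of work; how many levels are worth sparking is exactly the hardware- and data-dependent question I mean.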
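
And for point 3, a sketch of the counting (rank) sort I have in mind, parallelised with Control.Parallel.Strategies; I'm assuming parListChunk and rseq here (older versions of the library spell the latter rwhnf), and the chunk size of 1000 is arbitrary:

    import Control.Parallel.Strategies (parListChunk, rseq, using)
    import Data.Array (array, elems)

    -- Rank ("counting") sort: each element's final position is the
    -- number of elements that must precede it.  Ties are broken by
    -- index, so the ranks form a permutation of 0..n-1.
    rankSort :: Ord a => [a] -> [a]
    rankSort xs = elems (array (0, n - 1) (zip ranks xs))
      where
        n     = length xs
        ixs   = zip [0 :: Int ..] xs
        ranks = map rankOf ixs `using` parListChunk 1000 rseq
        -- Count the elements that should come before (i, x).
        rankOf (i, x) =
          length [ () | (j, y) <- ixs, y < x || (y == x && j < i) ]

Because every rank is computed independently from the whole input, the O(n * n) work splits evenly across k processors (roughly n/k ranks each), which is what makes it a convenient yardstick for sequential-versus-parallel comparisons.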



Murray Gross
Brooklyn College



On Sat, 19 Apr 2008, Andrew Coppin wrote:

OK, so just for fun, I decided to try implementing a parallel merge sort using the seq and par combinators. My plan was to generate some pseudo-random data and time how long it takes to sort it. To try to account for lazy evaluation, what the program actually does is this:

1. Write the input data to disk without any sorting. (This ought to force it to be fully evaluated.)
2. Sort and save the data to disk 8 times. (So I can average the runtimes.)

This is done with two data sets - one with 1 million items, and another with 2 million. Each data set is run through both the purely sequential algorithm and a simple parallel one. (Split the list in half, merge-sort each half in parallel, and then merge them.)
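
In outline, the driver for each run looks something like the following sketch (simplified, and not the exact code: Data.List.sort stands in for the merge sort being measured, and the generator, seed and file names are just placeholders):

    import Control.Exception (evaluate)
    import Data.List         (sort)
    import Data.Time.Clock   (diffUTCTime, getCurrentTime)
    import System.Random     (mkStdGen, randoms)

    -- Time one run of a sort, write the result to disk, and print the
    -- wall-clock time.  Forcing the length of the result makes sure we
    -- time the sort itself rather than the creation of a thunk.
    timeSort :: ([Int] -> [Int]) -> [Int] -> IO ()
    timeSort sorter input = do
      start  <- getCurrentTime
      sorted <- evaluate (sorter input)
      _      <- evaluate (length sorted)
      stop   <- getCurrentTime
      writeFile "sorted.txt" (unlines (map show sorted))
      print (diffUTCTime stop start)

    main :: IO ()
    main = do
      let input = take 1000000 (randoms (mkStdGen 42)) :: [Int]
      -- Step 1: write the unsorted data to disk to force evaluation.
      writeFile "unsorted.txt" (unlines (map show input))
      -- Step 2: sort and save eight times so the runtimes can be averaged.
      mapM_ (\_ -> timeSort sort input) [1 .. 8 :: Int]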

The results of this little benchmark utterly defy comprehension. Allow me to enumerate:

Weird thing #1: The first time you sort the data, it takes a few seconds. The other 7 times, it takes a split second - roughly 100x faster. Wuh?

Weird thing #2: The parallel version runs *faster* than the sequential one in all cases - even with SMP disabled! (We're only talking a few percent faster, but still.)

Weird thing #3: Adding the "-threaded" compiler option makes *everything* run a few percent faster. Even with only 1 OS thread.

Weird thing #4: Adding "-N2" makes *everything* slow down a few percent. In particular, Task Manager shows only one CPU core in use.

Adding more than 2 OS threads makes everything slow down even further - but that's hardly surprising.

Can anybody explain any of this behaviour? I have no idea what I'm benchmarking, but it certainly doesn't appear to be the performance of a parallel merge sort!

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
