On Wed, 8 Oct 1997, Ralf Hinze <[EMAIL PROTECTED]>: Sorting is a hobby-horse of mine, so I cannot resist the temptation to elaborate on the subject.... [sort package omitted] On Wed, 08 Oct 1997, Chris Okasaki <[EMAIL PROTECTED]>: I ran some similar experiments in Standard ML a few years ago. In those experiments pairingSort also performed extremely well. The only algorithm that performed better, and even then only by a small amount, was splaySort... [`splaySort' omitted] On Thu, 9 Oct 1997, Lennart Augustsson <[EMAIL PROTECTED]>: Well, I'm a sucker for a benchmark so I ran all of these with hbc. I also added the smooth merge sort that comes with hbc... [`sortHbc' omitted.] Here are my results (best viewed with a wide window): < <= > >= = >2 >100 <2 <100 ><2 rand pairingSort | 0.71 | 1.42 | 0.99 | 1.97 | 0.77 | 1.20 | 2.02 | 1.23 | 2.22 | 1.19 | 4.32 | jonsSort | 0.64 | 1.32 | 0.65 | 1.33 | 1.42 | 1.92 | 2.30 | 1.93 | 2.27 | 1.83 | 4.68 | splaySort | 0.58 | 1.17 | 0.59 | 1.65 | 0.58 | 1.20 | 2.10 | 1.26 | 2.02 | 1.11 | 5.01 | sortHbc | 0.18 | 0.38 | 1.15 | 2.00 | 0.19 | 1.49 | 1.03 | 1.61 | 1.78 | 1.45 | 2.41 | nmsort | 0.25 | 0.52 | 1.08 | 2.05 | 0.25 | 1.52 | 1.10 | 1.62 | 1.67 | 1.48 | 2.31 | (These results were gathered with Ralph's PQSort.lhs module compiled with `ghc -O' on a quiet UltraSparc, with 128M of RAM, running Solaris 2.5; see the attached shell script which takes the minimum over 10 runs for each benchmark.) `nmsort' is Bob Buckley's ([EMAIL PROTECTED]) natural merge sort (see Knuth Vol. 2). I have attached it to the message. `nmsort' is essentially the same as Lenart's merge sort, but with a structure more like Jon's heap sort -- in fact `foldb' is a synthesis of an almost identical combinator used by Bob Buckley and Jon's `foldtree' function. The similarity of `nmsort' and `sortHbc' is reflected by the fact that they perform almost identically, with `sortHbc' performing slightly better on ordered inputs and `nmsort' edging it on random data. Both of the sorts lose out on inputs that contain few natural runs (as Jon explained to me in his earlier message); for the run-challenged inputs, (`>',`>=',`>2', `<2' and `><2': the descending lists and the two cycle lists) the other sorts win out, with `splaySort' performing strongly. However, the merge sorts tend to get closer when they are defeated. Comparing `splaySort' and `sortHbc' in the above data set we get the following (where a `+' indicates a win for `sortHbc' and `-' a win for `splaySort'): < <= > >= = >2 >100 <2 <100 ><2 rand +3.2 +3.1 -2.0 -1.2 +3.1 -1.3 +2.0 -1.3 +1.1 -1.3 +2.1 Apart from the -2.0 result on the sorted input, the merge sorts have a lower bound of -1.3. However, on ascending outputs they all exceed 3.0 with the random input getting a score of 2.1. Which data set is the most important is subjective but I would be looking for the best results on the sorted and random inputs. I would pay attention to the random inputs because I think of it as representing the `average' case (this justification is admittedly a bit wooly). I regard the the sorted inputs as an important special case because it allows me to enumerate ordered data structures into lists, process them and rebuild the data structure again, quickly; the first thing to be done in rebuilding such a data structure is to sort it, but that's OK because my sorting algorithm degenerates nicely into little more than a quick linear check that the list is ordered. In summary, I am with Lennart: a naturally balanced merge sort seems pretty good. Cheers, Chris Dornan [EMAIL PROTECTED] University College Cork +353 21 903165 PS: Isn't it pleasing that good functional sorts have the following atributes, all at the same time: a) they work well in a wide variety of situations b) they are worst case O(n.log(n)) on all inputs c) they are worst case O(n) on sorted inputs d) they are completely general purpose e) and, of course, they are rather compact and elegant. PPS: Isn't it nice to have a stable programming language with good compilers and interpretters to support it. -- nmsort ----------------------------------------------------------------------------------- > nmsort :: Ord a => [a] -> [a] > nmsort = nmsortBy compare > nmsortBy :: (a->a->Ordering) -> [a] -> [a] > nmsortBy cmp xs = foldb (merge0 cmp) [] (runs0 cmp xs) > runs0 :: (a->a->Ordering) -> [a] -> [[a]] > runs0 cmp xs = foldr op [] xs > where > op z xss@(xs@(x:_):xss') | le z x = (z:xs):xss' > op z xss = [z]:xss > > le z x = case cmp z x of { GT -> False; _ -> True } > merge0:: (a->a->Ordering) -> [a] -> [a] -> [a] > merge0 cmp [] l = l > merge0 cmp l@(_:_) [] = l > merge0 cmp l1@(h1:t1) l2@(h2:t2) = > case cmp h1 h2 of > GT -> h2:merge0 cmp l1 t2 > _ -> h1:merge0 cmp t1 l2 > foldb :: (a->a->a) -> a -> [a] -> a > foldb f zero [] = zero > foldb f zero [x] = x > foldb f zero xs = foldb f zero (fold xs) > where > fold (x1:x2:xs) = f x1 x2 : fold xs > fold xs = xs -- bench shell script ----------------------------------------------------------------------- #!/bin/sh if [ $# != 1 ]; then echo usage: $0 sort-algorithm exit 1 fi tmp=/tmp/out$$ sort () { time -p pqsort $1 $2 50000 +RTS -K16M -H64M >/dev/null 2>$tmp || \ { cat $tmp; rm $tmp; echo; echo "$0: terminated"; exit 1; } result=`sed -n -e 's/real *\([0-9.]*\)/\1/p' $tmp` } echo "$1: |\c" for i in 0 1 2 3 4 5 6 7 8 9 10; do min=1000.0 for x in 1 2 3 4 5 6 7 8 9 10; do sort $1 $i lt=`echo "if ($result<$min) 1; if ($result>=$min) 0"|bc` if [ $lt = 1 ]; then min=$result fi done echo " $result |\c" done echo exit 0