Sorting is a hobby-horse of mine, so I cannot resist the temptation to elaborate on the subject. I was motivated to write this rather long reply by Carsten Kehler Holst saying `As far as I can see the difference between merge sort and heap sort as described by Jon is almost non existing'. Carsten is not quite right but he is not totally wrong either. Both sorting algorithms are based on priority queues, so it may be worthwhile to take a `data-structural look at sorting'. That's the theme of this email. There are still some open points, so any remarks, corrections, ideas etc are *welcome*. Ralf PS: Those who are interested in performance only should skip to Section 10. > import List ( group ) > import System ( getArgs ) 1. Introductory remarks ~~~~~~~~~~~~~~~~~~~~~~~ What makes up a good sorting algorithm? Here are some criteria: A. it should be asymptotically optimal (ie O(n log n) worst case behaviour ruling out quick sort ;-)), B. it should be stable (ie it may not change the order of equal elements), C. it should be smooth (a smooth sort has a linear execution time if the input is nearly sorted). All algorithms we are going to present are asymptotically optimal (with the notable exception of `jonsSort') and all of them are stable. Only `smoothMergeSort' has shown to be smooth (to the best of my knowledge). However, practical experiments suggest that `pairingSort' and `jonsSort' adopt quite well to the input data. Additional criteria one *may* consider: D. it should be lazy (ie `head . sort' has linear execution time), E. it should run faster if the input contains many equal elements. All algorithms are lazy. No algorithm explicitly addresses Criterion E. Again, experiments suggest that `pairingSort' and `jonsSort' adopt quite well to the input data. It is advisable to gather some test data to check the various implementations. A stable sorting algorithm should perform well on the following data. > strictlyIncreasing n = [1 .. n] > increasing n = interleave x x > where x = strictlyIncreasing n > strictlyDecreasing n = [n, n-1 .. 1] > decreasing n = interleave x x > where x = strictlyDecreasing n > constant n = replicate n 0 The following generators produce lists containing many equal elements (provided `k << n'). > repIncreasing k n = take n (copy [0 .. k]) > repDecreasing k n = take n (copy [k, k-1 .. 0]) > oscillating k n = take n (copy ([0 .. k] ++ [k, k-1 .. 0])) Finally we have random data. > random n = take n (random2Ints (2*n) (3*n)) A complete list of all generators. > generators :: [Int -> [Int]] > generators = [ strictlyIncreasing, -- 0 > increasing, > strictlyDecreasing, > decreasing, > constant, > repIncreasing 2, -- 5 > repIncreasing 100, > repDecreasing 2, > repDecreasing 100, > oscillating 2, > random ] -- 10 NB We only consider lists of Int's. It may be worthwhile to repeat the benchmarks (Section 10) with data designed to make comparisons dominate, see Jon's second email. 2. Priority queues ~~~~~~~~~~~~~~~~~~ Here is the abstract data type of priority queues formulated as a Haskell class definition. > data SeqView t a = Null > | Cons a (t a) > > class PriorityQueue q where > empty :: (Ord a) => q a > single :: (Ord a) => a -> q a > insert :: (Ord a) => a -> q a -> q a > meld :: (Ord a) => q a -> q a -> q a > splitMin :: (Ord a) => q a -> SeqView q a > fromList :: (Ord a) => [a] -> q a > toOrderedList :: (Ord a) => q a -> [a] > > single a = insert a empty > insert a q = single a `meld` q > fromList = foldm meld empty . map single > toOrderedList q = case splitMin q of > Null -> [] > Cons a q -> a : toOrderedList q The function `splitMin' replaces `isEmpty', `findMin' and `deleteMin' which usually belong to the standard repertoire. The call `splitMin q' returns `Null' if `q' is empty and `Cons a q1' otherwise (`a' is a minimal element of `q' and `q1' is the remaining queue). The prototypical sorting algorithm based on priority queues looks as follows (`PQ' refers to the concrete implementation) pqSort :: (Ord a) => [a] -> [a] pqSort = toOrderedList . (fromList :: (Ord a) => [a] -> PQ a) We have two phases: a creation phase and a selection phase. Typically, the creation phase has O(n) running time and the selection phase O(n log n) (`toOrderedList' calls `splitMin' n times, `splitMin' typically takes O(log n) time). Given these bounds Criterion D is automatically satisfied. There are several ways to define `fromList'. The default definition above uses the balanced variant of `fold' as defined in Appendix A. Alternatively one could define fromList = foldr insert empty Which one is preferable depends on the concrete implementation of priority queues. We will see examples of both. 3. Top-down merge sort ~~~~~~~~~~~~~~~~~~~~~~ The simplest implementation of priority queues employs ordered list. > instance PriorityQueue [] where > empty = [] > single a = [a] > meld [] y = y > meld x@(a:_) [] = x > meld x@(a:x') y@(b:y') > | a <= b = a : meld x' y > | otherwise = b : meld x y' > splitMin [] = Null > splitMin (a:x) = Cons a x > toOrderedList = id The function `meld' corresponds to `merge', hence the name of the sorting algorithm. > mergeSort :: (Ord a) => [a] -> [a] > mergeSort = toOrderedList > . (fromList ::(Ord a) => [a] -> [a]) The default definition of `fromList' guarantees O(n log n) worst case behaviour. Using `fromList = foldr insert empty' yields insertion sort which exhibits O(n^2) worst case behaviour. NB This definition of `mergeSort' corresponds to the standard textbook definition [1, p.153] [2, p. 112]; see also Miranda's standard environment. 4. Bottom-up merge sort ~~~~~~~~~~~~~~~~~~~~~~~ Using Jon's `treefold' instead of `foldm' gives a bottom-up variant. > bottomUpMergeSort :: (Ord a) => [a] -> [a] > bottomUpMergeSort = toOrderedList . fromList > where fromList :: (Ord a) => [a] -> [a] > fromList = treefold meld empty . map single Interestingly, `bottomUpMergeSort' has much in common with Richard O'Keefe's applicative merge sort [2, p. 112]; see also Chris Okasaki's sortable collections [3, p. 32]. 5. Binary heaps ~~~~~~~~~~~~~~~ To add a personal touch to the discussion here is my first implementation of heap sort which is based on binary heap ordered trees [4, p. 155]. > data BinTree a = Empty > | Bin (BinTree a) a (BinTree a) > instance PriorityQueue BinTree where > empty = Empty > single a = Bin Empty a Empty > insert b Empty = Bin Empty b Empty > insert b (Bin l a r) > | a <= b = Bin (insert b r) a l > | otherwise = Bin (insert a r) b l > meld Empty u = u > meld t@(Bin _ _ _) Empty = t > meld t@(Bin l a r) u@(Bin l' b r') > | a <= b = Bin (meld l r) a u > | otherwise = Bin t b (meld l' r') > splitMin Empty = Null > splitMin (Bin l a r) = Cons a (meld l r) > fromList = foldr insert empty Repeated `insert's with no intervening `splitMin's construct so-called Braun trees (`size r <= size l <= size r+1' holds for each `Bin l a r'). This guarantees a worst case running time of O(n log n) for `heapSort'. > heapSort :: (Ord a) => [a] -> [a] > heapSort = toOrderedList > . (fromList :: (Ord a) => [a] -> BinTree a) NB The operation `meld' on `BinTree a' is closely related to `meld' on lists, ie `merge'. We have toOrderedList (meld t u) = merge (toOrderedList t) (toOrderedList u) . Thus heapSort :: (Ord a) => [a] -> [a] heapSort = toOrderedList . fromlist where fromList :: (Ord a) => [a] -> BinTree a fromList = foldm meld empty . map single is structurally equivalent to `mergeSort' (ie it performs the same comparisons during the sorting process). 6. Pairing heaps ~~~~~~~~~~~~~~~~ Chris has pointed out that Jon's sorting algorithm uses *multipass* pairing heaps [5]. As we shall see pairing heaps perform extremely well in practice. However, no tight theoretical bounds are known. We start with an implementation of `ordinary' pairing heaps. > data Tree a = Nil > | Node a [Tree a] NB The constructor `Nil' is only used on the top-level, it never appears below a `Node' . > instance PriorityQueue Tree where > empty = Nil > single a = Node a [] > meld Nil u = u > meld t@(Node _ _) Nil = t > meld t@(Node a ts) u@(Node b us) > | a <= b = Node a (u:ts) > | otherwise = Node b (t:us) > splitMin Nil = Null > splitMin (Node a ts) = Cons a (meldAll ts) > fromList = meldAll . map single Different variants of pairing heaps differ in the implementation of `meldAll'. > meldAll :: (Ord a) => [Tree a] -> Tree a > meldAll [] = Nil > meldAll [t] = t > meldAll (t:u:ts) = meld (meld t u) (meldAll ts) Note that subsequent trees are first paired using `meld', hence the name of the data structure. > pairingSort :: (Ord a) => [a] -> [a] > pairingSort = toOrderedList > . (fromList :: (Ord a) => [a] -> Tree a) What about the running time? Fredman et al [5] show that `meld' and `splitMin' run in O(log n) amortized time. Hence we have O(n log n) worst case behaviour. NB Chris Okasaki has developed a persistent variant of pairing heaps [6] which might be worth trying, as well. The function `meldAll' corresponds to `foldr meld empty . pairfold meld'. Alternatively, one can make repeated passes over the trees using `treefold meld empty'. For ease of reference we call this variant `jonsSort' ;-). > jonsSort :: (Ord a) => [a] -> [a] > jonsSort = toOrderedList . fromList > where > meldAll = treefold meld empty > fromList = meldAll . map single > toOrderedList Nil = [] > toOrderedList (Node a ts) = a : toOrderedList (meldAll ts) Fredman et al [5] state that the multipass variant is not easy to analyse which is certainly true. They only succeed in proving an O(log n log log n/log log log n) bound on the amortized time per heap operation. Hence it is not clear whether `jonsSort' is an O(n log n) algorithm. NB Jon's implementation is not stable since he uses the test `a < b' in the definition of `meld'. I know of no argument or (formal) proof that sorting with pairing heaps is smooth. Perhaps you know? 7. Imperative heap sort ~~~~~~~~~~~~~~~~~~~~~~~ Imperative implementations of heap sort use an implicit representation of binary heaps ie the binary tree is embedded in an array. For a functional explanation we step backward and represent an array by a binary tree. The original heap sort uses left-complete trees. We use Braun trees [2, p. 154] instead. However, every implementation of functional arrays would do. We could generalize the following to an array-based implementation of priority queues. > newtype Braun a = Braun (BinTree a) > instance PriorityQueue Braun where > empty = Braun empty > single a = Braun (single a) > insert a (Braun t) = Braun (insert a t) > splitMin (Braun Empty) = Null > splitMin (Braun (Bin l a r)) > = case splitLeft l of > Null -> Cons a empty > Cons b l' -> Cons a (Braun (siftdown r b l')) > fromList = foldr insert empty The construction of the `heap' works top-down. EXERCISE. Devise a bottom-up variant which uses `siftdown' (see below). Bird [8, p. 21] gives a solution to this problem. END. The second phase follows the imperative version quite closely: the smallest element (ie the leftmost) is replaced by an arbitrary element (usually the rightmost) which is sifted down the heap. Both `splitLeft' and `siftdown' employ the fact that their arguments are Braun trees. > splitLeft :: BinTree a -> SeqView BinTree a > splitLeft Empty = Null > splitLeft (Bin l a r) = case splitLeft l of > Null -> Cons a Empty -- `r == Empty' > Cons b l' -> Cons b (Bin r a l') Functionally speaking, `siftdown' is a smart constructor for binary heaps. > siftdown :: (Ord a) => BinTree a -> a -> BinTree a > -> BinTree a > siftdown Empty a r = single a -- `r == Empty' > siftdown l@(Bin _ b _) a Empty > | a <= b = Bin l a Empty > | otherwise = Bin (single a) b Empty > siftdown l@(Bin l1 a1 r1) a r@(Bin l2 a2 r2) > | a <= a1 && a <= a2 = Bin l a r > | a1 <= a2 = Bin (siftdown l1 a r1) a1 r > | otherwise = Bin l a2 (siftdown l2 a r2) > braunSort :: (Ord a) => [a] -> [a] > braunSort = toOrderedList > . (fromList :: (Ord a) => [a] -> Braun a) 8. Natural or smooth merge sort ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Merge sort has O(n log n) running time independent of the input. However, it is easy to make merge sort smooth by dividing the input into increasing runs. Carsten suggests to use `group (<=)' which only takes increasing runs into account. The following variant uses both increasing and decreasing runs. > runs :: (Ord a) => [a] -> [[a]] > runs [] = [] > runs x@[a] = [x] > runs (a:b:x) > | a <= b = ascRun b (\y -> a:y) x > | otherwise = descRun b [a] x > where > ascRun a as [] = [as [a]] > ascRun a as x@(b:y) > | a <= b = ascRun b (\y -> as (a:y)) y > | otherwise = as [a] : runs x > descRun a as [] = [a:as] > descRun a as x@(b:y) > | a <= b = (a:as) : runs x > | otherwise = descRun b (a:as) y NB To preserve stability `descRun' uses only *strictly* decreasing sequences: [n, n, n-1, n-1, .. 1, 1] is split into n runs. Does anybody know of a better solution? > smoothMergeSort :: (Ord a) => [a] -> [a] > smoothMergeSort = toOrderedList . fromList > where fromList = foldm meld empty . runs The bottom-up variant of `smoothMergeSort' closely corresponds to Richard O'Keefe's `smooth applicative merge sort' [2, p. 113]. EXERCISE. Make `heapSort' smooth. Hint: Replace `Empty' by `Run [a]' in the definition of `BinTree a'. END. 9. Testing ~~~~~~~~~~ We refrain from proving the correctness of the various sorts and perform some simple test. The proofs are left as an ... > sorter :: (Ord a) => [[a] -> [a]] > sorter = [ mergeSort, -- 0 > bottomUpMergeSort, > heapSort, > pairingSort, > jonsSort, > braunSort, -- 5 > smoothMergeSort ] Just type `test 10' to convince yourself that the sorts work for input data of length 10. > test :: Int -> Bool > test n = and [ mergeSort x == sort x > | sort <- sorter, > input <- generators, > let x = input n ] 10. Benchmarks ~~~~~~~~~~~~~~ The code works for both Hugs and ghc-2.07 (it should work for hbc and nhc as well but I haven't verified this.) > main = do args <- getArgs > let sort = sorter!!read (args!!0) > let input = generators!!read (args!!1) > let n = read (args!!2) > print (last (sort (input n))) To compile use ghc -O -o pqsort PQSort.lhs -H32M To run use something like pqsort 7 10 50000 +RTS -K32M -H32M And the winner is ... [you should probably enlarge your window] 50000 | < | <= | > | >= | == | 1 2* | 1..100* | 2 1* | 100..1* | 1 2 2 1* | random merge | 1.36 | 3.16 | 1.33 | 2.89 | 1.37 | 1.92 | 3.04 | 1.92 | 2.97 | 1.92 | 7.70 bottom-up | 0.91 | 2.12 | 0.88 | 1.98 | 0.98 | 1.71 | 2.93 | 1.77 | 2.89 | 1.77 | 7.80 heap | 9.16 | oo | 9.24 | oo | 4.39 | 7.52 | 8.58 | 7.47 | 8.55 | 7.58 | 9.75 pairing | 0.39 | 1.22 | 0.61 | 1.62 | 0.45 | 0.73 | 1.57 | 1.07 | 1.68 | 0.75 | 3.22 Jon's | 0.45 | 0.99 | 0.44 | 0.98 | 1.04 | 1.44 | 1.69 | 1.46 | 1.70 | 1.40 | 3.86 braun | 6.69 | 21.46 | 6.68 | 21.57 | 2.82 | 5.23 | 6.29 | 5.21 | 6.28 | 5.17 | 7.24 smooth | 0.10 | 0.23 | 0.08 | 2.10 | 0.11 | 1.63 | 1.36 | 1.60 | 1.41 | 1.61 | 7.71 oo = heap or stack overflow 1. | smooth | smooth | smooth | Jon's | smooth | pairing | smooth | pairing | smooth | pairing | pairing 2. | pairing | Jon's | Jon's | pairing | pairing | Jon's | pairing | Jon's | pairing | Jon's | Jon's 3. | Jon's | pairing | pairing | bottom-up | bottom-up | smooth | Jon's | smooth | Jon's | smooth | braun 4. | bottom-up | bottom-up | bottom-up | smooth | Jon's | bottom-up | bottom-up | bottom-up | bottom-up | bottom-up | merge 5. | merge | merge | merge | merge | merge | merge | merge | merge | merge | merge | smooth 6. | braun | braun | braun | braun | braun | braun | braun | braun | braun | braun | bottom-up 7. | heap | | heap | | heap | heap | heap | heap | heap | heap | heap Sorting with pairing heaps is the clear winner: `pairingSort' is always among the first three, the same holds for `jonsSort' (place 4 for `==' is not really an exception). On random data they are twice as fast as any other algorithm. My version of heap sort is really bad, sigh. Perhaps surprisingly, `braunSort' works quite well on random data. NB Times were taken on a loaded Sun Ultra-1 Sparcstation (run time options -K32M -H32M, median out of five runs). If we increase the size of the input data we see that sorting with pairing heaps is also space efficient: only `pairingSort' and `jonsSort' work on all test sequences. Maybe someone wishes to verify this making heap profiles? 100000 | < | <= | > | >= | == | 1 2* | 1..100* | 2 1* | 100..1* | 1 2 2 1* | random merge | 3.15 | 9.16 | 2.83 | 8.96 | 3.18 | 6.65 | 9.60 | 6.64 | 9.46 | 6.58 | oo bottom-up | 2.18 | 7.73 | 1.99 | 7.60 | 2.17 | 4.74 | 13.01 | 4.63 | 12.51 | 4.59 | oo heap | oo | oo | oo | oo | 17.90 | oo | oo | oo | oo | oo | oo pairing | 1.20 | 2.53 | 1.58 | 3.93 | 1.28 | 2.25 | 3.93 | 2.24 | 4.07 | 2.23 | 9.87 Jon's | 0.95 | 2.73 | 0.93 | 2.90 | 2.18 | 3.00 | 3.54 | 3.06 | 3.51 | 2.78 | 8.63 braun | 21.87 | oo | 21.77 | oo | 7.66 | 14.67 | 20.93 | 14.93 | 21.00 | 14.65 | 23.01 smooth | 0.22 | 0.41 | 0.17 | 6.77 | 0.20 | 4.49 | 3.25 | 4.43 | 3.21 | 4.40 | oo oo = heap or stack overflow 1. | smooth | smooth | smooth | Jon's | smooth | pairing | smooth | pairing | smooth | pairing | Jon's 2. | Jon's | pairing | Jon's | pairing | pairing | Jon's | Jon's | Jon's | Jon's | Jon's | pairing 3. | pairing | Jon's | pairing | smooth | bottom-up | smooth | pairing | smooth | pairing | smooth | braun 4. | bottom-up | bottom-up | bottom-up | bottom-up | Jon's | bottom-up | merge | bottom-up | merge | bottom-up | 5. | merge | merge | merge | merge | merge | merge | bottom-up | merge | bottom-up | merge | 6. | braun | | braun | | braun | braun | braun | braun | braun | braun | 7. | | | | | heap | braun | | | | | Final remarks ~~~~~~~~~~~~~ We have seen that many sorting algorithms can be explained in terms of priority queues. What about quick sort and colleagues? It is well-known that quick sort is equivalent to tree sort [7, p. 26]. So again we can take an data-structural view at sorting. But that is another story to be told by ... Practitioners are probably surprised to learn that `pairingSort' is the algorithm of choice for sorting. Any objections to this recommendation? I was surprised to see that it performs so well: sorting 50.000 Int's in roughly three seconds and 100.000 Int's in roughly nine seconds is quite acceptable. A. Appendix ~~~~~~~~~~~ Here is yet another colleague of `foldr' and `foldl': `foldm' constructs a balanced expression tree. > foldm :: (a -> a -> a) -> a -> [a] -> a > foldm (*) e [] = e > foldm (*) e x = fst (rec (length x) x) > where rec 1 (a:x) = (a, x) > rec n x = (a * b, z) > where m = n `div` 2 > (a, y) = rec (n - m) x > (b, z) = rec m y Jon's `treefold'. In a sense `foldm' works top-down and `treefold' works bottom-up. > treefold :: (a -> a -> a) -> a -> [a] -> a > treefold (*) e [] = e > treefold (*) e [a] = a > treefold (*) e (a:b:x) = treefold (*) e (a * b : pairfold (*) x) > pairfold :: (a -> a -> a) -> [a] -> [a] > pairfold (*) (a:b:x) = a * b : pairfold (*) x > pairfold (*) x = x -- here `x' will have fewer than two Note that `foldm' and `treefold' construct different trees: `foldm' returns a Braun tree while `treefold' returns a tree of the form t1 * (t2 * (.. (tn-1 * tn) ..)) where the ti's are complete binary trees in decreasing size. The size of the trees corresponds to the binary decomposition of the input length. The random-number generator which is stolen from the `hbc'-library. > random2Ints :: Int -> Int -> [Int] > random2Ints s1 s2 > | s1 < 1 || s1 > 2147483562 = error "random2Ints: Bad first seed." > | s2 < 1 || s2 > 2147483398 = error "random2Ints: Bad second seed." > | otherwise = rands s1 s2 > rands :: Int -> Int -> [Int] > rands s1 s2 > | z < 1 = z + 2147483562 : rands s1'' s2'' > | otherwise = z : rands s1'' s2'' > where k = s1 `div` 53668 > s1' = 40014 * (s1 - k * 53668) - k * 12211 > s1'' | s1' < 0 = s1' + 2147483563 > | otherwise = s1' > k' = s2 `div` 52774 > s2' = 40692 * (s2 - k' * 52774) - k' * 3791 > s2'' | s2' < 0 = s2' + 2147483399 > | otherwise = s2' > z = s1'' - s2'' Some auxiliary functions. > copy :: [a] -> [a] > copy = concat . repeat > interleave :: [a] -> [a] -> [a] > interleave [] y = y > interleave (a:x) y = a : interleave y x References ~~~~~~~~~~ [1] Richard Bird and Philip Wadler. "Introduction to Functional Programming", 1988, Prentice-Hall, Series in Computer Science [2] Lawrence C Paulson. "ML for the Working Programmer", second edition, 1996, Cambridge University Press [3] Chris Okasaki. "Purely Functional Data Structures", PhD thesis, School of Computer Science, Carnegie Mellon University,1996, CMU-CS-96-177 [4] Ralf Hinze."Einf"uhrung in die funktionale Programmierung mit Miranda", 1992, Teubner Verlag [5] Michael L Fredman, Robert Sedgewick, Daniel D Sleator and Robert E Tarjan. "The pairing heap: A new form of self-adjusting heap" Algorithmica 1(1):111-129, 1986. [6] Chris Okasaki, "Functional Data Structures", The Second International Summer School on Advanced Functional Programming Techniques, 1996, LNCS 1129 [7] David J King, "Functional Programming and Graph Algorithms", PhD thesis, Department of Computer Science, University of Glasgow, 1996 [8] Richard S. Bird, "Functional algorithm design", Science of Computer Programming, 26 (1996), 15-31

- Re: heap sort or the wonder of abstraction Ralf Hinze
- Re: heap sort or the wonder of abstraction Chris Okasaki
- Re: heap sort or the wonder of abstraction Lennart Augustsson
- Re: heap sort or the wonder of abstraction Lennart Augustsson
- Re: heap sort or the wonder of abstraction Ralf Hinze