Re: heap sort or the wonder of abstraction
Lennart wrote Well, I'm a sucker for a benchmark so I ran all of these with hbc. I also added the smooth merge sort that comes with hbc. ... As you can see there is no clear winner, but I see no real reason to change the sort that comes with hbc to something else at this moment. You are right, I should clarify that the recommendations are ghc specific (in the next version ;-). sortHbc :: (Ord a) = [a] - [a] sortHbc [] = [] sortHbc (x:xs) = msort (upSeq xs [x]) upSeq [] xs = [reverse xs] upSeq (y:ys) xxs@(x:xs) = if x = y then upSeq ys (y:xxs) else reverse xxs : upSeq ys [y] msort [xs] = xs msort xss = msort (mergePairs xss) mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss mergePairs xss = xss merge xxs@(x:xs) yys@(y:ys) = if x = y then x:merge xs yys else y:merge xxs ys merge [] yys = yys merge xxs[] = xxs That's a first-order version of the smooth bottom-up mergesort (which I did not include in the timings because the difference to the top-down variant was not significant). `sortHbc' is probably slightly faster than `smoothMergeSort' because its first-order? NB Bottom-up mergsort was my previous favourite ;-). Your version has the slight drawback that it uses only increasing sequences. BTW, I don't think the test program does the right thing. It prints the last element of the sorted list, but there is nothing that says that computing this forces the list to be completely sorted. When I test sort routines I always do something like printing the sum of the sorted list. Hmm. I think this does not apply to the examples I gave but you are right, it could happen. cheatSort x = [ nth i x | i - [1 .. length x] ] where `nth i x' computes the ith smallest element of x. Furthermore (while I'm in a whining mode :-), taking the median of several runs is not the accepted wisdom. You should take the minimum of several runs. Fixed. Ralf
Re: heap sort or the wonder of abstraction
--167E2781446B Content-Type: text/plain; charset="us-ascii" Ralf Hinze wrote: Practitioners are probably surprised to learn that `pairingSort' is the algorithm of choice for sorting. Any objections to this recommendation? I was surprised to see that it performs so well: sorting 50.000 Int's in roughly three seconds and 100.000 Int's in roughly nine seconds is quite acceptable. I ran some similar experiments in Standard ML a few years ago. In those experiments pairingSort also performed extremely well. The only algorithm that performed better, and even then only by a small amount, was splaySort, based on splay trees[1]. However, my experiment only considered algorithms that were good choices as heaps -- I did not consider any of the mergesort variations. Ralf, could I ask you to run my code below through your experiments (I don't have easy access to anything but hugs at the moment)? According to Ralf's criteria, splaySort is A. asymptotically optimal B. stable C. smooth (In fact, it has been conjectured that splaySort is optimal with respect to any reasonable notion of "presortedness".[2]) However, I believe--although I'm positive--that splaySort is D. not lazy Ralf considered the situation where the creation phase takes O(n) time and the selection phase takes O(n log n) time, but for splaySort these are reversed. Chris -- [1] Sleator and Tarjan "Self-adjusting binary search trees" Journal of the ACM 32(3):652-686 (July '85) [2] Moffat, Eddy, and Petersson "Splaysort: Fast, Versatile, Practical" Software PE 26(7):781-797 (July '96) - --167E2781446B Content-Disposition: inline; filename="Splay.lhs" Content-Type: text/plain; charset="us-ascii"; name="Splay.lhs" data Splay a = SEmpty | SNode (Splay a) a (Splay a) instance PriorityQueue Splay where empty = SEmpty single x = SNode SEmpty x SEmpty fromList xs = foldr insert empty xs toOrderedList t = tol t [] where tol SEmpty rest = rest tol (SNode a x b) rest = tol a (x : tol b rest) insert k t = SNode a k b where (a, b) = partition t -- elements of a = k, elements of b k partition SEmpty = (SEmpty,SEmpty) partition t@(SNode tl x tr) | x k = case tr of SEmpty - (t,SEmpty) SNode trl y trr | y k - let tl' = SNode tl x trl (lt,ge) = partition trr in (SNode tl' y lt,ge) | otherwise - let (lt,ge) = partition trl in (SNode tl x lt,SNode ge y trr) | otherwise = case tl of SEmpty - (SEmpty,t) SNode tll y tlr | y k - let (lt,ge) = partition tlr in (SNode tll y lt,SNode ge x tr) | otherwise - let tr' = SNode tlr x tr (lt,ge) = partition tll in (lt,SNode ge y tr') splaySort :: (Ord a) = [a] - [a] splaySort = toOrderedList . (fromList :: (Ord a) = [a] - Splay a) --167E2781446B--
Re: heap sort or the wonder of abstraction
Sorting is a hobby-horse of mine, so I cannot resist the temptation to elaborate on the subject. I was motivated to write this rather long reply by Carsten Kehler Holst saying `As far as I can see the difference between merge sort and heap sort as described by Jon is almost non existing'. Carsten is not quite right but he is not totally wrong either. Both sorting algorithms are based on priority queues, so it may be worthwhile to take a `data-structural look at sorting'. That's the theme of this email. There are still some open points, so any remarks, corrections, ideas etc are *welcome*. Ralf PS: Those who are interested in performance only should skip to Section 10. import List ( group ) import System ( getArgs ) 1. Introductory remarks ~~~ What makes up a good sorting algorithm? Here are some criteria: A. it should be asymptotically optimal (ie O(n log n) worst case behaviour ruling out quick sort ;-)), B. it should be stable (ie it may not change the order of equal elements), C. it should be smooth (a smooth sort has a linear execution time if the input is nearly sorted). All algorithms we are going to present are asymptotically optimal (with the notable exception of `jonsSort') and all of them are stable. Only `smoothMergeSort' has shown to be smooth (to the best of my knowledge). However, practical experiments suggest that `pairingSort' and `jonsSort' adopt quite well to the input data. Additional criteria one *may* consider: D. it should be lazy (ie `head . sort' has linear execution time), E. it should run faster if the input contains many equal elements. All algorithms are lazy. No algorithm explicitly addresses Criterion E. Again, experiments suggest that `pairingSort' and `jonsSort' adopt quite well to the input data. It is advisable to gather some test data to check the various implementations. A stable sorting algorithm should perform well on the following data. strictlyIncreasing n = [1 .. n] increasing n = interleave x x where x = strictlyIncreasing n strictlyDecreasing n = [n, n-1 .. 1] decreasing n = interleave x x where x = strictlyDecreasing n constant n= replicate n 0 The following generators produce lists containing many equal elements (provided `k n'). repIncreasing k n = take n (copy [0 .. k]) repDecreasing k n = take n (copy [k, k-1 .. 0]) oscillating k n = take n (copy ([0 .. k] ++ [k, k-1 .. 0])) Finally we have random data. random n = take n (random2Ints (2*n) (3*n)) A complete list of all generators. generators:: [Int - [Int]] generators= [ strictlyIncreasing,-- 0 increasing, strictlyDecreasing, decreasing, constant, repIncreasing 2, -- 5 repIncreasing 100, repDecreasing 2, repDecreasing 100, oscillating 2, random ] -- 10 NB We only consider lists of Int's. It may be worthwhile to repeat the benchmarks (Section 10) with data designed to make comparisons dominate, see Jon's second email. 2. Priority queues ~~ Here is the abstract data type of priority queues formulated as a Haskell class definition. data SeqView t a = Null | Cons a (t a) class PriorityQueue q where empty :: (Ord a) = q a single:: (Ord a) = a - q a insert:: (Ord a) = a - q a - q a meld :: (Ord a) = q a - q a - q a splitMin :: (Ord a) = q a - SeqView q a fromList :: (Ord a) = [a] - q a toOrderedList :: (Ord a) = q a - [a] single a = insert a empty insert a q= single a `meld` q fromList = foldm meld empty . map single toOrderedList q = case splitMin q of Null - [] Cons a q - a : toOrderedList q The function `splitMin' replaces `isEmpty', `findMin' and `deleteMin' which usually belong to the standard repertoire. The call `splitMin q' returns `Null' if `q' is empty and `Cons a q1' otherwise (`a' is a minimal element of `q' and `q1' is the remaining queue). The prototypical sorting algorithm based on priority queues looks as follows (`PQ' refers to the concrete implementation) pqSort:: (Ord a) = [a] - [a] pqSort
Re: Heap Sort
On 4 Oct , Chris Okasaki wrote: But the heapsort you give is nothing like the standard imperative heapsort! Point taken, although I think 'nothing like' is overstating the case. Yes, it uses a heap, but not the binary heap used by standard heapsort. Perfectly true. I only said that you could use my version when explaining it, not that it would be suitable for the whole explanation. What I had in mind was that this sort is easier to explain, and gets some of the concepts across. Most of the complication in imperative heapsort is to do with keeping the heaps in place in a fixed size array. Larry Paulson's "ML for the Working Programmer" includes a functional heapsort ... is probably superior for pedagogical purposes. Well, I'd certainly want to go on to describe this in our hypothetical pedagogical situation :-) Mind you, I haven't done any teaching since I got ill (which was before I wrote the sort), so I have no experimental evidence. The approach above might just prove confusing. Perhaps I should just have said that a use for it is as an answer to people who say it's difficult (or impossible) to write a heapsort functionally, which was what prompted me to write it in the first place. [1] Fredman, Sedgewick, Sleator, and Tarjan. "The pairing heap: A new form of self-adjusting heap" Algorithmica 1(1):111-129, 1986. Many thanks for this reference, of which I was unaware. Jon -- Jon Fairbairn [EMAIL PROTECTED] 18 Kimberley Road[EMAIL PROTECTED] Cambridge CB4 1HH +44 1223 570179 (pm only, please)
RE: Heap Sort
On 2 Oct , Carsten Kehler Holst wrote: Merge Sort vs. Heap Sort (ala Jon) As far as I can see the difference between merge sort and heap sort as described by Jon is almost non existing. I'm afraid you need to look a little harder :-) At first let me note that heapsort as I sent it is flawed in that it does _more_ comparisons than mergesort for small odd numbers of elements, so they are not the same! mergesort = treefold merge . map runify you mean treefold merge Nil . map runify The main difference is that we call merge recursively instead of building a heap node which is later taken apart in flatten_heap. but the heaps defer some of the comparisons, and group them together differently. I fail to see how Jon's version can be superior and I cannot see that points a, b, and c (below) holds (of course (a) can be argued) I'm afraid I'm not going to give a full explanation here, but the nub of the matter is as mentioned above. a) When you want to explain the imperative heapsort b) When you know that the data are going to be in an order that is bad for mergesort (take n . concat . repeat) [0,1] for example Here's a practical comparison (using Hugs 1.3) between your mergesort as corrected above, and my heapsort as originally sent: ? :gc Garbage collection recovered 995633 cells ? (length . mergesort . concat . take 5000 . repeat) [0,1] 1 (769128 reductions, 1260406 cells, 1 garbage collection) ? :gc Garbage collection recovered 995633 cells ? (length . heapsort . concat . take 5000 . repeat) [0,1] 1 (388900 reductions, 813056 cells) and with data designed to make comparisons dominate: ? (length . mergesort . concat . take 5000 . repeat) ["aach","aagh"] 1 (5467874 reductions, 10727534 cells, 155 garbage collections) ? (length . heapsort . concat . take 5000 . repeat) ["aach","aagh"] 1 (2236721 reductions, 4540208 cells, 67 garbage collections) ? c) when you are worried about the worst-case behaviour more than the average case As I said originally, mergesort is somewhat faster in the average case. -- Jon Fairbairn [EMAIL PROTECTED] 18 Kimberley Road[EMAIL PROTECTED] Cambridge CB4 1HH +44 1223 570179 (pm only, please)
Re: Heap Sort
[EMAIL PROTECTED] wrote: Here is my version: [...] On 21 Sep , Chris Dornan wrote: When would a heap sort be preferable to a merge sort? a) When you want to explain the imperative heapsort But the heapsort you give is nothing like the standard imperative heapsort! Yes, it uses a heap, but not the binary heap used by standard heapsort. Instead, it uses the functional equivalent of multi-pass pairing heaps[1]. Larry Paulson's "ML for the Working Programmer" includes a functional heapsort that is much closer in spirit to the standard imperative version, and so is probably superior for pedagogical purposes. (Although I expect that your version will be much faster in practice.) Chris [1] Fredman, Sedgewick, Sleator, and Tarjan. "The pairing heap: A new form of self-adjusting heap" Algorithmica 1(1):111-129, 1986.
RE: Heap Sort
Merge Sort vs. Heap Sort (ala Jon) As far as I can see the difference between merge sort and heap sort as described by Jon is almost non existing. Merge sort as I wrote it in 92 (it might have been 93 :-). Using Jon's naming: runify x = [x] merge [] ys = ys merge xs [] = xs merge xrun@(x:xs) yrun@(y:ys) | x = y = x:merge xs yrun -- = to make it stable | otherwise = y : merge xrun ys mergesort = treefold merge . map runify to make it into a natural mergesort use ( group (=) ) instead of (map runify) (group R) splits a list in sublists where the relation R holds between two consecutive elements and is due to John Hughes. The main difference is that we call merge recursively instead of building a heap node which is later taken apart in flatten_heap. I fail to se how Jon's version can be superior and I cannot see that points a, b, and c (below) holds (of course (a) can be argued) Carsten Kehler Holst -Original Message- From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]] Sent: 1. oktober 1997 21:01 To:[EMAIL PROTECTED] Cc:[EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: Heap Sort Here is my version: First we need treefold: --- module Treefold where -- I'm surprised treefold isn't in the standard prelude since it's so useful -- treefold (*) z [a,b,c,d,e,f] = (((a*b)*(c*d))*(e*f)) -- translated from the version I wrote in Ponder in October 1992 -- Jon Fairbairn treefold f zero [] = zero treefold f zero [x] = x treefold f zero (a:b:l) = treefold f zero (f a b: pairfold l) where pairfold (x:y:rest) = f x y: pairfold rest pairfold l = l -- here l will have fewer than two elements -- module Heapsort where import Treefold -- translated from the version I wrote in Ponder in October 1992 -- Jon Fairbairn data Heap a = Nil | Node a [Heap a] heapify x = Node x [] heapsort :: Ord a = [a] - [a] heapsort = flatten_heap . merge_heaps . map heapify where merge_heaps :: Ord a = [Heap a] - Heap a merge_heaps = treefold merge_heap Nil flatten_heap Nil = [] flatten_heap (Node x heaps) = x:flatten_heap (merge_heaps heaps) merge_heap Nil Nil = Nil merge_heap heap@(Node _ _) Nil = heap merge_heap node_a@(Node a heaps_a) node_b@(Node b heaps_b) | a b = Node a (node_b: heaps_a) | otherwise = Node b (node_a: heaps_b) -- end On 21 Sep , Chris Dornan wrote: When would a heap sort be preferable to a merge sort? a) When you want to explain the imperative heapsort b) When you know that the data are going to be in an order that is bad for mergesort (take n . concat . repeat) [0,1] for example c) when you are worried about the worst-case behaviour more than the average case Most of the time, mergesort with a phase that breaks the input into runs will be faster. However, it's possible to write a heapsort that starts with a run phase; this is slightly slower than mergesort for the average case, but the worst case is better. Jon -- Jon Fairbairn [EMAIL PROTECTED] 18 Kimberley Road[EMAIL PROTECTED] Cambridge CB4 1HH +44 1223 570179 (pm only, please)
Re: Heap sort
On 19 Sep, Nicholas Bleakly wrote: Does any body have a heap sort algorithm (i.e. takes a single unsorted list and applies a heap sort to it)? If you mean a functional one, I have. I could email it to you. Or post it here if wanted. Does anyone else have one? -- Jon Fairbairn [EMAIL PROTECTED]