Re: heap sort or the wonder of abstraction

1997-10-09 Thread Ralf Hinze

Lennart wrote

 Well, I'm a sucker for a benchmark so I ran all of these with hbc.
 I also added the smooth merge sort that comes with hbc.
 ...
 As you can see there is no clear winner, but I see no real reason
 to change the sort that comes with hbc to something else at this
 moment.

You are right, I should clarify that the recommendations are ghc
specific (in the next version ;-).

  sortHbc :: (Ord a) = [a] - [a]
  sortHbc [] = []
  sortHbc (x:xs) = msort (upSeq xs [x])
  
  upSeq [] xs = [reverse xs]
  upSeq (y:ys) xxs@(x:xs) =
  if x = y then
  upSeq ys (y:xxs)
  else
  reverse xxs : upSeq ys [y]
  
  msort [xs] = xs
  msort xss = msort (mergePairs xss)
  
  mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
  mergePairs xss = xss
  
  merge xxs@(x:xs) yys@(y:ys) =
  if x = y then
  x:merge xs yys
  else
  y:merge xxs ys
  merge [] yys = yys
  merge xxs[]  = xxs

That's a first-order version of the smooth bottom-up mergesort (which I
did not include in the timings because the difference to the top-down
variant was not significant). `sortHbc' is probably slightly faster
than `smoothMergeSort' because its first-order? NB Bottom-up mergsort
was my previous favourite ;-). Your version has the slight drawback
that it uses only increasing sequences.

 BTW, I don't think the test program does the right thing.  It prints 
 the last element of the sorted list, but there is nothing that
 says that computing this forces the list to be completely sorted.
 When I test sort routines I always do something like printing the
 sum of the sorted list.

Hmm. I think this does not apply to the examples I gave but you are
right, it could happen.

  cheatSort x   =  [ nth i x | i - [1 .. length x] ]

where `nth i x' computes the ith smallest element of x.

 Furthermore (while I'm in a whining mode :-), taking the median
 of several runs is not the accepted wisdom.  You should take the
 minimum of several runs.

Fixed.

Ralf





Re: heap sort or the wonder of abstraction

1997-10-08 Thread Chris Okasaki

--167E2781446B
Content-Type: text/plain; charset="us-ascii"

Ralf Hinze wrote:
 Practitioners are probably surprised to learn that `pairingSort' is the
 algorithm of choice for sorting. Any objections to this recommendation?
 I was surprised to see that it performs so well: sorting 50.000 Int's
 in roughly three seconds and 100.000 Int's in roughly nine seconds is
 quite acceptable.

I ran some similar experiments in Standard ML a few years ago.  In those
experiments pairingSort also performed extremely well.  The only 
algorithm that performed better, and even then only by a small amount,
was splaySort, based on splay trees[1].  However, my experiment
only considered algorithms that were good choices as heaps -- I
did not consider any of the mergesort variations.  Ralf, could I
ask you to run my code below through your experiments (I don't have
easy access to anything but hugs at the moment)?

According to Ralf's criteria, splaySort is
  A. asymptotically optimal
  B. stable
  C. smooth  (In fact, it has been conjectured that splaySort is
  optimal with respect to any reasonable notion of
  "presortedness".[2])
However, I believe--although I'm positive--that splaySort is
  D. not lazy
Ralf considered the situation where the creation phase takes O(n) time
and the selection phase takes O(n log n) time, but for splaySort these
are reversed.

Chris

--

[1] Sleator and Tarjan
"Self-adjusting binary search trees"
Journal of the ACM 32(3):652-686 (July '85)

[2] Moffat, Eddy, and Petersson
"Splaysort: Fast, Versatile, Practical"
Software PE 26(7):781-797 (July '96)

-

--167E2781446B
Content-Disposition: inline; filename="Splay.lhs"
Content-Type: text/plain; charset="us-ascii"; name="Splay.lhs"

 data Splay a = SEmpty | SNode (Splay a) a (Splay a)

 instance PriorityQueue Splay where
  empty = SEmpty
  single x = SNode SEmpty x SEmpty

  fromList xs = foldr insert empty xs
  
  toOrderedList t = tol t []
where tol SEmpty rest = rest
  tol (SNode a x b) rest = tol a (x : tol b rest)

  insert k t = SNode a k b
where 
  (a, b) = partition t  -- elements of a = k, elements of b  k

  partition SEmpty = (SEmpty,SEmpty)
  partition t@(SNode tl x tr)
| x  k =
case tr of
  SEmpty - (t,SEmpty)
  SNode trl y trr
| y  k -
let tl' = SNode tl x trl
(lt,ge) = partition trr
in (SNode tl' y lt,ge)
| otherwise -
let (lt,ge) = partition trl
in (SNode tl x lt,SNode ge y trr)
| otherwise =
case tl of
  SEmpty - (SEmpty,t)
  SNode tll y tlr
| y  k -
let (lt,ge) = partition tlr
in (SNode tll y lt,SNode ge x tr)
| otherwise -
let tr' = SNode tlr x tr
(lt,ge) = partition tll
in (lt,SNode ge y tr')

 splaySort :: (Ord a) = [a] - [a]
 splaySort =  toOrderedList
   .  (fromList :: (Ord a) = [a] - Splay a)

--167E2781446B--





Re: heap sort or the wonder of abstraction

1997-10-08 Thread Ralf Hinze

Sorting is a hobby-horse of mine, so I cannot resist the temptation to
elaborate on the subject. I was motivated to write this rather long
reply by Carsten Kehler Holst saying `As far as I can see the
difference between merge sort and heap sort as described by Jon is
almost non existing'. Carsten is not quite right but he is not totally
wrong either. Both sorting algorithms are based on priority queues, so
it may be worthwhile to take a `data-structural look at sorting'.
That's the theme of this email.

There are still some open points, so any remarks, corrections, ideas
etc are *welcome*.

Ralf

PS: Those who are interested in performance only should skip to Section 10.

 import List   (  group  )
 import System (  getArgs  )


1. Introductory remarks
~~~

What makes up a good sorting algorithm? Here are some criteria:

A. it should be asymptotically optimal (ie O(n log n) worst case behaviour
   ruling out quick sort ;-)),

B. it should be stable (ie it may not change the order of equal elements),

C. it should be smooth (a smooth sort has a linear execution time if the
   input is nearly sorted).

All algorithms we are going to present are asymptotically optimal (with
the notable exception of `jonsSort') and all of them are stable. Only
`smoothMergeSort' has shown to be smooth (to the best of my
knowledge).  However, practical experiments suggest that `pairingSort'
and `jonsSort' adopt quite well to the input data.

Additional criteria one *may* consider:

D. it should be lazy (ie `head . sort' has linear execution time),

E. it should run faster if the input contains many equal elements.

All algorithms are lazy. No algorithm explicitly addresses Criterion E.
Again, experiments suggest that `pairingSort' and `jonsSort' adopt
quite well to the input data.


It is advisable to gather some test data to check the various
implementations. A stable sorting algorithm should perform well on the
following data.

 strictlyIncreasing n  =  [1 .. n]
 increasing n  =  interleave x x
 where x   =  strictlyIncreasing n
 strictlyDecreasing n  =  [n, n-1 .. 1]
 decreasing n  =  interleave x x
 where x   =  strictlyDecreasing n
 constant n=  replicate n 0

The following generators produce lists containing many equal elements
(provided `k  n').

 repIncreasing k n =  take n (copy [0 .. k])
 repDecreasing k n =  take n (copy [k, k-1 .. 0])
 oscillating k n   =  take n (copy ([0 .. k] ++ [k, k-1 .. 0]))

Finally we have random data.

 random n  =  take n (random2Ints (2*n) (3*n))

A complete list of all generators.

 generators:: [Int - [Int]]
 generators=  [ strictlyIncreasing,-- 0
increasing,
strictlyDecreasing,
decreasing,
constant,
repIncreasing 2,   -- 5
repIncreasing 100,
repDecreasing 2,
repDecreasing 100,
oscillating 2,
random ]   -- 10

NB We only consider lists of Int's. It may be worthwhile to repeat the
benchmarks (Section 10) with data designed to make comparisons
dominate, see Jon's second email.


2. Priority queues
~~

Here is the abstract data type of priority queues formulated as a
Haskell class definition.

 data SeqView t a  =  Null
   |  Cons a (t a)

 class PriorityQueue q where
 empty :: (Ord a) = q a
 single:: (Ord a) = a - q a
 insert:: (Ord a) = a - q a - q a
 meld  :: (Ord a) = q a - q a - q a
 splitMin  :: (Ord a) = q a - SeqView q a
 fromList  :: (Ord a) = [a] - q a
 toOrderedList :: (Ord a) = q a - [a]

 single a  =  insert a empty
 insert a q=  single a `meld` q
 fromList  =  foldm meld empty . map single
 toOrderedList q   =  case splitMin q of
 Null  - []
 Cons a q  - a : toOrderedList q

The function `splitMin' replaces `isEmpty', `findMin' and `deleteMin'
which usually belong to the standard repertoire. The call `splitMin q'
returns `Null' if `q' is empty and `Cons a q1' otherwise (`a' is a
minimal element of `q' and `q1' is the remaining queue).

The prototypical sorting algorithm based on priority queues looks as
follows (`PQ' refers to the concrete implementation)

  pqSort:: (Ord a) = [a] - [a]
  pqSort

Re: Heap Sort

1997-10-07 Thread Jon . Fairbairn

On  4 Oct , Chris Okasaki wrote:
 But the heapsort you give is nothing like the standard imperative
 heapsort! 

Point taken, although I think 'nothing like' is overstating the case.

 Yes, it uses a heap, but not the binary heap used by standard
 heapsort.

Perfectly true. I only said that you could use my version when
explaining it, not that it would be suitable for the whole
explanation.  What I had in mind was that this sort is easier to
explain, and gets some of the concepts across.  Most of the
complication in imperative heapsort is to do with keeping the heaps in
place in a fixed size array.

 Larry Paulson's "ML for the Working Programmer" includes a
 functional heapsort ... is probably superior for pedagogical
 purposes.

Well, I'd certainly want to go on to describe this in our hypothetical
pedagogical situation :-) Mind you, I haven't done any teaching since
I got ill (which was before I wrote the sort), so I have no
experimental evidence.  The approach above might just prove confusing.

Perhaps I should just have said that a use for it is as an answer to
people who say it's difficult (or impossible) to write a heapsort
functionally, which was what prompted me to write it in the first
place.

 [1] Fredman, Sedgewick, Sleator, and Tarjan.
 "The pairing heap: A new form of self-adjusting heap"
 Algorithmica 1(1):111-129, 1986.

Many thanks for this reference, of which I was unaware.

  Jon

-- 
Jon Fairbairn [EMAIL PROTECTED]
18 Kimberley Road[EMAIL PROTECTED]
Cambridge CB4 1HH  +44 1223 570179 (pm only, please)







RE: Heap Sort

1997-10-04 Thread Jon . Fairbairn

On  2 Oct , Carsten Kehler Holst wrote:
 Merge Sort vs. Heap Sort (ala Jon)

 As far as I can see the difference between merge sort and heap sort as
 described by Jon is almost non existing.

I'm afraid you need to look a little harder :-)

At first let me note that heapsort as I sent it is flawed in that it
does _more_ comparisons than mergesort for small odd numbers of
elements, so they are not the same!

 mergesort = treefold merge . map runify

you mean treefold merge Nil . map runify

 The main difference is that we call merge recursively instead of
 building a heap node which is later taken apart in flatten_heap.

but the heaps defer some of the comparisons, and group them together
differently.

 I fail to see how Jon's version can be superior and I cannot see
 that points a, b, and c (below) holds (of course (a) can be argued)

I'm afraid I'm not going to give a full explanation here, but the nub
of the matter is as mentioned above.

a) When you want to explain the imperative heapsort
b) When you know that the data are going to be in an order that is bad for
   mergesort (take n . concat . repeat) [0,1] for example

Here's a practical comparison (using Hugs 1.3) between your mergesort
as corrected above, and my heapsort as originally sent:

? :gc
Garbage collection recovered 995633 cells
?  (length . mergesort . concat . take 5000 . repeat) [0,1]
1
(769128 reductions, 1260406 cells, 1 garbage collection)
? :gc
Garbage collection recovered 995633 cells
?  (length . heapsort . concat . take 5000 . repeat) [0,1]
1
(388900 reductions, 813056 cells)

and with data designed to make comparisons dominate:

? (length . mergesort . concat . take 5000 . repeat) ["aach","aagh"]
1
(5467874 reductions, 10727534 cells, 155 garbage collections)
? (length . heapsort . concat . take 5000 . repeat) ["aach","aagh"]
1
(2236721 reductions, 4540208 cells, 67 garbage collections)
? 

c) when you are worried about the worst-case behaviour more than the average
   case

As I said originally, mergesort is somewhat faster in the average case.

-- 
Jon Fairbairn [EMAIL PROTECTED]
18 Kimberley Road[EMAIL PROTECTED]
Cambridge CB4 1HH  +44 1223 570179 (pm only, please)







Re: Heap Sort

1997-10-04 Thread Chris Okasaki

[EMAIL PROTECTED] wrote:
 Here is my version:
   [...]

 On 21 Sep , Chris Dornan wrote:
  When would a heap sort be preferable to a merge sort?
 
 a) When you want to explain the imperative heapsort

But the heapsort you give is nothing like the standard imperative
heapsort!  Yes, it uses a heap, but not the binary heap used by
standard heapsort.  Instead, it uses the functional equivalent 
of multi-pass pairing heaps[1].  Larry Paulson's "ML for the
Working Programmer" includes a functional heapsort that is
much closer in spirit to the standard imperative version, and
so is probably superior for pedagogical purposes.  (Although I expect
that your version will be much faster in practice.)

Chris

[1] Fredman, Sedgewick, Sleator, and Tarjan.
"The pairing heap: A new form of self-adjusting heap"
Algorithmica 1(1):111-129, 1986.





RE: Heap Sort

1997-10-02 Thread Carsten Kehler Holst

Merge Sort vs. Heap Sort (ala Jon)

As far as I can see the difference between merge sort and heap sort as
described by Jon is almost non existing.

Merge sort as I wrote it in 92 (it might have been 93 :-).
Using Jon's naming:

runify x = [x]

merge [] ys = ys
merge xs [] = xs
merge xrun@(x:xs) yrun@(y:ys)
  | x = y = x:merge xs yrun   -- = to make it stable
  | otherwise = y : merge xrun ys

mergesort = treefold merge . map runify

to make it into a natural mergesort use 
( group (=) ) instead of (map runify)
(group R) splits a list in sublists where the relation R holds between
two consecutive elements and is due to John Hughes. 

The main difference is that we call merge recursively instead of
building a heap node which is later taken apart in flatten_heap. I fail
to se how Jon's version can be superior and I cannot see that points a,
b, and c (below) holds (of course (a) can be argued)

Carsten Kehler Holst

-Original Message-
From:  [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
Sent:  1. oktober 1997 21:01
To:[EMAIL PROTECTED]
Cc:[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject:   Re: Heap Sort

Here is my version:

First we need treefold:

---
module Treefold where

-- I'm surprised treefold isn't in the standard prelude since it's so useful
-- treefold (*) z [a,b,c,d,e,f] = (((a*b)*(c*d))*(e*f))

-- translated from the version I wrote in Ponder in October 1992
-- Jon Fairbairn

treefold f zero [] = zero
treefold f zero [x] = x
treefold f zero (a:b:l) = treefold f zero (f a b: pairfold l)
 where pairfold (x:y:rest) = f x y: pairfold rest
   pairfold l = l -- here l will have fewer than two
elements

--

module Heapsort where
import Treefold

-- translated from the version I wrote in Ponder in October 1992
-- Jon Fairbairn

data Heap a = Nil | Node a [Heap a]
heapify x = Node x []

heapsort :: Ord a = [a] - [a]

heapsort = flatten_heap . merge_heaps . map heapify

where merge_heaps :: Ord a = [Heap a] - Heap a
  merge_heaps = treefold merge_heap Nil

  flatten_heap Nil = []
  flatten_heap (Node x heaps) = x:flatten_heap (merge_heaps heaps)

  merge_heap Nil Nil = Nil
  merge_heap heap@(Node _ _) Nil = heap
  merge_heap node_a@(Node a heaps_a) node_b@(Node b heaps_b)
 | a  b = Node a (node_b: heaps_a)
 | otherwise = Node b (node_a: heaps_b)

-- end


On 21 Sep , Chris Dornan wrote:
 When would a heap sort be preferable to a merge sort?

a) When you want to explain the imperative heapsort
b) When you know that the data are going to be in an order that is bad for
   mergesort (take n . concat . repeat) [0,1] for example
c) when you are worried about the worst-case behaviour more than the average
   case

Most of the time, mergesort with a phase that breaks the input into
runs will be faster.  However, it's possible to write a heapsort that
starts with a run phase; this is slightly slower than mergesort for
the average case, but the worst case is better.

  Jon

-- 
Jon Fairbairn [EMAIL PROTECTED]
18 Kimberley Road[EMAIL PROTECTED]
Cambridge CB4 1HH  +44 1223 570179 (pm only, please)









Re: Heap sort

1997-09-20 Thread Jon . Fairbairn

On 19 Sep, Nicholas Bleakly wrote:
 Does any body have a heap sort algorithm (i.e. takes a single unsorted
 list and applies a heap sort to it)?

If you mean a functional one, I have.  I could email it to you. Or
post it here if wanted.  Does anyone else have one?

-- 
Jon Fairbairn [EMAIL PROTECTED]