sorting

Chris Dornan Thu, 9 Oct 1997 17:27:59 +0100 (BST)

On Wed, 8 Oct 1997, Ralf Hinze <[EMAIL PROTECTED]>:

    Sorting is a hobby-horse of mine, so I cannot resist the temptation to
    elaborate on the subject....

    [sort package omitted]

On Wed, 08 Oct 1997, Chris Okasaki <[EMAIL PROTECTED]>:

    I ran some similar experiments in Standard ML a few years ago.  In those
    experiments pairingSort also performed extremely well.  The only 
    algorithm that performed better, and even then only by a small amount,
    was splaySort...

    [`splaySort' omitted]

On Thu, 9 Oct 1997, Lennart Augustsson <[EMAIL PROTECTED]>:

    Well, I'm a sucker for a benchmark so I ran all of these with hbc.
    I also added the smooth merge sort that comes with hbc...

    [`sortHbc' omitted.]

Here are my results (best viewed with a wide window):

              <      <=     >      >=     =      >2     >100   <2     <100   ><2    
rand
pairingSort | 0.71 | 1.42 | 0.99 | 1.97 | 0.77 | 1.20 | 2.02 | 1.23 | 2.22 | 1.19 | 
4.32 |
jonsSort    | 0.64 | 1.32 | 0.65 | 1.33 | 1.42 | 1.92 | 2.30 | 1.93 | 2.27 | 1.83 | 
4.68 |
splaySort   | 0.58 | 1.17 | 0.59 | 1.65 | 0.58 | 1.20 | 2.10 | 1.26 | 2.02 | 1.11 | 
5.01 |
sortHbc     | 0.18 | 0.38 | 1.15 | 2.00 | 0.19 | 1.49 | 1.03 | 1.61 | 1.78 | 1.45 | 
2.41 |
nmsort      | 0.25 | 0.52 | 1.08 | 2.05 | 0.25 | 1.52 | 1.10 | 1.62 | 1.67 | 1.48 | 
2.31 |

(These results were gathered with Ralph's PQSort.lhs module compiled with `ghc
-O' on a quiet UltraSparc, with 128M of RAM, running Solaris 2.5; see the
attached shell script which takes the minimum over 10 runs for each benchmark.)

`nmsort' is Bob Buckley's ([EMAIL PROTECTED]) natural merge sort (see
Knuth Vol. 2).  I have attached it to the message.

`nmsort' is essentially the same as Lenart's merge sort, but with a structure
more like Jon's heap sort -- in fact `foldb' is a synthesis of an almost
identical combinator used by Bob Buckley and Jon's `foldtree' function.  The
similarity of `nmsort' and `sortHbc' is reflected by the fact that they perform
almost identically, with `sortHbc' performing slightly better on ordered inputs
and `nmsort' edging it on random data.  Both of the sorts lose out on inputs
that contain few natural runs (as Jon explained to me in his earlier message);
for the run-challenged inputs, (`>',`>=',`>2', `<2' and `><2': the descending
lists and the two cycle lists) the other sorts win out, with `splaySort'
performing strongly.

However, the merge sorts tend to get closer when they are defeated.  Comparing
`splaySort' and `sortHbc' in the above data set we get the following (where a
`+' indicates a win for `sortHbc' and `-' a win for `splaySort'):

        <      <=     >      >=     =      >2     >100   <2     <100   ><2    rand
        +3.2   +3.1   -2.0   -1.2   +3.1   -1.3   +2.0   -1.3   +1.1   -1.3   +2.1

Apart from the -2.0 result on the sorted input, the merge sorts have a lower
bound of -1.3.  However, on ascending outputs they all exceed 3.0 with the
random input getting a score of 2.1.

Which data set is the most important is subjective but I would be looking for
the best results on the sorted and random inputs.  I would pay attention to the
random inputs because I think of it as representing the `average' case (this
justification is admittedly a bit wooly).  I regard the the sorted inputs as an
important special case because it allows me to enumerate ordered data
structures into lists, process them and rebuild the data structure again,
quickly; the first thing to be done in rebuilding such a data structure is to
sort it, but that's OK because my sorting algorithm degenerates nicely into
little more than a quick linear check that the list is ordered.

In summary, I am with Lennart: a naturally balanced merge sort seems pretty
good.

Cheers,

Chris Dornan                    [EMAIL PROTECTED]
University College Cork         +353 21 903165

PS: Isn't it pleasing that good functional sorts have the following atributes,
    all at the same time:

        a) they work well in a wide variety of situations

        b) they are worst case O(n.log(n)) on all inputs

        c) they are worst case O(n) on sorted inputs

        d) they are completely general purpose

        e) and, of course, they are rather compact and elegant.

PPS: Isn't it nice to have a stable programming language with good compilers
and interpretters to support it.


-- nmsort 
-----------------------------------------------------------------------------------

> nmsort :: Ord a => [a] -> [a]
> nmsort = nmsortBy compare

> nmsortBy :: (a->a->Ordering) -> [a] -> [a]
> nmsortBy cmp xs = foldb (merge0 cmp) [] (runs0 cmp xs)

> runs0 :: (a->a->Ordering) -> [a] -> [[a]]
> runs0 cmp xs = foldr op [] xs
>       where
>       op z xss@(xs@(x:_):xss') | le z x = (z:xs):xss'
>       op z xss                          = [z]:xss
>
>       le z x = case cmp z x of { GT -> False; _ -> True }

> merge0:: (a->a->Ordering) -> [a] -> [a] -> [a]
> merge0 cmp [] l = l
> merge0 cmp l@(_:_) [] = l
> merge0 cmp l1@(h1:t1) l2@(h2:t2) =
>       case cmp h1 h2 of
>         GT -> h2:merge0 cmp l1 t2
>         _  -> h1:merge0 cmp t1 l2

> foldb :: (a->a->a) -> a -> [a] -> a
> foldb f zero []  = zero
> foldb f zero [x] = x
> foldb f zero xs  = foldb f zero (fold xs)
>     where
>       fold (x1:x2:xs) = f x1 x2 : fold xs
>       fold xs         = xs

-- bench shell script 
-----------------------------------------------------------------------
#!/bin/sh

if [ $# != 1 ]; then
        echo usage: $0 sort-algorithm
        exit 1
fi

tmp=/tmp/out$$

sort ()
{
  time -p pqsort $1 $2 50000 +RTS -K16M -H64M >/dev/null 2>$tmp || \
                { cat $tmp; rm $tmp; echo; echo "$0: terminated"; exit 1; }
  result=`sed -n -e 's/real  *\([0-9.]*\)/\1/p' $tmp`
}

echo "$1: |\c"
for i in 0 1 2 3 4 5 6 7 8 9 10; do
  min=1000.0
  for x in 1 2 3 4 5 6 7 8 9 10; do
      sort $1 $i
      lt=`echo "if ($result<$min) 1; if ($result>=$min) 0"|bc`
      if [ $lt = 1 ]; then
         min=$result
      fi
  done
  echo " $result |\c"
done
echo
exit 0
sorting

Reply via email to