Hello All,

I'd like to share with you new Dual-Pivot Quicksort which is
faster than the known implementations (theoretically and
experimental). I'd like to propose to replace the JDK's
Quicksort implementation by new one.

Description
-----------
The classical Quicksort algorithm uses the following scheme:

1. Pick an element P, called a pivot, from the array.
2. Reorder the array so that all elements, which are less than
   the pivot, come before the pivot and all elements greater than
   the pivot come after it (equal values can go either way). After
   this partitioning, the pivot element is in its final position.
3. Recursively sort the sub-array of lesser elements and the
   sub-array of greater elements.

The invariant of classical Quicksort is:

[ <= p | >= p ]

There are several modifications of the schema:

[ < p | = p | > p ]  or  [ = p | < p | > p | = p ]

But all of them use *one* pivot.

The new Dual-Pivot Quicksort uses *two* pivots elements in this manner:

1. Pick an elements P1, P2, called pivots from the array.
2. Assume that P1 <= P2, otherwise swap it.
3. Reorder the array into three parts: those less than the smaller
   pivot, those larger than the larger pivot, and in between are
   those elements between (or equal to) the two pivots.
4. Recursively sort the sub-arrays.

The invariant of the Dual-Pivot Quicksort is:

[ < P1 | P1 <= & <= P2 } > P2 ]

The new Quicksort is faster than current implementation of Quicksort
in JDK (L. Bentley and M. Douglas McIlroy) and classical Quicksort.

The full description of the Dual-Pivot Quicksort you can find
on my page: http://iaroslavski.narod.ru/quicksort

Performance tests
-----------------
Here is result of running on different types of input data:

Client VM                          all    85%  organ  0..1          0..4
             random ascend descend equal  equal pipes random 010101 random
Dual-Pivot:  16.83  5.31    5.47   0.35  0.68  10.59  1.06   1.02   2.18
  Bentley's:  19.77  9.08   10.13   0.63  1.12  13.22  1.63   1.08   2.49

Server VM                          all    85%  organ  0..1          0..4
             random ascend descend equal  equal pipes random 010101 random
Dual-Pivot:  23.94  6.68    6.63   0.43  0.62  17.14  1.42   1.96   3.41
  Bentley's:  25.20 10.18   10.32   2.07  1.33  16.72  2.95   1.82   3.39

The a lot of other tests have been run under client and server mode.
The most interesting is BentleyBasher test framework. It runs battery
of tests for all cases:

{ 100, 1000, 10000, 1000000 } x
{ sawtooth, rand, stagger, plateau, shuffle } x
{ ident, reverse, reverse_front, reverse_back, sort, dither}

where

100, ... , 1000000 - array length

sawtooth: x[i] =i%m
rand: x[i] = rand() % m
stagger: x[i] = (i*m + i) % n
plateau: x[i] = min(i, m)
shuffle: x[i] = rand()%m? (j+=2): (k+=2)

ident(x) - a copy of x
reverse(x, 0, n) - reversed copy
reverse_front(x, 0, n/2) - front half reversed
reverse_back(x, n/2, n) - back half reversed
sort(x) - an ordered copy
dither(x) - add i%5 to x[i]

Here is the result of execution:
Server VM: 
http://spreadsheets.google.com/pub?key=t_EAWUkQ4mD3BIbOv8Fa-AQ&output=html
Client VM: 
http://spreadsheets.google.com/pub?key=tdiMo8xleTxd23nKUObcz0Q&single=true&gid=0&output=html

Mathematical investigations
---------------------------
It is proved that for the Dual-Pivot Quicksort the average number of
comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n),
whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n)
respectively. Full mathematical proof see in attached proof.txt
and proof_add.txt files. Theoretical results are also confirmed
by experimental counting of the operations.

Diff between current and new implementation of Quicksort
--------------------------------------------------------

Here is the link to the diff for java.util.Arrays class:
http://cr.openjdk.java.net/~alanb/6880672/webrev.00

If you like to look and play with new algorithm,
please, take attached class DualPivotQuicksort.java

Feedback
--------

Also I'd like to share a feedback from Joshua Bloch and
Jon Bentley who spent a lot of time investigating this
algorithm, who gave me many advices and tips how to
make new Quicksort better.

-------- Original Message --------
Subject: Re: Integration of new Dual-Pivot Quicksort into JDK 7
Date: Thu, 10 Sep 2009 07:20:11 -0700
From: Joshua Bloch <j...@google.com>

Jon also says that Vladimir should make every reasonable improvement to
the basic method before checking in the code. In his words, "It would be
horrible to put the new code into the library, and then have someone
else come along and speed it up by another 20% by using standard
techniques." I believe it's not unlikely that this code may end up
getting ported to many languages and widely deployed in much the manner
of Bentley and McIlroy's fine sort (which is nearing 20 successful years
in the field). Jon will help Vladimir do this.

-------- Original Message --------
Subject: Dual-Pivot Quicksort: Next Steps
Date: Wed, 09 Sep 2009 15:02:25 -0400
From: Jon Bentley <jbent...@avaya.com>

Vladimir, Josh,
     I *finally* feel like I understand what is going on.  Now that I
(think that) I see it, it seems straightforward and obvious.
     Tony Hoare developed Quicksort in the early 1960s. I was very
proud to make minor contributions to a particularly clean (binary)
quicksort in the mid 1980s, to a relatively straightforward, industrial
strength Quicksort with McIlroy in the early 1990s, and then to
algorithms and data structures for strings with Sedgewick in the mid 1990s.
     I think that Vladimir's contributions to Quicksort go way beyond
anything that I've ever done, and rank up there with Hoare's original
design and Sedgewick's analysis. I feel so privileged to play a very,
very minor role in helping Vladimir with the most excellent work!

-----------------------------------------------

Let me know, if you have any questions/comments.

Thank you,
Vladimir
Here is the details between steps (1) and (2) in the case of Dual-Pivot 
Quicksort:
-----------------------------------------------------------------------------

>From the algorithm above, the average number of comparisons C_n as a
function of the number of elements may be represented by the equation:

(1)  C_n = 1 + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + 1*i + 
C_{j-i-1} + 2*(j-i- 1) + C_{n-j-1} + 2*(n-j-1)}

Equation (1) means that total number is the sum of the comparison numbers of
all cases of partitions into 3 parts plus number of comparisons for elements
from left part (one comparison), center and right parts (2 comparisons).

!!! It can be rewritten in other way: !!!

(2)  C_n = 1 + R*(n-2) + 2/(n(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + 
C_{j-i-1} + C_{n-j-1}}

where R is the average number of comparisons during one iteration. This
constant R can be found as average value (1 + 2 + 2) / 3 = 5/3 ~ 1.6666.

-----------------------------------------------------------------------------

How it can be rewritten:

We should show that 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i + 
2*(j-i- 1) + 2*(n-j-1)} equals
to 5/3 * (n-2). Let's consider the double sum:

sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i + 2*(j-i- 1) + 2*(n-j-1)} = 

= sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {2*n - 4} - sum_{i=0}^{n-2} 
sum_{j=i+1}^{n-1} {i} =

= 2*(n-2)*sum_{i=0}^{n-2} {n-1-i} - sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {i} =

= 2*(n-2)*((n-1)^2 - (n-1)*(n-2)/2) - (n-1)*sum_{i=0}^{n-2} {i} + 
sum_{i=0}^{n-2} {i*i} =

------------------------------------------------------------------
here we use the property: sum_{k=1}{n} {k^2} = n^3/3 + n^2/2 + n/6
------------------------------------------------------------------

= 2*(n-2)*((n-1)^2 - (n-1)*(n-2)/2) - (n-1)*(n-1)*(n-2)/2 + (n-2)^3/3 + 
(n-2)^2/2 + (n-2)/6 =

= 1/6 * (12*(n-1)^2*(n-2) - 6*(n-1)*(n-2)^2 - 3*(n-1)^2*(n-2) + 2*(n-2)^3 + 
3*(n-2)^2 + (n-2)) =

= 1/6 * (3*(n-1)*(n-2)*(3*(n-1) - 2*(n-2)) + (n-2)*(2*(n-2)^2 + 3*(n-2) + 1))) =

= 1/6 * (3*(n-1)*(n-2)*(n+1) + (n-2)*(2*n^2 - 5*n + 3)) =

= 1/6 * (n-2)*(5*n^2 - 5*n) = 5/6 * n*(n-1)*(n-2)

Substitute the result into equation (1):

(1.1)  C_n = 1 + 2/(n*(n-1)) * 5/6 * n*(n-1)*(n-2) + 2/(n*(n-1)) * 
sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + C_{j-i-1} + C_{n-j-1}}

or

(1.2)  C_n = 1 + 5/3*(n-2) + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} 
{C_i + C_{j-i-1} + C_{n-j-1}}

We see that (1.2) is same as (2).

/**
 * @author Vladimir Yaroslavskiy
 * @version 2009.09.10 m765
 */
public class DualPivotQuicksort {

    public static void sort(int[] a) {
        sort(a, 0, a.length);
    }

    public static void sort(int[] a, int fromIndex, int toIndex) {
        rangeCheck(a.length, fromIndex, toIndex);
        dualPivotQuicksort(a, fromIndex, toIndex - 1, 3);
    }

    private static void rangeCheck(int length, int fromIndex, int toIndex) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > 
toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0) {
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        }
        if (toIndex > length) {
            throw new ArrayIndexOutOfBoundsException(toIndex);
        }
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    private static void dualPivotQuicksort(int[] a, int left, int right, int 
div) {
        int len = right - left;

        if (len < 27) { // insertion sort for tiny array
            for (int i = left + 1; i <= right; i++) {
                for (int j = i; j > left && a[j] < a[j - 1]; j--) {
                    swap(a, j, j - 1);
                }
            }
            return;
        }
        int third = len / div;

        // "medians"
        int m1 = left  + third;
        int m2 = right - third;

        if (m1 <= left) {
            m1 = left + 1;
        }
        if (m2 >= right) {
            m2 = right - 1;
        }
        if (a[m1] < a[m2]) {
            swap(a, m1, left);
            swap(a, m2, right);
        }
        else {
            swap(a, m1, right);
            swap(a, m2, left);
        }
        // pivots
        int pivot1 = a[left];
        int pivot2 = a[right];

        // pointers
        int less  = left  + 1;
        int great = right - 1;

        // sorting
        for (int k = less; k <= great; k++) {
            if (a[k] < pivot1) {
                swap(a, k, less++);
            } 
            else if (a[k] > pivot2) {
                while (k < great && a[great] > pivot2) {
                    great--;
                }
                swap(a, k, great--);

                if (a[k] < pivot1) {
                    swap(a, k, less++);
                }
            }
        }
        // swaps
        int dist = great - less;

        if (dist < 13) {
           div++;
        }
        swap(a, less  - 1, left);
        swap(a, great + 1, right);

        // subarrays
        dualPivotQuicksort(a, left,   less - 2, div);
        dualPivotQuicksort(a, great + 2, right, div);

        // equal elements
        if (dist > len - 13 && pivot1 != pivot2) {
            for (int k = less; k <= great; k++) {
                if (a[k] == pivot1) {
                    swap(a, k, less++);
                }
                else if (a[k] == pivot2) {
                    swap(a, k, great--);
    
                    if (a[k] == pivot1) {
                        swap(a, k, less++);
                    }
                }
            }
        }
        // subarray
        if (pivot1 < pivot2) {
            dualPivotQuicksort(a, less, great, div);
        }
    }
}

At first consider the classic Quicksort scheme and find the average
number of comparisons and swaps for it. We assume that input data is
random permutation of n numbers from the range [1..n].

Classic Quicksort
=================

1. Choose a pivot element (take random),
2. Compare each (n-1) elements with the pivot
3. and swap it, if necessary, to have the partitions:
   [ <= pivot | >= pivot ]
4. Sort recursively left and right parts.

>From the algorithm above, the average number of comparisons C_n as a
function of the number of elements may be represented by the equation:

(1)  C_n = (n-1) + 1/n * sum_{k=0}^{n-1} {C_k + C_{n-k-1}}

and last sum can be rewritten:

(2)  C_n = (n-1) + 2/n * sum_{k=0}^{n-1} C_k

Write formula (2) for n+1:

(3)  C_{n+1} = n + 2/(n+1) * sum_{k=0}^{n} C_k

Multiply (2) by n and (3) by (n+1) and subtract one from other, we have:

(4)  (n+1)*C_{n+1} - n*C_n = 2*n + 2*C_n

Sorting an array of n elements may be considered as selecting one permutation
of the n elements among all possible permutations. The number of possible
permutations of n elements is n!, so the task for any sorting algorithm is
to determine the one permutation out of n! possibilities. The minimum number
of operations (swap and comparisons) for sorting n elements is const*ln(n!).
>From the Stirling's formula the approximation of the number of operations
is A*n*ln(n) + B*n + C, where A, B and C are constant coefficients.
The coefficients B and C are not important for large n. Therefore,
the function C_n may be approximated by the equation:

(5)  C_n = A*n*ln(n)

The function C_n is substituted from equation (5) into equation (4),
which yields the following equation:

(6)  (n+1)*A*(n+1)*ln(n+1) - n*A*n*ln(n) = 2*n + 2*A*n*ln(n)

Using the properties of logarithms, equation (6) can then be reduced to:

(7)  n*ln(1+1/n) + 2*ln(1+1/n) + (1/n) * ln(n+1) = 2/A

Using a property of logarithm: ln(1 + x) -> x, if x -> 0, and other property:
ln(n) / n -> 0, when n -> +oo, equation (7) will be approximated by:

(8)  1 + 0 + 0 = 2/A

So, the coefficient A is equal to 2 and the average number of comparisons
in sorting of n size arrays is

(9) C_n = 2*n*ln(n).

To find the approximation of the average number of swaps, we use the similar
approach as in the case of comparisons. The average number of swaps S_n as a
function of the number of elements may be represented by the equation:

(10)  S_n = 1/2*(n-1) + 2/n * sum_{k=0}^{n-1} S_k

We assume that average number of swaps during one iteration is 1/2*(n-1).
It means that in average one half of elements is swapped only. Using the
same approach, we find that the coefficient A equals to 1. Therefore,
the function S_n may be approximated by the equation:

(11)  S_n = n*ln(n)

-------------------------------------------------------------------------------

Now consider the Dual-Pivot Quicksort scheme and find the average
number of comparisons and swaps for it. We assume that input data is
random permutation of n numbers from the range [1..n].

Dual-Pivot Quicksort
====================

1. Choose 2 pivot elements pivot1 and pivot2 (take random),
2. pivot1 must be less or equal than pivot2, otherwise they are swapped
3. Compare each (n-2) elements with the pivots
4. and swap it, if necessary, to have the partitions:
   [ <= p1 | p1 <= & <= p2 | >= p2 ]
5. Sort recursively left, center and right parts.

>From the algorithm above, the average number of comparisons C_n as a
function of the number of elements may be represented by the equation:

(1)  C_n = 1 + 2/(n*(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + 1*i + 
C_{j-i-1} + 2*(j-i-1) + C_{n-j-1} + 2*(n-j-1)}

Equation (1) means that total number is the sum of the comparison numbers of
all cases of partitions into 3 parts plus number of comparisons for elements
from left part (one comparison), center and right parts (2 comparisons).
It can be rewritten in other way:

(2)  C_n = 1 + R*(n-2) + 2/(n(n-1)) * sum_{i=0}^{n-2} sum_{j=i+1}^{n-1} {C_i + 
C_{j-i-1} + C_{n-j-1}}

where R is the average number of comparisons during one iteration. This
constant R can be found as average value (1 + 2 + 2) / 3 = 5/3, which
means that elements from left part are required only one comparison,
and elements form center and right parts - two comparisons.

The double sum in equation (2) can be reduced:

(3)  C_n = 1 + 5/3*(n-2) + 2/(n*(n-1)) * sum_{k=0}^{n-2} {3 * (n-k-1) * C_k}

Denote 1 + 5/3*(n-2) by f_n and multiply to n*(n-1):

(4)  n*(n-1)*C_n = n*(n-1)*f_n + 6 * sum_{k=0}^{n-2} {(n-k-1) * C_k}

Write formula (4) for n+1:

(5)  n*(n+1)*C_{n+1} = n*(n+1)*f_n + 6 * sum_{k=0}^{n-1} {(n-k) * C_k}

Subtract (4) from (5), we have:

(6)  n*(n+1)*C_{n+1} - n*(n-1)*C_n = n*(n+1)*f_n - n*(n-1)*f_n + 6 * 
sum_{k=0}^{n-2} C_k + 6*C_{n-1}

Denote n*(n+1)*C_{n+1} - n*(n-1)*C_n by X_n and n*(n+1)*f_n - n*(n-1)*f_n by 
F_n:

(7)  X_n = F_n + 6 * sum_{k=0}^{n-2} C_k + 6*C_{n-1}

Write formula (7) for n+1:

(8)  X_{n+1} = F_{n+1} + 6 * sum_{k=0}^{n-1} C_k + 6*C_n

Subtract (7) from (8), we have:

(9)  X_{n+1} - X_n = F_{n+1} - F_n + 6*C_n

Resolving of F_{n+1} - F_n gives:

(10)  X_{n+1} - X_n = 2 + 10*n + 6*C_n

The function X_n is substituted into equation (10), which yields
the following equation:

(11)  (n+1)*(n+2)C_{n+2} -2*n(n+1)*C_{n+1} + (n*(n-1) - 6)*C_n = 2 + 10*n

We will find the function C_n approximated by the equation:

(12)  C_n = A*n*ln(n)

The function C_n is substituted from equation (12) into equation (11),
which yields the following equation:

(13)  (n^3+5*n^2+8*n+4)*ln(n+2) - (2*n^3+4*n^2+2*n)*ln(n+1) + 
(n^3-n^2-6*n)*ln(n) = (10*n+2) / A

Using the properties of logarithms, equation (13) can then be reduced to:

(14)  n^3*(ln(n+2)-2*ln(n+1)+ln(n)) + n^2*(5*ln(n+2)-4*ln(n+1)-ln(n)) + 
n*(8*ln(n+2)-2*ln(n+1)-6*ln(n)) + 4*ln(n+2) = (10*n+2) / A

Using a property of logarithm: ln(1 + x) -> x, if x -> 0, and other property:
ln(n) / n -> 0, when n -> +oo, equation (15) will be approximated by:

(15)  -1 + 4 + 2 + 0 + 0 = 10 / A

So, the coefficient A is equal to 2 and the average number of comparisons
in sorting of n size arrays is

(9) C_n = 2*n*ln(n).

To find the approximation of the average number of swaps, we use the similar
approach as in the case of comparisons. The average number of swaps S_n as a
function of the number of elements may be represented by the equation:

(10)  S_n = 4 + 2/3*(n-2) + 2/(n*(n-1)) * sum_{k=0}^{n-2} {(n-k-1)*S_k}

We assume that average number of swaps during one iteration is 2/3*(n-2).
It means that in average one third of elements is swapped only. Using the
same approach, we find that the coefficient A equals to 0.8. Therefore,
the function S_n may be approximated by the equation:

(11)  S_n = 0.8*n*ln(n)

-------------------------------------------------------------------------

And as summary:

The value of the coefficient A:

            dual-pivot   classic
comparison:    2.0         2.0
      swap:    0.8         1.0 

Reply via email to