Re: Question on sorting

Vladimir Iaroslavski Wed, 04 Aug 2010 06:44:25 -0700

Hello Dmytro,

Thank you for the investigation; the results are interesting.


I prepared a simpler test [1] with no recursion: it just sorts
the 5 candidates and compares all elements with the first (or second)
candidate. I ran it on an array of 2000000 elements, and the result is:

e1/e2 267/460 on random data
e1/e2 193/200 on sorted array

An interesting fact: if I change the comparison from "a[i] < pivot"
to "a[i] == pivot", it shows 189 for both e1 and e2, on both random
and sorted data.

I tried the new scheme [1/2, 1/4, 1/4], but it shows almost the same
time as [1/3, 1/3, 1/3]; there is no improvement.

Thank you,
Vladimir

[1] ---------------------------------------------------------------
public static void sort(int[] a) {
    int step = a.length >>> 3;
    int e1 =      step;
    int e2 = e1 + step;
    int e3 = e2 + step;
    int e4 = e3 + step;
    int e5 = e4 + step;

    if (a[e1] > a[e2]) { int t = a[e1]; a[e1] = a[e2]; a[e2] = t; }
    if (a[e4] > a[e5]) { int t = a[e4]; a[e4] = a[e5]; a[e5] = t; }
    if (a[e1] > a[e3]) { int t = a[e1]; a[e1] = a[e3]; a[e3] = t; }
    if (a[e2] > a[e3]) { int t = a[e2]; a[e2] = a[e3]; a[e3] = t; }
    if (a[e1] > a[e4]) { int t = a[e1]; a[e1] = a[e4]; a[e4] = t; }
    if (a[e3] > a[e4]) { int t = a[e3]; a[e3] = a[e4]; a[e4] = t; }
    if (a[e2] > a[e5]) { int t = a[e2]; a[e2] = a[e5]; a[e5] = t; }
    if (a[e2] > a[e3]) { int t = a[e2]; a[e2] = a[e3]; a[e3] = t; }
    if (a[e4] > a[e5]) { int t = a[e4]; a[e4] = a[e5]; a[e5] = t; }

    int pivot = a[e1]; // or a[e2]

    for (int i = 0; i < a.length; i++) {
        if (a[i] < pivot);
    }
}
[/1] ---------------------------------------------------------------

Dmytro Sheyko wrote:

Hi Vladimir,

There could be many reasons for this.
The most plausible ones are imprecise time measurement with highly dispersed results, and biased samples.
Another reason is that an attempt to divide the whole array into equal-size partitions does not always give the best result. Hence choosing
"wrong" pivots could make the partitions balanced slightly better.
Let me clarify this counter-intuitive statement regarding unequal partitioning.
Consider the following quite straightforward dual-pivot quicksort.

sort(a[]) {
    pivot1, pivot2 = choosePivots(a);

    // partitioning
    forall (a[k] in a) {
        if (a[k] < pivot1) {
            a[k] goes to the left partition
        } else if (a[k] > pivot2) {
            a[k] goes to the right partition
        } else {
            a[k] goes to the middle partition
        }
    }

    sort(left partition);
    sort(middle partition);
    sort(right partition);
}
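
The pseudocode above can be turned into a runnable sketch along the following lines. This is only an illustration under stated assumptions, not the real DPQ: the pivots are simply the first and last elements of the range (a hypothetical stand-in for choosePivots), the partitions are collected in temporary buffers rather than in place, and the class name SimpleDualPivot and the comparisons counter are made up for the example.

```java
import java.util.Random;

public class SimpleDualPivot {
    // Number of item-versus-pivot comparisons made while partitioning.
    static long comparisons = 0;

    // Sorts the range a[lo..hi), with hi exclusive.
    static void sort(int[] a, int lo, int hi) {
        int n = hi - lo;
        if (n < 2) return;

        // Assumption for this sketch: pivots are the first and last elements.
        int pivot1 = Math.min(a[lo], a[hi - 1]);
        int pivot2 = Math.max(a[lo], a[hi - 1]);

        int[] left = new int[n], middle = new int[n], right = new int[n];
        int nl = 0, nm = 0, nr = 0;

        // Partitioning: every non-pivot item is compared with one or two pivots.
        for (int k = lo + 1; k < hi - 1; k++) {
            int x = a[k];
            comparisons++;
            if (x < pivot1) {
                left[nl++] = x;
            } else {
                comparisons++;
                if (x > pivot2) right[nr++] = x;
                else middle[nm++] = x;
            }
        }

        // Write back in the order: left, pivot1, middle, pivot2, right.
        int pos = lo;
        System.arraycopy(left, 0, a, pos, nl);
        pos += nl;
        a[pos++] = pivot1;
        int middleStart = pos;
        System.arraycopy(middle, 0, a, pos, nm);
        pos += nm;
        a[pos++] = pivot2;
        int rightStart = pos;
        System.arraycopy(right, 0, a, pos, nr);

        sort(a, lo, lo + nl);
        sort(a, middleStart, middleStart + nm);
        sort(a, rightStart, hi);
    }

    public static void main(String[] args) {
        int[] a = new Random(42).ints(100_000).toArray();
        sort(a, 0, a.length);
        System.out.println("pivot comparisons per item: "
                + (double) comparisons / a.length);
    }
}
```

Because the two pivot elements are excluded from the recursive calls, each call works on a strictly smaller range, so the sketch terminates even when all items are equal.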
Here you can see that during partitioning every item is compared with one or two pivots. In our case an item is compared with the second pivot only if it is not less than the first one. So what is the average number of comparisons during partitioning?
If we succeed in choosing pivots so that they divide the whole array into 3 equal partitions, we have 1*(1/3) + 2*(1/3) + 2*(1/3) = 5/3 per item.
Is this ideal? What if the pivots divide the array in the proportions [1/2, 1/4, 1/4]? Then we have 1*(1/2) + 2*(1/4) + 2*(1/4) = 3/2.
3/2 is less than 5/3.
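
The arithmetic above can be checked with a few lines of Java. This is a minimal sketch; the class and method names (PartitionCost, comparisonsPerItem) are invented for the illustration.

```java
public class PartitionCost {
    // Expected pivot comparisons per item when the left, middle and right
    // partitions receive the given fractions of the array. An item that
    // lands in the left partition costs one comparison; any other item
    // costs two.
    static double comparisonsPerItem(double left, double middle, double right) {
        return 1 * left + 2 * middle + 2 * right;
    }

    public static void main(String[] args) {
        // Equal partitions [1/3, 1/3, 1/3]
        System.out.println(comparisonsPerItem(1.0 / 3, 1.0 / 3, 1.0 / 3));
        // Unequal partitions [1/2, 1/4, 1/4]
        System.out.println(comparisonsPerItem(0.5, 0.25, 0.25));
    }
}
```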
Let's now find the ideal proportions of the partitions, taking into account the number of comparisons of the sort as a whole.
Assume that the number of comparisons is A*n*ln(n) + B*n + o(n) and that we divide the whole array in the following proportions:
[alpha, (1 - alpha)/2, (1 - alpha)/2], where 0 < alpha < 1.

A*n*ln(n) + B*n =
  = n * (alpha + 2*(1 - alpha))                  { partitioning }
  + A * alpha*n * ln(alpha*n) + B * alpha*n      { sorting left partition }
  + 2 * A * (1-alpha)*n/2 * ln((1-alpha)*n/2)
  + 2 * B * (1-alpha)*n/2                        { sorting middle and right partitions }

A*n*ln(n) + B*n =
  = A*alpha*n*ln(n) + A*(1-alpha)*n*ln(n)
  + B*alpha*n + B*(1-alpha)*n
  + n * (alpha + 2*(1-alpha))
  + A*alpha*n*ln(alpha) + A*(1-alpha)*n*ln((1-alpha)/2)

The A*n*ln(n) and B*n terms cancel on both sides; dividing what remains by n gives:

0 = (alpha + 2*(1-alpha)) + A*alpha*ln(alpha) + A*(1-alpha)*ln((1-alpha)/2)

A = (alpha - 2) / (alpha*ln(alpha) + (1-alpha)*ln((1-alpha)/2))

alpha    A
 1/12    2.078316236
 2/12    1.783079278
 3/12    1.617083005
 4/12    1.517065378
 5/12    1.461274369
 6/12    1.442695041    !!!
 7/12    1.463491681
 8/12    1.536871653
 9/12    1.699242413
10/12    2.060936332
11/12    3.143757518
It appears that the best alpha is about 1/2. Thus the ideal partition is something like [1/2, 1/4, 1/4].
Of course, these considerations do not apply completely to the real DPQ. This is because in the real DPQ items are not compared with the pivots in a well-defined order, and
the real DPQ contains numerous special cases, which make it harder to analyze.
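
The table of A values can be reproduced by evaluating the closed form for A directly. The following is a small sketch; the names OptimalAlpha and coefficient are assumptions made for the example.

```java
public class OptimalAlpha {
    // Leading coefficient A of the comparison count A*n*ln(n) for the
    // proportions [alpha, (1 - alpha)/2, (1 - alpha)/2], 0 < alpha < 1.
    static double coefficient(double alpha) {
        double rest = (1 - alpha) / 2; // share of the middle and of the right partition
        return (alpha - 2)
                / (alpha * Math.log(alpha) + (1 - alpha) * Math.log(rest));
    }

    public static void main(String[] args) {
        System.out.println("alpha    A");
        for (int k = 1; k <= 11; k++) {
            System.out.printf("%2d/12    %.9f%n", k, coefficient(k / 12.0));
        }
    }
}
```

At alpha = 1/2 the expression simplifies to 1/ln(2), which is the familiar constant of about 1.4427 from the n*log2(n) comparison bound.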

Regards,
Dmytro Sheyko

 > From: [email protected]
 > To: [email protected]
 > Subject: Question on sorting
 > Date: Fri, 30 Jul 2010 22:55:00 +0400
 > CC: [email protected]
 >
 > Hello,
 >
 > I have a performance question about sorting.
 >
 > In an implementation of Dual-Pivot Quicksort 5 elements
 >
 > a[e1], a[e2], a[e3], a[e4], a[e5]
 >
 > where e1 = len/6, e2 = len/3, e3 = len/2, e4 = len*2/3, e5 = len*5/6,
 > are sorted and then elements a[e2], a[e4] are taken as pivots.
 >
 > but if I take a[e1] and a[e3] as pivots, it works 2-3% faster on
 > *random* inputs.
 >
 > I tried different sorting for these 5 elements: network, bubble,
 > insertion - with a[e1] and a[e3] pivots it is faster than with
 > a[e2] and a[e4].
 >
 > If I sort these 5 elements in descending order, it works faster
 > with a[e5] and a[e3] pivots.
 >
 > Do you have any idea why it happens?
 >
 > Thank you,
 > Vladimir
