Hi Vladimir,

There could be many reasons for this.
The most plausible ones are imprecise time measurement with highly dispersed 
results and biased samples.

Another reason is that dividing the whole array into equal-size partitions 
does not always give the best result, and hence choosing the "wrong" pivots 
could actually produce slightly better partition proportions.

Let me clarify this counter-intuitive statement about unequal partitioning.

Consider the following quite straightforward dual-pivot quicksort (in pseudo-code).

sort(a[]) {
    pivot1, pivot2 = choosePivots(a);

    // partitioning
    forall (a[k] in a) {
        if (a[k] < pivot1) {
            a[k] goes to the left partition
        } else if (a[k] > pivot2) {
            a[k] goes to the right partition
        } else {
            a[k] goes to the middle partition
        }
    }

    sort(left partition);
    sort(middle partition);
    sort(right partition);
}
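
To make this concrete, below is a minimal runnable Java sketch of the same 
scheme (my own illustration, not the JDK implementation): the pivots are 
naively taken from the first and last elements instead of five samples, and 
the partitions are collected into separate lists rather than rearranged in 
place.

import java.util.ArrayList;
import java.util.List;

public class SimpleDualPivotSort {

    // Returns a sorted copy of the input. The structure mirrors the
    // pseudo-code above one-to-one; it is not meant to be efficient.
    static List<Integer> sort(List<Integer> a) {
        if (a.size() < 2) {
            return new ArrayList<>(a);
        }
        // choosePivots: sample two elements, the smaller becomes pivot1
        int pivot1 = Math.min(a.get(0), a.get(a.size() - 1));
        int pivot2 = Math.max(a.get(0), a.get(a.size() - 1));

        List<Integer> left = new ArrayList<>();    // items < pivot1
        List<Integer> middle = new ArrayList<>();  // items in [pivot1, pivot2]
        List<Integer> right = new ArrayList<>();   // items > pivot2

        // partitioning: every item is compared with pivot1,
        // and only if it is not smaller, also with pivot2
        for (int i = 1; i < a.size() - 1; i++) {   // skip the two sampled pivots
            int x = a.get(i);
            if (x < pivot1) {
                left.add(x);
            } else if (x > pivot2) {
                right.add(x);
            } else {
                middle.add(x);
            }
        }

        List<Integer> result = new ArrayList<>(sort(left));
        result.add(pivot1);
        result.addAll(sort(middle));
        result.add(pivot2);
        result.addAll(sort(right));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of(5, 1, 9, 3, 9, 2, 7, 3, 8)));
        // prints [1, 2, 3, 3, 5, 7, 8, 9, 9]
    }
}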

Here you can see that during partitioning every item is compared with one or 
two pivots. In this scheme an item is compared with the second pivot only if 
it is not less than the first one. So what is the average number of 
comparisons per item during partitioning?
If we manage to choose pivots that divide the whole array into 3 equal 
partitions, we get 1*(1/3) + 2*(1/3) + 2*(1/3) = 5/3 comparisons per item.
Is this ideal? What if the pivots divide the array in the proportions 
[1/2, 1/4, 1/4]? Then we get 1*(1/2) + 2*(1/4) + 2*(1/4) = 3/2,
and 3/2 is less than 5/3.
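
These per-item counts are easy to check empirically. Here is a small sketch 
of mine (the class name and the quantile pivots are assumptions made just for 
this experiment) that runs a single partitioning pass over a uniformly random 
array and counts pivot comparisons:

import java.util.Random;

public class PartitionCostDemo {

    // One partitioning pass with the same comparison order as the
    // pseudo-code above; nothing is moved, we only count comparisons
    // and the resulting partition sizes.
    static double comparisonsPerItem(int[] a, int pivot1, int pivot2) {
        long comparisons = 0, left = 0, middle = 0, right = 0;
        for (int x : a) {
            comparisons++;              // every item is compared with pivot1
            if (x < pivot1) {
                left++;
            } else {
                comparisons++;          // only then is it compared with pivot2
                if (x > pivot2) {
                    right++;
                } else {
                    middle++;
                }
            }
        }
        System.out.printf("proportions %.2f %.2f %.2f -> ",
                (double) left / a.length,
                (double) middle / a.length,
                (double) right / a.length);
        return (double) comparisons / a.length;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] a = new Random(42).ints(n, 0, n).toArray();

        // pivots near the 1/3 and 2/3 quantiles -> partitions ~[1/3, 1/3, 1/3]
        System.out.println(comparisonsPerItem(a, n / 3, 2 * n / 3));
        // pivots near the 1/2 and 3/4 quantiles -> partitions ~[1/2, 1/4, 1/4]
        System.out.println(comparisonsPerItem(a, n / 2, 3 * n / 4));
    }
}

With the 1/3 and 2/3 quantile pivots this prints roughly 1.67 comparisons per 
item, with the 1/2 and 3/4 quantile pivots roughly 1.50, matching the 
arithmetic above.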

Now let's find the ideal proportions of the partitions, taking into account 
the number of comparisons for the sort as a whole.

Assume that the number of comparisons is A*n*ln(n) + B*n + o(n) and that we 
divide the whole array in the proportions
[alpha, (1 - alpha)/2, (1 - alpha)/2], where 0 < alpha < 1.

A*n*ln(n) + B*n =
    n * (alpha + 2*(1 - alpha))                                       { partitioning }
  + A*alpha*n * ln(alpha*n) + B*alpha*n                               { sorting left partition }
  + 2 * A*(1-alpha)*n/2 * ln((1-alpha)*n/2) + 2 * B*(1-alpha)*n/2     { sorting middle and right partitions }

A*n*ln(n) + B*n =
    A*alpha*n*ln(n) + A*(1-alpha)*n*ln(n)
  + B*alpha*n + B*(1-alpha)*n
  + n * (alpha + 2*(1-alpha))
  + A*alpha*n*ln(alpha) + A*(1-alpha)*n*ln((1-alpha)/2)

The A*n*ln(n) and B*n terms cancel on both sides; dividing the rest by n gives:

0 = (alpha + 2*(1-alpha)) + A*alpha*ln(alpha) + A*(1-alpha)*ln((1-alpha)/2)

and solving for A:

A = (alpha - 2) / (alpha*ln(alpha) + (1-alpha)*ln((1-alpha)/2))

alpha     A
 1/12     2.078316236
 2/12     1.783079278
 3/12     1.617083005
 4/12     1.517065378
 5/12     1.461274369
 6/12     1.442695041    !!!
 7/12     1.463491681
 8/12     1.536871653
 9/12     1.699242413
10/12     2.060936332
11/12     3.143757518

It appears that the best alpha is about 1/2. Thus the ideal partition is 
something like [1/2, 1/4, 1/4].
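
For what it's worth, the formula for A is easy to evaluate numerically; the 
sketch below (my own quick check, the class name and grid step are arbitrary) 
reproduces the table and scans a finer grid for the minimum:

public class IdealAlpha {

    // A(alpha) from the derivation above:
    // A = (alpha - 2) / (alpha*ln(alpha) + (1 - alpha)*ln((1 - alpha)/2))
    static double coefficientA(double alpha) {
        return (alpha - 2)
                / (alpha * Math.log(alpha) + (1 - alpha) * Math.log((1 - alpha) / 2));
    }

    public static void main(String[] args) {
        // reproduce the table above
        for (int k = 1; k <= 11; k++) {
            System.out.printf("%2d/12   %.9f%n", k, coefficientA(k / 12.0));
        }

        // scan a finer grid for the minimum
        double bestAlpha = 0, bestA = Double.MAX_VALUE;
        for (double alpha = 0.001; alpha < 1.0; alpha += 0.001) {
            double a = coefficientA(alpha);
            if (a < bestA) {
                bestA = a;
                bestAlpha = alpha;
            }
        }
        System.out.printf("minimum: alpha = %.3f, A = %.9f (1/ln 2 = %.9f)%n",
                bestAlpha, bestA, 1 / Math.log(2));
    }
}

On this grid the minimum lands at alpha = 0.500 with A = 1.442695041, which 
is 1/ln(2), so the [1/2, 1/4, 1/4] split really is the sweet spot for this 
simplified cost model.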

Of course, these considerations do not apply to the real DPQ completely, 
because in the real DPQ items are not compared with the pivots in such a 
well-defined order, and it contains numerous special cases, which make it 
harder to analyze.

Regards,
Dmytro Sheyko


> From: iaroslav...@mail.ru
> To: core-libs-dev@openjdk.java.net
> Subject: Question on sorting
> Date: Fri, 30 Jul 2010 22:55:00 +0400
> CC: iaroslav...@mail.ru
> 
> Hello,
> 
> I have performance question in sorting.
> 
> In an implementation of Dual-Pivot Quicksort 5 elements
> 
> a[e1], a[e2], a[e3], a[e4], a[e5]
> 
> where e1 = len/6, e2 = len/3, e3 = len/2, e4 = len*2/3, e5 = len*5/6,
> are sorted and then elements a[e2], a[e4] are taken as pivots.
> 
> but if I take a[e1] and a[e3] as pivots, it works 2-3% faster on
> *random* inputs.
> 
> I tried different sorting for these 5 elements: network, bubble,
> insertion - with a[e1] and a[e3] pivots it is faster than with
> a[e2] and a[e4].
> 
> If I sort these 5 elements in descending order, it works faster
> with a[e5] and a[e3] pivots.
> 
> Do you have any idea why it happens?
> 
> Thank you,
> Vladimir
                                          
