Hi Team, Thank you very much for your feedback and comments! I collect here all hints and tips for improvement of the new algorithm. New version of Quicksort is attached. And now my investigations:
1. [Jon Bentley] > Use the median of 2k+1 elements for pivots. > We can choose a sample of five elements, sort them, and then > partition around the 2nd and 4th sorted elements. Yes, this helps, I can't say that we win a lot of time in general, but it works really faster for input data like this 012012012..., 012301230..., 01234012340.... And more important thing: this approach allows me to eliminate "div" parameter in the implementation. As it was suggested, I take middle 5 elements with indexes: s, 2s, 3s, 4s, 5s (where s is length/6), sort them and take 2s and 4s elements as pivots. Also I don't need to compare pivots because they are sorted. 2. [Jon Bentley] > How about doing a test for pivot1 == pivot2, and then > separately handling that special case? equal elements. Right now, the > inner loop in the "// sorting" section compares every element both to > pivot1 and pivot2. Good catch! It reduces time of sorting equal elements (or almost equal) from 347 to 228 and doesn't change time for other cases. 3. [Jon Bentley] > Vladimir is currently using the first scheme that we tried. > I wonder if it might be faster (or clearer) to try an invariant like > | < | M | ? | M | > | > where M denotes middle elements with the property > Pivot1 <= M <= Pivot2 > Vladimir, do you think that you might be able to get a better invariant? Sure, I tried other invariant like this: | < | M | > | ? | but it is slower than current. 4.1 [Oleg Anashkin] > First thing that came to mind - have you thought about extrapolating this > approach to more pivots? If 2-pivot algorithm is faster than 1-pivot, then > 3-pivot might be even faster, right? Can the number of pivots be chosen as > a function of array size (to mitigate overhead)? and 4.2 [Leonid Geller] > As an observation, why not expand the new algorithm to N-Pivot > where N = round(ln(array length)). > This should lower the average sort cost even lower. Here is the experimental results for several pivots. It's easy to define recurrence relation and write code to count the comparisons and swap: Classic Quicksort: comps - 52254956 swaps - 26277985 Dual-Pivot Quicksort: comps - 52246553 swaps - 20977937 Quicksort with 3 pivots: comps - 54399943 swaps - 24224254 Quicksort with 4 pivots: comps - 57241615 swaps - 28659728 You can see that "3 pivots" is better than Classic Quicksort for swaps only, "4 pivots" is the looser, and "Dual-Pivot Quicksort" is real winner. If we have many pivots we spent a lot of time on sorting them and dividing into more parts doesn't save. In case with 2 pivots, we use only 1 comparison and 1/2 swaps to sort it out. 5. [R?mi Forax] > The algorithm use two constants: 37 and 13, > I think they should be declared as private static final. The constants are: 27 (not 37) and 13, and I made it as private static final int. (but 27 is changed to 17, see below) 6. [R?mi Forax] > In the loop tagged "sorting" and "equals element", a[k] can be > stored in a local variable to avoid to recompute it several time. Adding a local variable doesn't save time, but it was done in the line of other improvement (see below). Thank you for pointing to it! 7. [Vladimir Yaroslavskiy] I very often look at swap function and don't like it. I tried to eliminate it and add these 3 lines into the code as inline, but no effect. Then I looked from other side, we have a lot of code such as: for (int k=0; ...) { if (a[k] < ...) { swap(a, k, less++) } ... } You can see that we retrieve a[k] many times, so I suggest doing it like this without swap function: for (int k=0; ...) { int ak = a[k] if (ak < ...) { a[k] = a[less]; a[less++] = ak; } ... } What we have achieve: 1. No swap function call 2. Two assignments instead of three 3. No local variable temp This approach shows very nice results especially for server mode: new: 100 / old: 126 8. [Jonathan Graehl] > suggest this: > int third = len / div; > third=third>0?third:1; Very elegant, but unfortunately I could not apply this, because the schema of choosing pivot elements has been changed. 9. [Goktug Gokdogan] > 1. Falling back-to insertion sort at < 17 instead of < 27. > 2. (a[great] > pivot2) test is more likely to fail compared to > (k < great) in the while loop, so exchange them (ie. use > while (a[great] > pivot2 && k < great)). I tried 17 as boundary for insertion sort: in some cases it is better, in other no. But in average I don't see that time is increased. I set up 17 in new version. Note that in my first version the value was 17, but later it was changed to 25 -> 27. while (a[great] > pivot2 && k < great)) - we win 3-4% ! Thank you! ------------------------------------------------------------------ I hope that I don't miss something what you suggested. If you find missed items or have other comments/suggestions, please, let me know. And the end I would like provide the execution times for standard input data (all times should be divided by 1000 to get seconds): Client VM Server VM random random dpq: 16221 dpq: 19893 jdk: 20162 jdk: 24199 ascendant ascendant dpq: 5686 dpq: 7952 jdk: 9414 jdk: 11748 descendant descendant dpq: 5921 dpq: 7924 jdk: 9736 jdk: 11777 all equal all equal dpq: 261 dpq: 500 jdk: 734 jdk: 1140 repeated 50 repeated 50 dpq: 3450 dpq: 4239 jdk: 4207 jdk: 5874 organ pipes organ pipes dpq: 9364 dpq: 11451 jdk: 12858 jdk: 16534 random 01 random 01 dpq: 1474 dpq: 1423 jdk: 1747 jdk: 2591 010101 010101 dpq: 1092 dpq: 1052 jdk: 1140 jdk: 2076 012012 012012 dpq: 771 dpq: 1035 jdk: 1220 jdk: 1980 012301 012301 dpq: 1140 dpq: 1393 jdk: 1606 jdk: 2876 012340 012340 dpq: 1464 dpq: 1826 jdk: 1947 jdk: 3366 random 4 random 4 dpq: 1855 dpq: 2423 jdk: 2258 jdk: 3148 random 1000 random 1000 dpq: 8008 dpq: 9213 jdk: 8521 jdk: 10363 You can see that now Dual-Pivots Quicksort wins in all nominations (in previous letter it looses in server mode in few cases). Alan, I'll send you updated Arrays.java little bit later. Thank you, Vladimir
/** * @author Vladimir Yaroslavskiy * @version 2009.09.14 m765.753 */ public class DualPivotQuicksort753 { public static void sort(int[] a) { sort(a, 0, a.length); } public static void sort(int[] a, int fromIndex, int toIndex) { rangeCheck(a.length, fromIndex, toIndex); dualPivotQuicksort(a, fromIndex, toIndex - 1); } private static void rangeCheck(int length, int fromIndex, int toIndex) { if (fromIndex > toIndex) { throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")"); } if (fromIndex < 0) { throw new ArrayIndexOutOfBoundsException(fromIndex); } if (toIndex > length) { throw new ArrayIndexOutOfBoundsException(toIndex); } } private static void dualPivotQuicksort(int[] a, int left, int right) { int len = right - left; int x; if (len < TINY_SIZE) { // insertion sort for tiny array for (int i = left + 1; i <= right; i++) { for (int j = i; j > left && a[j] < a[j - 1]; j--) { x = a[j - 1]; a[j - 1] = a[j]; a[j] = x; } } return; } // "medians" int sixth = len / 6; int[] med = new int[5]; med[0] = left + sixth; med[1] = med[0] + sixth; med[2] = med[1] + sixth; med[3] = med[2] + sixth; med[4] = med[3] + sixth; for (int i = 1; i < 5; i++) { for (int j = i; j > 0 && a[med[j]] < a[med[j - 1]]; j--) { x = a[med[j - 1]]; a[med[j - 1]] = a[med[j]]; a[med[j]] = x; } } // pivots int pivot1 = a[med[1]]; int pivot2 = a[med[3]]; boolean pivotsAreDifferent = pivot1 != pivot2; a[med[1]] = a[left]; a[med[3]] = a[right]; // pointers int less = left + 1; int great = right - 1; // sorting if (pivotsAreDifferent) { for (int k = less; k <= great; k++) { x = a[k]; if (x < pivot1) { a[k] = a[less]; a[less++] = x; } else if (x > pivot2) { while (a[great] > pivot2 && k < great) { great--; } a[k] = a[great]; a[great--] = x; x = a[k]; if (x < pivot1) { a[k] = a[less]; a[less++] = x; } } } } else { for (int k = less; k <= great; k++) { x = a[k]; if (x == pivot1) { continue; } if (x < pivot1) { a[k] = a[less]; a[less++] = x; } else { while (a[great] > pivot2 && k < great) { great--; } a[k] = a[great]; a[great--] = x; x = a[k]; if (x < pivot1) { a[k] = a[less]; a[less++] = x; } } } } // swap a[left] = a[less - 1]; a[less - 1] = pivot1; a[right] = a[great + 1]; a[great + 1] = pivot2; // left and right subarrays dualPivotQuicksort(a, left, less - 2); dualPivotQuicksort(a, great + 2, right); // equal elements if (great - less > len - DIST_SIZE && pivotsAreDifferent) { for (int k = less; k <= great; k++) { x = a[k]; if (x == pivot1) { a[k] = a[less]; a[less++] = x; } else if (x == pivot2) { a[k] = a[great]; a[great--] = x; x = a[k]; if (x == pivot1) { a[k] = a[less]; a[less++] = x; } } } } // center subarray if (pivotsAreDifferent) { dualPivotQuicksort(a, less, great); } } private static final int DIST_SIZE = 13; private static final int TINY_SIZE = 17; }