On Sat, 20 May 2017 15:27:06 -0600, "Todd C. Miller" wrote: > One optimization implemented in the sample code from "Engineering > a Sort Function" that our qsort lacks is storing the partition value > out of line when convenient. Currently, we swap the partition value > into a[0], but this can significantly degrade performance when the > array is sorted in reverse or near-reverse order. > > Since we don't want to allocate memory to store the value, only do > this when the elements of the array are int or long sized (which > is often the case). This speeds up the qsort regress test a bit, > which is probably due to the tests on reverse sorted input. > > This diff requires my "support swapping int-sized elements" diff > be applied first.
Fixed diff. - todd Index: lib/libc/stdlib/qsort.c =================================================================== --- /usr/src/lib/libc/stdlib/qsort.c Sat May 20 08:08:08 2017 +++ /usr/src/lib/libc/stdlib/qsort.c Mon May 22 13:17:35 2017 @@ -40,15 +40,12 @@ * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function". * * This version differs from Bentley & McIlroy in the following ways: - * 1. The partition value is swapped into a[0] instead of being - * stored out of line. - * - * 2. It uses David Musser's introsort algorithm to fall back to + * 1. It uses David Musser's introsort algorithm to fall back to * heapsort(3) when the recursion depth reaches 2*lg(n + 1). * This avoids quicksort's quadratic behavior for pathological * input without appreciably changing the average run time. * - * 3. Tail recursion is eliminated when sorting the larger of two + * 2. Tail recursion is eliminated when sorting the larger of two * subpartitions to save stack space. */ #define SWAPTYPE_BYTEV 1 @@ -57,6 +54,23 @@ #define SWAPTYPE_INT 4 #define SWAPTYPE_LONG 5 +#define PVINIT(pv, pm) do { \ + switch (swaptype) { \ + case SWAPTYPE_INT: \ + pv = (char *)&v; \ + v.i = *(int *)pm; \ + break; \ + case SWAPTYPE_LONG: \ + pv = (char *)&v; \ + v.l = *(long *)pm; \ + break; \ + default: \ + pv = a; \ + swap(pv, pm); \ + break; \ + } \ +} while(0) + #define TYPE_ALIGNED(TYPE, a, es) \ (((char *)a - (char *)0) % sizeof(TYPE) == 0 && es % sizeof(TYPE) == 0) @@ -122,9 +136,13 @@ introsort(char *a, size_t n, size_t es, size_t maxdepth, int swaptype, int (*cmp)(const void *, const void *)) { - char *pa, *pb, *pc, *pd, *pl, *pm, *pn; + char *pa, *pb, *pc, *pd, *pl, *pm, *pn, *pv; int cmp_result; size_t r, s; + union { + int i; + long l; + } v; loop: if (maxdepth == 0) { if (heapsort(a, n, es, cmp) == 0) @@ -150,18 +168,18 @@ } pm = med3(pl, pm, pn, cmp); } - swap(a, pm); - pa = pb = a + es; + PVINIT(pv, pm); /* pv points to partition value */ + pa = pb = a; pc = pd = a + (n - 1) * es; for (;;) { - while (pb <= pc && (cmp_result = cmp(pb, a)) <= 0) { + while (pb <= pc && (cmp_result = cmp(pb, pv)) <= 0) { if (cmp_result == 0) { swap(pa, pb); pa += es; } pb += es; } - while (pb <= pc && (cmp_result = cmp(pc, a)) >= 0) { + while (pb <= pc && (cmp_result = cmp(pc, pv)) >= 0) { if (cmp_result == 0) { swap(pc, pd); pd -= es;