On Wed, 20 Sep 2023 16:33:56 GMT, Paul Sandoz <psan...@openjdk.org> wrote:
> > Hi Paul (@PaulSandoz), Alan (@AlanBateman), Any update? Do you agree with > > Radix sort in parallel case only? > > I think its definitely a better fit, but another aspect of my previous > comment was wondering if we need a radix sort if the vectorized quicksort > implementation is fast enough. IMO we need to compare performance results > with the vectorized quick sort, and be aware of future enhancements to that. Hi Paul (@PaulSandoz), Alan (@AlanBateman), PR with AVX512 improvements has been integrated and I compared current JDK sorting with my suggested changes. We agreeded to invoke Radix sort in parallel sorting only, but sequentiual soring (even without Radix sort) shows upto ~ 5-20% speedup as shown in the performance data below. Arrays.sort benchmark | Data Type | Array Size | Baseline (us/op) | New version (us/op) | Speedup -- | -- | -- | -- | -- | -- ArraysSort.intSort | RANDOM | 600 | 9,016 | 8,426 | 1.07 ArraysSort.intSort | RANDOM | 9000 | 526,099 | 499,479 | 1.05 ArraysSort.intSort | RANDOM | 20000 | 1283,544 | 1194,464 | 1.07 ArraysSort.intSort | RANDOM | 400000 | 32547,730 | 30314,660 | 1.07 ArraysSort.intSort | RANDOM | 3000000 | 287803,223 | 274460,275 | 1.05 ArraysSort.intSort | REPEATED | 600 | 2,041 | 1,877 | 1.09 ArraysSort.intSort | REPEATED | 9000 | 101,790 | 94,416 | 1.08 ArraysSort.intSort | REPEATED | 20000 | 262,945 | 210,241 | 1.25 ArraysSort.intSort | REPEATED | 400000 | 4560,917 | 4242,648 | 1.08 ArraysSort.intSort | REPEATED | 3000000 | 36925,218 | 33656,517 | 1.10 ArraysSort.intSort | STAGGER | 600 | 9,348 | 3,711 | 2.52 ArraysSort.intSort | STAGGER | 9000 | 54,833 | 50,308 | 1.09 ArraysSort.intSort | STAGGER | 20000 | 124,794 | 110,447 | 1.13 ArraysSort.intSort | STAGGER | 400000 | 2623,982 | 2325,099 | 1.13 ArraysSort.intSort | STAGGER | 3000000 | 19688,017 | 16812,050 | 1.17 ArraysSort.intSort | SHUFFLE | 600 | 5,361 | 4,960 | 1.08 ArraysSort.intSort | SHUFFLE | 9000 | 208,574 | 171,812 | 1.21 ArraysSort.intSort | SHUFFLE | 20000 | 442,811 | 401,294 | 1.10 ArraysSort.intSort | SHUFFLE | 400000 | 10048,756 | 9313,178 | 1.08 ArraysSort.intSort | SHUFFLE | 3000000 | 77659,359 | 65154,186 | 1.19 Parallel soring (with Radix sort) shows upto x2.1-3.6 speedup, please, see benchamrking data below. Arrays.sort benchmark | Data Type | Array Size | Baseline (us/op) | New version (us/op) | Speedup -- | -- | -- | -- | -- | -- ArraysSort.intParallelSort | RANDOM | 600 | 9,036 | 8,350 | 1.08 ArraysSort.intParallelSort | RANDOM | 9000 | 366,551 | 126,962 | 2.89 ArraysSort.intParallelSort | RANDOM | 20000 | 497,725 | 193,564 | 2.57 ArraysSort.intParallelSort | RANDOM | 400000 | 8096,295 | 2711,692 | 2.99 ArraysSort.intParallelSort | RANDOM | 3000000 | 68663,019 | 19060,058 | 3.60 ArraysSort.intParallelSort | REPEATED | 600 | 2,059 | 1,878 | 1.10 ArraysSort.intParallelSort | REPEATED | 9000 | 80,267 | 70,176 | 1.14 ArraysSort.intParallelSort | REPEATED | 20000 | 250,379 | 109,370 | 2.29 ArraysSort.intParallelSort | REPEATED | 400000 | 4571,467 | 1469,696 | 3.11 ArraysSort.intParallelSort | REPEATED | 3000000 | 30484,679 | 11636,387 | 2.62 ArraysSort.intParallelSort | STAGGER | 600 | 9,676 | 3,700 | 2.62 ArraysSort.intParallelSort | STAGGER | 9000 | 54,892 | 52,106 | 1.05 ArraysSort.intParallelSort | STAGGER | 20000 | 105,592 | 73,180 | 1.44 ArraysSort.intParallelSort | STAGGER | 400000 | 1766,893 | 765,052 | 2.31 ArraysSort.intParallelSort | STAGGER | 3000000 | 11912,261 | 6759,822 | 1.76 ArraysSort.intParallelSort | SHUFFLE | 600 | 5,417 | 4,974 | 1.09 ArraysSort.intParallelSort | SHUFFLE | 9000 | 121,579 | 71,555 | 1.70 ArraysSort.intParallelSort | SHUFFLE | 20000 | 236,334 | 111,430 | 2.12 ArraysSort.intParallelSort | SHUFFLE | 400000 | 3108,046 | 1075,329 | 2.89 ArraysSort.intParallelSort | SHUFFLE | 3000000 | 27130,857 | 11791,105 | 2.30 New version is good enough: doesn't effect on memory usage for Arrays.osrt() and shows the boost of performance on parallel sorting. Paul, Alan, Are you fine with this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1774130261