On Wed, 20 Sep 2023 16:33:56 GMT, Paul Sandoz <psan...@openjdk.org> wrote:

> > Hi Paul (@PaulSandoz), Alan (@AlanBateman), Any update? Do you agree with 
> > Radix sort in parallel case only?
> 
> I think its definitely a better fit, but another aspect of my previous 
> comment was wondering if we need a radix sort if the vectorized quicksort 
> implementation is fast enough. IMO we need to compare performance results 
> with the vectorized quick sort, and be aware of future enhancements to that.

Hi Paul (@PaulSandoz), Alan (@AlanBateman),

PR with AVX512 improvements has been integrated and I compared current JDK 
sorting
with my suggested changes. We agreeded to invoke Radix sort in parallel sorting 
only,
but sequentiual soring (even without Radix sort) shows upto ~ 5-20% speedup
as shown in the performance data below.

Arrays.sort benchmark | Data Type | Array Size | Baseline (us/op) | New version 
(us/op) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.intSort |   RANDOM |     600 |      9,016 |      8,426 |  1.07 
ArraysSort.intSort |   RANDOM |    9000 |    526,099 |    499,479 |  1.05 
ArraysSort.intSort |   RANDOM |   20000 |   1283,544 |   1194,464 |  1.07 
ArraysSort.intSort |   RANDOM |  400000 |  32547,730 |  30314,660 |  1.07 
ArraysSort.intSort |   RANDOM | 3000000 | 287803,223 | 274460,275 |  1.05 
ArraysSort.intSort | REPEATED |     600 |      2,041 |      1,877 |  1.09 
ArraysSort.intSort | REPEATED |    9000 |    101,790 |     94,416 |  1.08 
ArraysSort.intSort | REPEATED |   20000 |    262,945 |    210,241 |  1.25 
ArraysSort.intSort | REPEATED |  400000 |   4560,917 |   4242,648 |  1.08 
ArraysSort.intSort | REPEATED | 3000000 |  36925,218 |  33656,517 |  1.10 
ArraysSort.intSort |  STAGGER |     600 |      9,348 |      3,711 |  2.52 
ArraysSort.intSort |  STAGGER |    9000 |     54,833 |     50,308 |  1.09 
ArraysSort.intSort |  STAGGER |   20000 |    124,794 |    110,447 |  1.13 
ArraysSort.intSort |  STAGGER |  400000 |   2623,982 |   2325,099 |  1.13 
ArraysSort.intSort |  STAGGER | 3000000 |  19688,017 |  16812,050 |  1.17 
ArraysSort.intSort |  SHUFFLE |     600 |      5,361 |      4,960 |  1.08 
ArraysSort.intSort |  SHUFFLE |    9000 |    208,574 |    171,812 |  1.21 
ArraysSort.intSort |  SHUFFLE |   20000 |    442,811 |    401,294 |  1.10 
ArraysSort.intSort |  SHUFFLE |  400000 |  10048,756 |   9313,178 |  1.08 
ArraysSort.intSort |  SHUFFLE | 3000000 |  77659,359 |  65154,186 |  1.19 

Parallel soring (with Radix sort) shows upto x2.1-3.6 speedup,
please, see benchamrking data below.

Arrays.sort benchmark | Data Type | Array Size | Baseline (us/op) | New version 
(us/op) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.intParallelSort |   RANDOM |     600 |     9,036 |     8,350 | 1.08  
ArraysSort.intParallelSort |   RANDOM |    9000 |   366,551 |   126,962 | 2.89  
ArraysSort.intParallelSort |   RANDOM |   20000 |   497,725 |   193,564 | 2.57  
ArraysSort.intParallelSort |   RANDOM |  400000 |  8096,295 |  2711,692 | 2.99  
ArraysSort.intParallelSort |   RANDOM | 3000000 | 68663,019 | 19060,058 | 3.60  
ArraysSort.intParallelSort | REPEATED |     600 |     2,059 |     1,878 | 1.10  
ArraysSort.intParallelSort | REPEATED |    9000 |    80,267 |    70,176 | 1.14  
ArraysSort.intParallelSort | REPEATED |   20000 |   250,379 |   109,370 | 2.29  
ArraysSort.intParallelSort | REPEATED |  400000 |  4571,467 |  1469,696 | 3.11  
ArraysSort.intParallelSort | REPEATED | 3000000 | 30484,679 | 11636,387 | 2.62  
ArraysSort.intParallelSort |  STAGGER |     600 |     9,676 |     3,700 | 2.62  
ArraysSort.intParallelSort |  STAGGER |    9000 |    54,892 |    52,106 | 1.05  
ArraysSort.intParallelSort |  STAGGER |   20000 |   105,592 |    73,180 | 1.44  
ArraysSort.intParallelSort |  STAGGER |  400000 |  1766,893 |   765,052 | 2.31  
ArraysSort.intParallelSort |  STAGGER | 3000000 | 11912,261 |  6759,822 | 1.76  
ArraysSort.intParallelSort |  SHUFFLE |     600 |     5,417 |     4,974 | 1.09  
ArraysSort.intParallelSort |  SHUFFLE |    9000 |   121,579 |    71,555 | 1.70  
ArraysSort.intParallelSort |  SHUFFLE |   20000 |   236,334 |   111,430 | 2.12  
ArraysSort.intParallelSort |  SHUFFLE |  400000 |  3108,046 |  1075,329 | 2.89  
ArraysSort.intParallelSort |  SHUFFLE | 3000000 | 27130,857 | 11791,105 | 2.30  

New version is good enough: doesn't effect on memory usage for Arrays.osrt()
and shows the boost of performance on parallel sorting.

Paul, Alan,
Are you fine with this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1774130261

Reply via email to