On 03/08/2020 13:08, Iain Buclaw via Digitalmars-d-announce wrote:
>
>
> On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:
>> On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
>>> After reading a paper that grabbed his curiosity and wouldn't let go,
>>> Andrei set out to determine if Lomuto partitioning should still be
>>> considered inferior to Hoare for quicksort on modern hardware. This blog
>>> post details his results.
>>>
>>> Blog:
>>> https://dlang.org/blog/2020/05/14/lomutos-comeback/
>>
>> Nice stuff!
>>
>> One curious question -- unless I've misread things horribly, it looks like
>> the D benchmarks for Lomuto branch-free are consistently slower than for
>> C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce
>> effectively identical results for code like this.
>
> Sorry for the belated response, as far as I can see, gdc and g++ only differ
> on one line.
>
> auto delta = smaller & (read - first);
>
> This is lowered as:
>
> delta = smaller & (read - first) / 8;
>
> However, g++ uses an exact divide operator (semantically that ignores
> rounding), whereas gdc uses a truncating divide operator (semantically rounds
> the quotient towards zero).
>
> I'm willing to bet a beer on tweaking pointer subtraction will get gdc in
> lockstep with g++.
>
I doubt Andrei will re-run the benchmarks now, but here's the PR (problem report) with patch attached.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429
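
For anyone wondering why the choice of divide operator matters, here is a rough C++ sketch of the two ways a signed divide by 8 can be lowered. The function names `trunc_div8` and `exact_div8` are made up for illustration and are not taken from either compiler. The code assumes that right-shifting a negative value is an arithmetic shift, which C++20 guarantees and GCC and Clang have always done. It only shows the general shape of the difference, not the actual code gdc or g++ emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: truncating divide by 8 (quotient rounds towards zero).
// Without knowing that `bytes` is a multiple of 8, negative values need a
// bias before the shift, which costs a few extra instructions.
std::int64_t trunc_div8(std::int64_t bytes) {
    std::int64_t bias = (bytes >> 63) & 7;  // 7 when bytes is negative, else 0
    return (bytes + bias) >> 3;
}

// Hypothetical helper: exact divide by 8. Only valid when `bytes` is known to
// be a multiple of 8, as a byte distance between two pointers to 8-byte
// elements of the same array is. A single arithmetic shift is then enough.
std::int64_t exact_div8(std::int64_t bytes) {
    return bytes >> 3;
}
```

The two agree whenever the input is a multiple of 8, for example `trunc_div8(-16) == exact_div8(-16)`, and only differ on inputs that pointer subtraction cannot produce, such as `-9`. That is why the exact form is safe there and can be cheaper inside a tight loop.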
