On 2021-11-08, Theo de Raadt <[email protected]> wrote: > Stuart Henderson <[email protected]> wrote: > >> On 2021-11-08, Otto Moerbeek <[email protected]> wrote: >> > On Sun, Nov 07, 2021 at 08:13:36PM -0600, Luke Small wrote: >> > >> >> https://bugs.llvm.org/show_bug.cgi?id=50026 >> >> >> >> I reported it to the llvm people. it is two slightly different quicksort >> >> algorithms which perform radically differently. The one which you could >> >> assume would take more time, performs MUCH better. >> >> >> >> I made a custom quicksort algorithm which outperforms qsort by A LOT for >> >> sorting an array of around 300 randomly created unsigned characters, which >> >> is what I use it for. >> >> >> >> if you read the report a guy said there's a 10% difference for sorting 3 >> >> million characters on freebsd, but there's about 40% performance >> >> difference >> >> on OpenBSD. maybe it's also how the OpenBSD team modified clang to prevent >> >> rop chain stuff or something? I'm using a westmere based intell server. >> >> >> >> it's the same for clang 11. >> >> >> >> What's the deal? >> > >> > 1. Your test is too small. I increased the test size to get less error >> > in measurement. I also changed the code to either run quicksort or >> > quicksort depending on an argument. >> > >> > 2. I confirmed your observation using time on amd64, arm64 however >> > shows a difference more in line with expectations: >> > >> > [otto@mc:8]$ time ./a.out A >> > quicksort >> > time = 0.335777021 >> > 0m00.35s real 0m00.35s user 0m00.01s system >> > [otto@mc:9]$ time ./a.out B >> > quicksort1 >> > time = 0.345703033 >> > 0m00.36s real 0m00.36s user 0m00.01s system >> > >> > >> > I suspect some non-optimal decision of the optimizer on amd64 for the >> > first quicksort. >> > >> > A next step could be somebody looking at the code produced by the >> > compiler. That is not going to be me. >> >> This is another example of code quite badly affected by fixup-gadgets. >> With an increased test size and building separate objects for each of >> the two tests and with/without CFLAGS=-fno-fixup-gadgets: >> >> $ for i in quicksort*; do echo $i; time ./$i; done >> >> quicksort >> 0m02.33s real 0m02.32s user 0m00.00s system >> quicksort-no-fixup-gadgets >> 0m01.29s real 0m01.27s user 0m00.00s system >> quicksort1 >> 0m01.06s real 0m00.94s user 0m00.02s system >> quicksort1-no-fixup-gadgets >> 0m01.08s real 0m00.87s user 0m00.01s system >> >> > > The word everyone is looking for is "tradeoff", meaning this is intentional.
Yes. But we have seen fixup-gadgets making a suboptimal choice before and it was improved. Maybe there's nothing more that can be done, maybe someone proficient in X86 asm will spot something - much easier to compare the two now we know exactly what's affecting this particular code and that it can be toggled with a compiler flag.

