Hi Willy,

> Another discussion started around an easier support for some modern
> platforms. In issue #1194, Ashley Penney was caught running on AWS's ARM
> instances with the default ARM target optimizations. For having run some
> tests on these machines, I can't say enough how great they are, but I
> also know that in order to unlock their power when dealing with 16 cores
> or more, it's mandatory to use their modern locking extensions. Worse,
> the default ones do not scale at all and easily maintain deadlocks until
> the watchdog gets rid of the situation. I can trivially add a new CPU
> option in the makefile, there are already a few preset, it's easy. But
> the question is more how this could flow back into distro packages if
> possible at all, considering that such code will not run on legacy
> platforms (e.g. RPi). The difficulty here is that it's not about
> optimization anymore but rather choose from "crashes all the time" and
> "works amazingly fast". Another option could be for distros to limit
> the number of threads on such platforms to 4 to cover legacy devices
> well and prevent the degradation from happening on larger systems. But
> this will definitely require that users rebuild themselves to use larger
> platforms. In my opinion it is exactly the same problem we've seen a long
> time ago with x86 and cmov/mmx/sse/avx in that these are all extensions
> used at the lowest compiler level, so I'm sure there's a clean way to
> deal with this but I'm not qualified to say how. All ideas welcome!

Correct me if I'm wrong, but I believe this is a perfect use case for the
new glibc hwcaps feature, which allows a given object to be compiled at
multiple levels of optimization and lets the loader select the most
appropriate ELF tree at runtime.

I am no expert on it and have not played with it myself, but the issue you
mention reminds me of
https://sourceware.org/pipermail/libc-alpha/2020-June/115250.html.
Note that only x86_64, powerpc and s390 appear to be supported at this
stage.

Cheers,
-- 
Bertrand

