I note that I've mostly noped out of this discussion (for the reasons in
https://mstdn.jp/@landley/115504860540842713 and
https://mastodon.sdf.org/@washbear/115646255465589454), but as long as
I'm catching up on back email anyway...
On 11/12/25 12:27, Adrian Bunk wrote:
> We are already providing a non-PIE version of the Python interpreter for
> users who need it for performance reasons, and it is for example
> possible that the benefits of providing packages without hardening (for
> situations where hardening is not necessary) might bring larger benefits
> than architecture-optimized versions.
Long ago when I was doing https://landley.net/aboriginal/about.html
(work which eventually allowed Alpine to be based on busybox), I benched
that statically linking busybox let the autoconf stage of package builds
complete about 20% faster under qemu.
(My theory was that lazy binding patched out the PLT indirection on the
first call, which dirtied the executable page and forced QEMU to discard
its native code cache and retranslate, often multiple times as more
indirections got dynamically patched. I later found it hilarious that
the dynamic linking people went on to do snap and flatpak and so on,
using FAR more space for no obvious gain...)
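For context, a rough sketch of what lazy binding does for a dynamically
linked call. Nothing here is busybox or aboriginal specific, just the
stock glibc/binutils behavior and the standard knobs for turning it off:

  /* Build it a few ways and compare:
   *   gcc -O2 lazy.c -o lazy             # default: lazy binding
   *   gcc -O2 -Wl,-z,now lazy.c -o now   # resolve everything at load time
   *   LD_BIND_NOW=1 ./lazy               # same effect, set at run time
   *
   * With lazy binding, the first call to each library function detours
   * through the PLT into the dynamic linker, which writes the resolved
   * address into the GOT slot backing that PLT entry; later calls jump
   * straight through the patched slot. Static linking (or -z now) avoids
   * those run-time writes entirely. */
  #include <stdio.h>

  int main(void)
  {
      puts("first call: resolved through the dynamic linker");
      puts("second call: jumps through the already-patched GOT entry");
      return 0;
  }

Timing the lazy build against the -z now build under qemu user mode is
the cheap way to poke at this yourself, although whether a difference
still shows up depends on your qemu and glibc versions.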
Does that mean static linking is faster everywhere? Dunno, I haven't
tried "everywhere". You can't "optimize" without saying what you're
optimizing FOR, and the ground changes out from under you.
Loop unrolling was an optimization, then became a pessimization when cpu
caches showed up, then an optimization again when L2 caches showed up,
and the pendulum went back and forth multiple times before I stopped
trying to even track it sometime around when branch prediction turned
into a security hole and people started doing TLB invalidation
mitigations for it. My takeaway lesson was: outside of tight inner
loops, do the simple thing and let the hardware and optimizers take care
of themselves.
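For reference, a sketch of the kind of transformation I mean; compilers
can do this themselves (gcc has -funroll-loops for exactly this), which
is the "let the optimizers take care of it" part:

  #include <stddef.h>

  /* Straightforward version: one element, one branch test per pass. */
  unsigned sum_simple(const unsigned char *p, size_t n)
  {
      unsigned total = 0;
      for (size_t i = 0; i < n; i++) total += p[i];
      return total;
  }

  /* Manually unrolled version: four elements per pass, fewer branch
   * tests, more code. Whether that's a win depends on the cache
   * situation of the decade you're compiling in. */
  unsigned sum_unrolled(const unsigned char *p, size_t n)
  {
      unsigned total = 0;
      size_t i = 0;

      for (; i + 4 <= n; i += 4)
          total += p[i] + p[i+1] + p[i+2] + p[i+3];
      for (; i < n; i++) total += p[i];
      return total;
  }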
I do know I left the Red Hat world for the Debian world when the new
Fedora CD wouldn't install on the Pentium Pro I had at the time (because
they'd "moved on" to an architecture newer than the hardware I was still
using).
I had to learn what x86-64-v1 vs v2 were when an Android NDK update made
all binaries it produced segfault on my netbook. I cared because I was
maintaining their command line utilities, and it was nice to be able to
actually test that environment. But I didn't discard my hardware to
humor the change, I just ran my test binaries under qemu until that
netbook died...
There was talk back then (what, 2018?) about teaching repositories to
know about various architecture flags so they could pull optimized
packages for your machine, but the discussion petered out because the
gains were small and the overhead was huge.
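Tangent: if anyone wants to check what their own hardware supports, I
believe gcc 12 or so taught __builtin_cpu_supports() the x86-64 level
names (clang picked them up later), so a quick runtime check looks
something like:

  /* Which x86-64 microarchitecture levels does this CPU provide?
   * Needs a compiler whose __builtin_cpu_supports() understands the
   * level names (gcc 12+, recent clang); the argument has to be a
   * string literal. */
  #include <stdio.h>

  int main(void)
  {
      __builtin_cpu_init();
      printf("x86-64-v2: %s\n",
             __builtin_cpu_supports("x86-64-v2") ? "yes" : "no");
      printf("x86-64-v3: %s\n",
             __builtin_cpu_supports("x86-64-v3") ? "yes" : "no");
      printf("x86-64-v4: %s\n",
             __builtin_cpu_supports("x86-64-v4") ? "yes" : "no");
      return 0;
  }

Plain gcc -O2 is enough to build it, no special flags required.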
> Would x32 optimized for v3 be the best option for many use cases?
It would prevent the x86-64-v2 laptop I'm typing this on from running
those binaries, but I've already talked to the netbsd guys, and for them
running on systems people want to use their stuff on is a point of
pride. Like it used to be on Linux, before everybody got old and tired
and needed to lighten the load.
Decisions have costs. It's your call to cull your herd and chastise the
outliers, but it usually means some subset will move on to things that
are still fun.
It's an interesting move, giving ultimatums to people who never got
forced onto windows and never moved to GPLv3. Not "I am stepping down
from this and going this way instead", but "xfree86 is now under this
new license, you will all comply, hey where are you going"...
*shrug* You do you.
Rob