On 26/07/2025 19.32, Nilesh Patra wrote:
> Hi Michael,

Hey Nilesh!

> I recently came across Christian's blog[1] (thanks for this!) where they
> use hwcaps[2] to select the appropriate cpu capabilities for ggml package[3].
>
> This is similar to what we do when we add simde patches to a package and
> build for all ISAs starting from baseline SSE2 until AVX2 (or even AVX512), and
> we end up writing a home-grown script like this [4] to select appropriate cpu
> capabilities.
>
> Do you think we can get rid of these scripts going forward and instead use
> hwcaps?

For amd64, I agree that targeting the x86_64-v{1,2,3,4} micro-architectures is 
a better idea than all the SSE*/AVX* variants that others and I have been 
building in the past.

Using hwcaps for dynamically-linked scientific computing libraries is a great 
idea, yes! I recommend improving the documentation at 
https://wiki.debian.org/InstructionSelection#hwcaps with concrete 
Debian-specific examples (or perhaps linking to a new wiki page if that gets 
too long).

**Note**: For applications with functions that benefit from the more advanced 
CPU capabilities, hwcaps will only work if those functions are compiled to a 
separate dynamically loaded library (which might be part of the main Debian 
package for that application, or a shared library package).
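
To make the note above concrete, here is a minimal Python sketch of where the variants of a hwcaps-enabled shared library could live on amd64. The subdirectory names (x86-64-v2, x86-64-v3, x86-64-v4) are the ones glibc-hwcaps uses; the library name libexample.so.1 and the helper function are hypothetical, and the real lookup is performed by ld.so itself, not by code like this.

```python
from pathlib import PurePosixPath

# Subdirectory names used by glibc-hwcaps on amd64, best first.
# The baseline (x86-64-v1) build lives in the plain library directory.
HWCAPS_SUBDIRS = ["x86-64-v4", "x86-64-v3", "x86-64-v2"]

def candidate_paths(libdir, soname, cpu_level):
    """List where a library variant may be found, best match first.

    cpu_level is the highest x86-64 level (1-4) the CPU supports.
    This only mimics the search order for illustration.
    """
    base = PurePosixPath(libdir)
    paths = []
    for sub in HWCAPS_SUBDIRS:
        level = int(sub.rsplit("v", 1)[1])
        if level <= cpu_level:
            paths.append(str(base / "glibc-hwcaps" / sub / soname))
    # Baseline build, always usable as the fallback.
    paths.append(str(base / soname))
    return paths

if __name__ == "__main__":
    for p in candidate_paths("/usr/lib/x86_64-linux-gnu", "libexample.so.1", 3):
        print(p)
```

An application binary has no equivalent search path, which is why the performance-critical code has to sit in a shared library for this mechanism to help.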

Unfortunately, I think that many of the packages from the scientific Debian 
Blends teams don't put their performance-critical functions in a dynamically 
loaded library, and thus would NOT benefit from the GLIBC 2.33+ hwcaps 
feature. Using your example of the "scrappie" Debian package, we see that 
there are only binaries, and no dynamic libraries: 
https://packages.debian.org/sid/amd64/scrappie/filelist 
https://packages.debian.org/unstable/scrappie

I would love to see a generic Debian dispatcher script that could be used for 
amd64 systems (and eventually arm64 & riscv64 systems) to select between 
binaries using a naming scheme similar to GLIBC hwcaps, but anchored in 
/usr/bin/ (/usr/bin/x86_64-v[1234]/* ?)

For binary selection, we could add a script to 
https://tracker.debian.org/pkg/subarch-select which would be symlinked from 
/usr/bin/app-name and would use subarch-select to choose between 
/usr/bin/x86_64-v[1234]/app-name based upon the current CPU's capabilities.

Likewise, I would love to see shared helpers for d/rules for building both 
shared library packages and single-binary packages. These would automate the 
multiple builds and multiple installation locations needed, thus simplifying 
the work required to take full advantage of GLIBC hwcaps and/or the 
Debian-wide shared dispatcher script mentioned above. (Some packages might 
have critical code in both an application binary and shared libraries, and 
would thus benefit from using both of the multi-build approaches outlined 
above.)
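
A rough Python sketch of the binary dispatcher idea described above, under several assumptions: it reads /proc/cpuinfo directly as a stand-in for whatever subarch-select would provide (its interface is not used here), the /usr/bin/x86_64-v[1234]/ layout is only the naming proposed in this mail, and all function names are made up for illustration.

```python
import os
import sys

# /proc/cpuinfo flag names each x86-64 level adds on top of the previous
# one (Linux reports LZCNT as "abm" and SSE3 as "pni").
LEVEL_FLAGS = {
    1: {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"},
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def x86_64_level(flags):
    """Highest level (0-4) for which this and all lower levels are satisfied."""
    level = 0
    for n in (1, 2, 3, 4):
        if not LEVEL_FLAGS[n] <= flags:
            break
        level = n
    return level

def pick_binary(app, level, bindir="/usr/bin", exists=os.path.exists):
    """Return the best installed variant at or below `level`, else None."""
    for n in range(level, 0, -1):
        candidate = f"{bindir}/x86_64-v{n}/{app}"
        if exists(candidate):
            return candidate
    return None

def dispatch(argv):
    """Replace the current process with the best variant of argv[0]."""
    app = os.path.basename(argv[0])
    target = pick_binary(app, x86_64_level(read_cpu_flags()))
    if target is None:
        sys.exit(f"{app}: no variant suitable for this CPU")
    os.execv(target, [target] + argv[1:])

if __name__ == "__main__":
    # Demo only: report the detected level instead of exec'ing anything.
    print("detected x86-64 level:", x86_64_level(read_cpu_flags()))
```

A real dispatcher symlinked from /usr/bin/app-name would call dispatch(sys.argv) instead of the demo line. Falling back to lower levels means a package only needs to ship the variants that actually differ in performance.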

For riscv64, I would suggest that the RISC-V Application Profiles 
(RVA{20,22,23}) be used in the same way that the x86_64-v{1,2,3,4} 
micro-architectures are used on amd64; but this is not yet supported by GLIBC. 
However, Debian could support them in the same way that I suggest above for 
amd64 in /usr/bin/x86_64-v[234]/*, perhaps using 
/usr/bin/riscv64-RVA{20,22,23}/*.

For arm64, I think this would require a bit more research. I'm not sure that 
subsequent ARMv{8,9} revisions are strictly followed, as I've noticed that ARM 
suggests checking for specific CPU features and not for architecture revisions 
like "ARMv8.6".
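
For illustration only, a feature-based check on arm64 could look like the Python sketch below. It assumes the "Features" line that Linux exposes in /proc/cpuinfo on arm64; the helper names and the sample text are hypothetical, and which features a given package should require is exactly the open research question.

```python
def parse_arm64_features(cpuinfo_text):
    """Return the names on the first 'Features' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("Features"):
            return set(line.split(":", 1)[1].split())
    return set()

def supports(features, required):
    """True if every required feature name is present."""
    return set(required) <= features

if __name__ == "__main__":
    # Hypothetical sample; a real check would read /proc/cpuinfo.
    sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes crc32 atomics asimddp\n"
    feats = parse_arm64_features(sample)
    print(supports(feats, ["asimd", "asimddp"]))
    print(supports(feats, ["sve"]))
```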

Thank you for your nice email on a favorite subject of mine :-)


[1] https://www.kvr.at/posts/easy-dynamic-dispatch-using-GLIBC-hardware-capabilities/
[2] https://manpages.debian.org/unstable/manpages/ld.so.8.en.html#x86~2
[3] https://salsa.debian.org/deeplearning-team/ggml/-/commit/5768f4319d8b547fffb027c78e6dea4453a1e3c9
[4] https://salsa.debian.org/med-team/scrappie/-/blob/master/debian/bin/simd-dispatch?ref_type=heads
