On Tue, 13 Jan 2015, Niels Möller wrote:
> Nikos Mavrogiannopoulos <[email protected]> writes:
>> It's early, but it would be nice if the arm neon code was part of fat
>> as well.
> Sure, that's the next step, once I have a structure I think is workable.
> Does anyone have a pointer to how to check cpu capabilities on ARM?
Yes - and it's a bit hairy. (I've got a TL;DR version halfway down.)
Unlike on x86, there's no direct CPU instruction for it. One way of
detecting it in pure code is to try executing the instructions in question
and catch the SIGILL (or the equivalent on other platforms) if they aren't
supported. (Touching signal handlers from within a library isn't
necessarily a nice thing to do, though.)
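A minimal sketch of the SIGILL approach, assuming a POSIX system. The names
probe_instruction, fake_supported_insn and fake_unsupported_insn are made up
for illustration. In real code the probe function would execute the
instruction being tested (typically from a small assembly routine); here
raise(SIGILL) stands in for an unsupported instruction so that the sketch
runs on any architecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to escape from the SIGILL handler. */
static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: runs fn with a temporary SIGILL handler installed.
 * Returns 1 if fn completed, 0 if it raised SIGILL, -1 on setup failure. */
int probe_instruction(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* Save the signal mask too, so it is restored after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        fn();
        ok = 1;
    }

    /* Put back whatever handler the application had installed. */
    sigaction(SIGILL, &old, NULL);
    return ok;
}

/* Stand-ins for real probes (assumptions, for demonstration only). */
void fake_supported_insn(void)
{
    /* A real probe would execute the instruction under test here. */
}

void fake_unsupported_insn(void)
{
    raise(SIGILL);
}
```

Note that this uses a single global jump buffer and changes a process-wide
signal handler, so it isn't thread safe, which is part of why doing this
from within a library is unpleasant.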
Short of trying to run the instructions, some OSes provide this info in
another way - Linux is the main case here.
Before going into the Linux case, note that iOS doesn't have such a
mechanism, but it isn't really needed there. On iOS, all armv7
configurations include support for NEON, so if you can assemble NEON
instructions you don't need any detection. Since this platform uses fat
binaries, you could have a separate armv6 slice of your binary (and that's
the main way of doing it here - instead of enabling things at runtime
within one binary, include separate slices for each intended
configuration). The recent Xcode tools no longer support building for
armv6 though, and App Store doesn't accept such submissions any longer.
Similarly for Windows Phone (and WinRT), the tools assume a platform with
armv7 including NEON, so this doesn't require any detection. If you want
to use more exotic instructions that aren't available in this baseline,
you'd probably need detection via SIGILL/exception handlers.
On Linux, you can open /proc/self/auxv, parse it relatively easily, and
check for HWCAP_NEON. The drawback is that recent Android kernels may
block access to this file [1].
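A rough sketch of the auxv parsing, assuming the file is a sequence of
(type, value) pairs of native unsigned long, terminated by a pair with
type 0. The function names and the SKETCH_ constants are made up for
illustration; the values 16 for AT_HWCAP and (1 << 12) for HWCAP_NEON are
the ones used on 32 bit ARM Linux and would normally come from <elf.h> and
<asm/hwcap.h>. The path is a parameter so that the parser can be tried on
a test file.

```c
#include <stdio.h>

/* Assumed values, normally provided by system headers on ARM Linux. */
#define SKETCH_AT_HWCAP   16UL
#define SKETCH_HWCAP_NEON (1UL << 12)

/* Looks up one entry in an auxv-formatted file.
 * Returns 1 and stores the value if the key is found, 0 otherwise
 * (including when the file can't be opened). */
int auxv_lookup(const char *path, unsigned long key, unsigned long *value)
{
    unsigned long entry[2];
    int found = 0;
    FILE *f = fopen(path, "rb");

    if (!f)
        return 0;
    while (fread(entry, sizeof entry, 1, f) == 1) {
        if (entry[0] == 0)      /* AT_NULL terminates the vector */
            break;
        if (entry[0] == key) {
            *value = entry[1];
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Returns 1 if the HWCAP entry is present and has the NEON bit set.
 * Only meaningful for a 32 bit ARM process. */
int auxv_has_neon(const char *path)
{
    unsigned long hwcap = 0;

    return auxv_lookup(path, SKETCH_AT_HWCAP, &hwcap)
        && (hwcap & SKETCH_HWCAP_NEON) != 0;
}
```

In real use the call would be auxv_has_neon("/proc/self/auxv"), with a
return value of 0 treated as "unknown" rather than "no NEON", since the
file may simply be unreadable.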
Instead of opening this file, you could use the getauxval function to get
the same auxiliary vector. Since this function isn't universally
available, you'd also need to check whether you can use it at all (or load
it using dlsym). In particular, it has only been available for a
relatively short time on Android, so you can't rely on it there.
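One possible way of doing the availability check, sketched here with a
weak symbol reference instead of dlsym (a different technique from the
one mentioned above, and it assumes GCC or Clang on an ELF platform such
as Linux). query_hwcap, hwcap_has_neon and the SKETCH_ constants are
made-up names; 16 and (1 << 12) are the 32 bit ARM Linux values of
AT_HWCAP and HWCAP_NEON.

```c
#include <stddef.h>

/* Assumed values, normally provided by system headers on ARM Linux. */
#define SKETCH_AT_HWCAP   16UL
#define SKETCH_HWCAP_NEON (1UL << 12)

#if defined(__linux__) && defined(__GNUC__)
/* Weak reference: the address is NULL at runtime if libc lacks it. */
extern unsigned long getauxval(unsigned long type) __attribute__((weak));
#define SKETCH_HAVE_WEAK_GETAUXVAL 1
#endif

/* Returns 1 and stores the HWCAP word if getauxval is usable,
 * 0 if it isn't available (the caller should then fall back). */
int query_hwcap(unsigned long *hwcap)
{
#ifdef SKETCH_HAVE_WEAK_GETAUXVAL
    if (getauxval != NULL) {
        *hwcap = getauxval(SKETCH_AT_HWCAP);
        return 1;
    }
#endif
    (void)hwcap;
    return 0;
}

/* Tests the NEON bit; only meaningful for a 32 bit ARM process. */
int hwcap_has_neon(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_NEON) != 0;
}
```

A return value of 0 from query_hwcap only means that this route isn't
usable, so the caller still needs another detection method to fall back
on.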
The final fallback is parsing /proc/cpuinfo, which should always work. You
can pretty easily find the Features line and look for the features. The
line ends with a space, so you can use something as simple as
strstr(line, " neon ") to parse it.
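As a minimal sketch of that parsing (cpuinfo_has_feature is a made-up
name, and the path is a parameter only so that the function can be tried
on a test file): it looks for lines starting with "Features" and searches
for the feature name surrounded by spaces. As a small precaution that
isn't strictly needed given the trailing space, the final newline is
turned into a space before searching.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a "Features" line in the given cpuinfo-style file lists
 * the named feature as a whole word, 0 otherwise (including when the
 * file can't be opened). */
int cpuinfo_has_feature(const char *path, const char *feature)
{
    char line[1024];
    char needle[64];
    int len, found = 0;
    FILE *f;

    /* Build " feature " so that e.g. "vfp" doesn't match "vfpv3". */
    len = snprintf(needle, sizeof needle, " %s ", feature);
    if (len < 0 || (size_t)len >= sizeof needle)
        return 0;

    f = fopen(path, "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        size_t n;

        if (strncmp(line, "Features", 8) != 0)
            continue;
        /* Make sure the last feature is followed by a space too. */
        n = strlen(line);
        if (n > 0 && line[n - 1] == '\n')
            line[n - 1] = ' ';
        if (strstr(line, needle)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

In real use this would be called as
cpuinfo_has_feature("/proc/cpuinfo", "neon").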
The gotcha about /proc/cpuinfo is that it is different for ARMv8 kernels -
features like neon, which were optional on ARMv7, aren't optional any
longer and are thus omitted. To handle this, you can either parse the "CPU
architecture" field and assume neon if it is >= 8, or you can look for
the "asimd" feature, which is printed instead and means the same thing.
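A small sketch of the "CPU architecture" check, operating on a single line
of /proc/cpuinfo (arch_line_is_v8_or_later is a made-up name). Besides
numeric values, it also accepts the literal "AArch64", on the assumption
that 64 bit kernels may print that instead of a number.

```c
#include <stdlib.h>
#include <string.h>

/* Given one line of /proc/cpuinfo, returns 1 if it is a
 * "CPU architecture" line naming ARMv8 or later, 0 otherwise. */
int arch_line_is_v8_or_later(const char *line)
{
    static const char key[] = "CPU architecture";
    const char *p;

    if (strncmp(line, key, sizeof key - 1) != 0)
        return 0;
    p = strchr(line, ':');
    if (!p)
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;

    /* Some 64 bit kernels print a name instead of a number. */
    if (strncmp(p, "AArch64", 7) == 0)
        return 1;

    /* strtol stops at the first non-digit, so "5TE" parses as 5. */
    return strtol(p, NULL, 10) >= 8;
}
```

This only covers the architecture field; checking for "asimd" in the
Features line is the other option mentioned above.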
To simplify running old 32 bit binaries, the Android ARMv8 kernels have an
extra compatibility feature that re-adds the "neon" keyword there
[2] [3]. This extra compatibility isn't available in upstream kernels,
though, so it can't be relied on (it was proposed in [4] but hasn't been
merged yet).
TL;DR - it's mostly only necessary on Linux. The simplest solution which
works everywhere is parsing /proc/cpuinfo.
[1] http://b.android.com/43055
[2] https://android.googlesource.com/kernel/common/+/cba0c6b2913c0d075a7434025f5dc29cd813707f%5E%21/
[3] https://android.googlesource.com/kernel/common/+/3868e7f8d47992922756d1aa6590f0d556c669b8%5E%21/
[4] http://marc.info/?l=linux-arm-kernel&m=139087240101974
Example of /proc/cpuinfo from a pandaboard:
Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 1392.74
processor : 1
BogoMIPS : 1363.33
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : OMAP4 Panda board
Revision : 0020
Serial : 0000000000000000
From a Nexus 9:
Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
processor : 0
processor : 1
Features : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x4e
CPU architecture: AArch64
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33410787
From a Nexus 9, read from a 32 bit process:
Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
processor : 0
processor : 1
Features : fp asimd aes pmull sha1 sha2 crc32 wp half thumb fastmult vfp edsp neon vfpv3 tlsi vfpv4 idiva idivt
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0
Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33410787
Finally, a few examples on all of this from other libraries:
libvpx, catching illegal instruction exceptions on Windows platforms, and
parsing /proc/cpuinfo:
http://git.chromium.org/gitweb/?p=webm/libvpx.git;a=blob;f=vpx_ports/arm_cpudetect.c;h=8a4b8af964
libav, trying /proc/self/auxv, falling back to /proc/cpuinfo:
https://git.libav.org/?p=libav.git;a=blob;f=libavutil/arm/cpu.c;h=8bdaa884
OpenH264, with very minimal parsing of /proc/cpuinfo (and a bunch of other
things):
https://github.com/cisco/openh264/blob/34661f1d8/codec/common/src/cpu.cpp#L250
The Android cpufeatures library (which tries /proc/self/auxv, tries
loading getauxval, and falls back to /proc/cpuinfo):
https://android.googlesource.com/platform/ndk/+/13a99c7f/sources/android/cpufeatures/cpu-features.c
x264, catching SIGILL:
http://git.videolan.org/?p=x264.git;a=blob;f=common/cpu.c;h=cad5f2c2e9
// Martin