On Fri, 15 Aug 2025 at 01:03, Colin Watson <[email protected]> wrote:
>
> On Fri, Aug 15, 2025 at 12:47:35AM +1200, [email protected] wrote:
> >On Wed, 13 Aug 2025 at 23:06, Colin Watson <[email protected]> wrote:
> >...
> >> I'm downgrading this for the moment as I can't currently find evidence
> >> that it's a baseline violation. I've tried this in various ancient qemu
> >> CPU models ("-cpu Conroe", "-cpu qemu64", "-cpu core2duo"), and it seems
> >> fine there. I'm prepared to believe that I've missed something, but
> >> figuring it out seems like a bit of a fishing expedition.
> >
> >Hi Colin! It's a baseline violation. Your analysis of the build files
> >was helpful but ultimately I just had to check the dmesg log for the
> >segfault and look up the offset in the shared library:
> >
> >traps: mtxrun[62011] trap invalid opcode ip:7fe6e4f64988
> >sp:7ffe42301c80 error:0 in libmimalloc.so.3.0[c988,7fe6e4f5e000+15000]
> >
> >c988: f3 48 0f b8 c2 popcnt rax,rdx
>
> OK. I think this is coming from supposedly CPUID-guarded code:
>
> $ git grep -i popcnt
> include/mimalloc/bits.h:extern bool _mi_cpu_has_popcnt;
> include/mimalloc/bits.h: if mi_unlikely(!_mi_cpu_has_popcnt) { return
> _mi_popcount_generic(x); }
> include/mimalloc/bits.h: __asm ("popcnt\t%1,%0" : "=r"(r) : "r"(x) :
> "cc");
> include/mimalloc/bits.h: if mi_unlikely(!_mi_cpu_has_popcnt) { return
> _mi_popcount_generic(x); }
> include/mimalloc/bits.h: return (size_t)mi_msc_builtinz(__popcnt)(x);
> include/mimalloc/bits.h: return (size_t)mi_msc_builtinz(__popcnt)(x);
> src/init.c:mi_decl_cache_align bool _mi_cpu_has_popcnt = false;
> src/init.c: _mi_cpu_has_popcnt = ((cpu_info[2] & (1 << 23)) != 0); //
> bit 23 of ECX : see
> <https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits>
> src/init.c: _mi_cpu_has_popcnt = true;
>
> Here's the relevant code (for GCC/amd64):
>
> #if !defined(__BMI1__)
> if mi_unlikely(!_mi_cpu_has_popcnt) { return _mi_popcount_generic(x); }
> #endif
> size_t r;
> __asm ("popcnt\t%1,%0" : "=r"(r) : "r"(x) : "cc");
> return r;
>
> And:
>
> static void mi_detect_cpu_features(void) {
> // FSRM for fast short rep movsb/stosb support (AMD Zen3+ (~2020) or
> Intel Ice Lake+ (~2017))
> // EMRS for fast enhanced rep movsb/stosb support
> uint32_t cpu_info[4];
> if (mi_cpuid(cpu_info, 7)) {
> _mi_cpu_has_fsrm = ((cpu_info[3] & (1 << 4)) != 0); // bit 4 of EDX :
> see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
> _mi_cpu_has_erms = ((cpu_info[1] & (1 << 9)) != 0); // bit 9 of EBX :
> see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
> }
> if (mi_cpuid(cpu_info, 1)) {
> _mi_cpu_has_popcnt = ((cpu_info[2] & (1 << 23)) != 0); // bit 23 of
> ECX : see
> <https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits>
> }
> }
>
> In principle this sort of thing should be OK. But maybe __BMI1__ is
> defined and so the generic fallback isn't present, or maybe the CPUID
> check is incorrect for your CPU? I'll check the former when I have a
> little more time, but if you could check the latter that would be
> helpful.
This observation may help. The relevant code (for GCC/amd64) should
include a runtime check before executing the popcnt instruction if
__BMI1__ is not defined.
If we look at the surrounding assembly before popcnt is executed there
should be a simple compare for the boolean value _mi_cpu_has_popcnt
and a jump skipping the popcnt instruction if the result is false:
c976: f0 49 0f b1 0c 24 lock cmpxchg QWORD PTR [r12],rcx
c97c: 75 f2 jne c970
<mi_arena_reload@@Base+0x7a0>
c97e: 48 8d 0d bb ce 01 00 lea rcx,[rip+0x1cebb]
# 29840 <mi_stats_get_json@@Base+0x10a70>
c985: 48 21 c2 and rdx,rax
c988: f3 48 0f b8 c2 popcnt rax,rdx
Here's another snippet from the binary (no check is apparent):
ded9: b8 01 00 00 00 mov eax,0x1
dede: 48 d3 e0 shl rax,cl
dee1: 48 83 e8 01 sub rax,0x1
dee5: 89 d1 mov ecx,edx
dee7: 48 d3 e0 shl rax,cl
deea: 49 89 c7 mov r15,rax
deed: 48 8d 05 4c b9 01 00 lea rax,[rip+0x1b94c]
# 29840 <mi_stats_get_json@@Base+0x10a70>
def4: f3 4c 0f b8 df popcnt r11,rdi
Locally there is no indication these simple checks are being performed.
This indicates __BMI1__ is incorrectly defined.
(Or perhaps there is no runtime detection going on and what we are
seeing is output from intrinsics compiled for a platform that supports
the popcnt instruction.)
Regards,
Adam