On Fri, Aug 15, 2025 at 12:47:35AM +1200, [email protected] wrote:
> On Wed, 13 Aug 2025 at 23:06, Colin Watson <[email protected]> wrote:
> ...
>> I'm downgrading this for the moment as I can't currently find evidence
>> that it's a baseline violation.  I've tried this in various ancient qemu
>> CPU models ("-cpu Conroe", "-cpu qemu64", "-cpu core2duo"), and it seems
>> fine there.  I'm prepared to believe that I've missed something, but
>> figuring it out seems like a bit of a fishing expedition.
>
> Hi Colin! It's a baseline violation. Your analysis of the build files
> was helpful but ultimately I just had to check the dmesg log for the
> segfault and look up the offset in the shared library:
>
> traps: mtxrun[62011] trap invalid opcode ip:7fe6e4f64988 sp:7ffe42301c80 error:0 in libmimalloc.so.3.0[c988,7fe6e4f5e000+15000]
>
> c988:       f3 48 0f b8 c2          popcnt rax,rdx
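
For anyone following along, the arithmetic behind that lookup can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and the 0x6000 file offset of the mapping is an assumption inferred from the difference between the reported c988 and ip minus the mapping base.

```c
#include <stdint.h>

/* Distance of the faulting instruction pointer from the start of the
 * mapping reported in the dmesg line (the value before "+15000"). */
uint64_t offset_in_mapping(uint64_t ip, uint64_t map_start) {
    return ip - map_start;
}

/* Offset within the library file itself.
 * ASSUMPTION: map_file_offset is the file offset at which this mapping
 * starts; it is not stated in the log and is inferred here. */
uint64_t offset_in_file(uint64_t ip, uint64_t map_start,
                        uint64_t map_file_offset) {
    return offset_in_mapping(ip, map_start) + map_file_offset;
}
```

With the values from the log, ip 0x7fe6e4f64988 and base 0x7fe6e4f5e000 give 0x6988 within the mapping; adding the assumed 0x6000 gives the c988 at which the disassembly shows popcnt.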

OK.  I think this is coming from supposedly CPUID-guarded code:

  $ git grep -i popcnt
  include/mimalloc/bits.h:extern bool _mi_cpu_has_popcnt;
  include/mimalloc/bits.h:    if mi_unlikely(!_mi_cpu_has_popcnt) { return _mi_popcount_generic(x); }
  include/mimalloc/bits.h:    __asm ("popcnt\t%1,%0" : "=r"(r) : "r"(x) : "cc");
  include/mimalloc/bits.h:    if mi_unlikely(!_mi_cpu_has_popcnt) { return _mi_popcount_generic(x); }
  include/mimalloc/bits.h:    return (size_t)mi_msc_builtinz(__popcnt)(x);
  include/mimalloc/bits.h:    return (size_t)mi_msc_builtinz(__popcnt)(x);
  src/init.c:mi_decl_cache_align bool _mi_cpu_has_popcnt = false;
  src/init.c:    _mi_cpu_has_popcnt = ((cpu_info[2] & (1 << 23)) != 0); // bit 23 of ECX : see <https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits>
  src/init.c:  _mi_cpu_has_popcnt = true;

Here's the relevant code (for GCC/amd64):

    #if !defined(__BMI1__)
    if mi_unlikely(!_mi_cpu_has_popcnt) { return _mi_popcount_generic(x); }
    #endif
    size_t r;
    __asm ("popcnt\t%1,%0" : "=r"(r) : "r"(x) : "cc");
    return r;

And:

  static void mi_detect_cpu_features(void) {
    // FSRM for fast short rep movsb/stosb support (AMD Zen3+ (~2020) or Intel Ice Lake+ (~2017))
    // EMRS for fast enhanced rep movsb/stosb support
    uint32_t cpu_info[4];
    if (mi_cpuid(cpu_info, 7)) {
      _mi_cpu_has_fsrm = ((cpu_info[3] & (1 << 4)) != 0); // bit 4 of EDX : see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
      _mi_cpu_has_erms = ((cpu_info[1] & (1 << 9)) != 0); // bit 9 of EBX : see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
    }
    if (mi_cpuid(cpu_info, 1)) {
      _mi_cpu_has_popcnt = ((cpu_info[2] & (1 << 23)) != 0); // bit 23 of ECX : see <https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits>
    }
    }
  }

In principle this sort of thing should be OK. But maybe __BMI1__ is defined at build time, in which case the generic fallback is compiled out, or maybe the CPUID check gives the wrong answer on your CPU? I'll check the former when I have a little more time, but if you could check the latter that would be helpful.
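
A minimal sketch of how the CPUID side could be checked on the affected machine, assuming GCC or Clang on x86 with the compiler's <cpuid.h> header. The function names are hypothetical and not part of mimalloc; the sketch only tests the same bit (bit 23 of ECX for leaf 1) that the code above reads.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang header providing __get_cpuid */
#endif

/* Hypothetical helper: test bit 23 of the ECX value from CPUID leaf 1. */
int ecx_has_popcnt(uint32_t ecx) {
    return (ecx & (UINT32_C(1) << 23)) != 0;
}

/* Hypothetical helper: returns 1 if the CPU reports POPCNT, 0 if it does
 * not, and -1 if leaf 1 cannot be queried (or this is not an x86 build). */
int cpu_reports_popcnt(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return -1;  /* leaf 1 not supported */
    }
    return ecx_has_popcnt(ecx);
#else
    return -1;
#endif
}
```

Calling cpu_reports_popcnt() from a small main and printing the result would show what the CPU itself claims. A result of 0 on the machine where popcnt traps would point at the guard being compiled out rather than at the CPUID check.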

(Note that I don't know this library well. I'm just trying to figure this out since it's been blocking some other things I work on.)

--
Colin Watson (he/him)                              [[email protected]]
