[OMPI devel] Question about auto-vectorization behavior for OpenMPI OP components across architectures

Marco Vogel Fri, 31 Jan 2025 01:07:07 -0800

Hello,

I implemented a new OP component for OpenMPI targeting the RISC-V vector extension, following the existing implementations for x86 (AVX) and ARM (NEON). During testing, I aimed to reproduce results from a paper discussing the AVX512 OP component, which stated that OpenMPI's default compiler did not generate auto-vectorized code (https://icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf, Chapter 5, Experimental evaluation). However, on my Zen4 machine, I observed no performance difference between the AVX OP component and the base implementation (with `--mca op ^avx`) when running `MPI_Reduce_local` on a 1MB array.

To investigate, I rebuilt OpenMPI with CFLAGS='-O3 -fno-tree-vectorize', which then confirmed the paper's findings. This behavior is consistent across x86 (AVX), ARM (NEON) and RISC-V (RVV).

My question: Did I overlook something in my testing or setup? Why wouldn't the compiler in the paper auto-vectorize the base operations when mine apparently does unless explicitly disabled?


Thank you!

Marco
