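Marco,

The compiler can auto-vectorize the base operations, but only with the instruction sets enabled for the platform it is generating code for. A distro-provided Open MPI is likely to be optimised only for "common" architectures (e.g. no AVX512 on x86 - SSE only? - and no SVE on aarch64), so the base implementation may still be vectorized, just with narrower vectors than the hardware supports.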
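One rough way to check what your compiler does is to look at a loop shaped like a base reduction in isolation. The sketch below is only an illustration, not Open MPI's actual code: `sum_int32` is a hypothetical stand-in for an element-wise sum over two buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a base reduction op: inout[i] += in[i].
 * No intrinsics are used, so any SIMD code comes from the compiler's
 * auto-vectorizer and depends on the optimisation and target flags. */
void sum_int32(const int32_t *in, int32_t *inout, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}
```

Assuming GCC and that the file is saved as `sum.c` (a name chosen here for illustration), `gcc -O3 -fopt-info-vec -c sum.c` should report whether the loop was vectorized. Adding `-fno-tree-vectorize` should make that report go away, and adding `-march=native` lets the compiler use the widest vectors the build machine supports instead of the baseline for the architecture.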
Cheers,

Gilles

On Fri, Jan 31, 2025, 18:06 Marco Vogel <marco...@hotmail.de> wrote:

> Hello,
>
> I implemented a new OP component for OpenMPI targeting the RISC-V vector
> extension, following existing implementations for x86 (AVX) and ARM
> (NEON). During testing, I aimed to reproduce results from a paper
> discussing the AVX512 OP component, which stated that OpenMPI’s default
> compiler did not generate auto-vectorized code
> (https://icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf
> Chapter 5 Experimental evaluation). However, on my Zen4 machine, I
> observed no performance difference between the AVX OP component and the
> base implementation (with --mca op ^avx) when running `MPI_Reduce_local`
> on a 1MB array.
> To investigate, I rebuilt OpenMPI with CFLAGS='-O3 -fno-tree-vectorize',
> which then confirmed the paper’s findings. This behavior is consistent
> across x86 (AVX), ARM (NEON) and RISC-V (RVV). My question: Did I
> overlook something in my testing or setup? Why wouldn’t the compiler in
> the paper auto-vectorize the base operations when mine allegedly does
> unless explicitly disabled?
>
> Thank you!
>
> Marco
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to devel+unsubscr...@lists.open-mpi.org.