Marco, these are fair points, and I guess George (who initially authored this module, IIRC) will soon shed some light.
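In the meantime, for the archives, here is a minimal sketch of the probe-then-dispatch pattern you describe for the AVX component. It is my own illustration, not Open MPI's actual code: where configure.m4 adds per-file flags and the component does its own CPU feature detection, the sketch leans on a GCC/Clang function-level target attribute and __builtin_cpu_supports, so treat those choices (and the function names) as assumptions.

/*
 * Illustrative sketch only, NOT Open MPI's code.  It mimics the pattern
 * described below: the wide kernel is built even though the baseline
 * target lacks AVX-512 (here via a function-level target attribute
 * instead of per-file flags), and CPUID decides at runtime whether it
 * is used.
 */
#include <stddef.h>

/* Scalar fallback: what the "base" op path would execute. */
static void sum_float_base(const float *in, float *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>

/* Compiled for AVX-512 regardless of the global -march baseline. */
__attribute__((target("avx512f")))
static void sum_float_avx512(const float *in, float *inout, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 a = _mm512_loadu_ps(&inout[i]);
        __m512 b = _mm512_loadu_ps(&in[i]);
        _mm512_storeu_ps(&inout[i], _mm512_add_ps(a, b));
    }
    for (; i < n; i++)              /* scalar remainder */
        inout[i] += in[i];
}
#endif

/* Runtime dispatch: use the wide kernel only if the CPU reports AVX-512F. */
void sum_float(const float *in, float *inout, size_t n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("avx512f")) {
        sum_float_avx512(in, inout, n);
        return;
    }
#endif
    sum_float_base(in, inout, n);
}

The point is the same either way: the wide kernel is present in the binary even when the baseline target has no AVX-512, and only a runtime check decides whether it runs.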
Cheers,

Gilles

On Fri, Jan 31, 2025 at 9:08 PM Marco Vogel <marco...@hotmail.de> wrote:
> Gilles,
>
> Thank you for your response. I understand that distro-provided OpenMPI
> binaries are typically built for broad compatibility, often targeting
> only baseline instruction sets.
>
> For x86, this makes sense: if OpenMPI is compiled for a target
> instruction set like `x86-64-v2` (no AVX), the `configure.m4` script
> for the AVX component first attempts to compile AVX code directly. If
> that fails, it retries with the necessary vectorization flags (e.g.,
> `-mavx512f`). If successful, these flags are applied, ensuring that the
> vectorized functions are included in the AVX component. At runtime,
> OpenMPI detects CPU capabilities (via CPUID) and uses the AVX functions
> when available, even if vectorization wasn't explicitly enabled by the
> package maintainers, assuming I have correctly understood how the OP
> components are compiled.
>
> What I find unclear is why the AArch64 component follows a different
> approach. During configuration, it only checks whether the compiler can
> compile NEON or SVE code without additional flags. If not, the
> corresponding intrinsic functions are omitted entirely. This means that
> if the distro's compilation settings don't allow NEON or SVE, OpenMPI
> won't include the optimized functions, and processors with these vector
> units won't benefit. Conversely, if NEON or SVE is allowed, the base
> OPs will likely be auto-vectorized, reducing the performance gap
> between the base and intrinsic implementations.
>
> Is there a specific reason for this difference in handling SIMD support
> between x86 and AArch64 in OpenMPI, or am I wrong about the
> configuration process?
>
> Cheers,
>
> Marco
>
> On 31.01.25 11:47, Gilles Gouaillardet wrote:
> > Marco,
> >
> > The compiler may vectorize if generating code optimised for a given
> > platform. A distro-provided Open MPI is likely to be optimised only
> > for "common" architectures (e.g. no AVX512 on x86 - SSE only? - and
> > no SVE on aarch64).
> >
> > Cheers,
> >
> > Gilles
> >
> > On Fri, Jan 31, 2025, 18:06 Marco Vogel <marco...@hotmail.de> wrote:
> >> Hello,
> >>
> >> I implemented a new OP component for OpenMPI targeting the RISC-V
> >> vector extension, following the existing implementations for x86
> >> (AVX) and ARM (NEON). During testing, I aimed to reproduce results
> >> from a paper discussing the AVX512 OP component, which stated that
> >> OpenMPI's default compiler did not generate auto-vectorized code
> >> (https://icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf,
> >> Chapter 5, Experimental evaluation). However, on my Zen4 machine, I
> >> observed no performance difference between the AVX OP component and
> >> the base implementation (with --mca op ^avx) when running
> >> `MPI_Reduce_local` on a 1 MB array.
> >>
> >> To investigate, I rebuilt OpenMPI with CFLAGS='-O3
> >> -fno-tree-vectorize', which then confirmed the paper's findings.
> >> This behavior is consistent across x86 (AVX), ARM (NEON), and
> >> RISC-V (RVV). My question: did I overlook something in my testing
> >> or setup? Why wouldn't the compiler in the paper auto-vectorize the
> >> base operations when mine apparently does unless vectorization is
> >> explicitly disabled?
> >>
> >> Thank you!
> >>
> >> Marco
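For reference, this is roughly how I read your MPI_Reduce_local measurement (a sketch, not your actual test; the iteration count and the float/MPI_SUM choice are my own assumptions):

/* Sketch of the MPI_Reduce_local micro-benchmark discussed above.
 * Build:  mpicc -O3 reduce_local.c -o reduce_local
 * Base path (AVX component disabled):
 *         mpirun -n 1 --mca op ^avx ./reduce_local
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int count = 1 << 18;               /* 256Ki floats = 1 MB */
    float *in    = malloc(count * sizeof(float));
    float *inout = malloc(count * sizeof(float));
    for (int i = 0; i < count; i++) {
        in[i]    = 1.0f;
        inout[i] = 2.0f;
    }

    const int iters = 1000;
    double t0 = MPI_Wtime();
    for (int it = 0; it < iters; it++)
        MPI_Reduce_local(in, inout, count, MPI_FLOAT, MPI_SUM);
    double t1 = MPI_Wtime();

    printf("avg time per MPI_Reduce_local: %g us\n",
           1e6 * (t1 - t0) / iters);

    free(in);
    free(inout);
    MPI_Finalize();
    return 0;
}

Comparing a run of this against the same binary launched with --mca op ^avx, once built with stock flags and once with CFLAGS='-O3 -fno-tree-vectorize', should reproduce the two cases you describe.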
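And to make the x86-versus-aarch64 difference you ask about concrete: in the sketch below (again illustrative, assuming GCC/Clang with the SVE ACLE, not Open MPI's actual code), the SVE kernel only exists if the baseline target already enables SVE, and there is no runtime-dispatched path that could reach it otherwise, which is how I read your description of the aarch64 component.

/* Illustrative sketch of compile-time-only gating, not Open MPI's code:
 * the SVE kernel is only compiled when the baseline target already
 * advertises SVE, so a distro baseline without SVE never contains it,
 * and no runtime check can select it on SVE-capable hardware. */
#include <stddef.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

static void sum_float_sve(const float *in, float *inout, size_t n)
{
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t    pg = svwhilelt_b32_u64(i, n);   /* predicate covers tail */
        svfloat32_t a  = svld1_f32(pg, &inout[i]);
        svfloat32_t b  = svld1_f32(pg, &in[i]);
        svst1_f32(pg, &inout[i], svadd_f32_x(pg, a, b));
    }
}
#endif

void sum_float(const float *in, float *inout, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    sum_float_sve(in, inout, n);    /* only present if SVE was the baseline */
#else
    for (size_t i = 0; i < n; i++)  /* scalar path; may still auto-vectorize */
        inout[i] += in[i];
#endif
}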