Marco,

These are fair points, and I expect George (who initially authored this
module, IIRC) will soon shed some light on them.

Cheers,

Gilles

On Fri, Jan 31, 2025 at 9:08 PM Marco Vogel <marco...@hotmail.de> wrote:

> Gilles,
>
> Thank you for your response. I understand that distro-provided OpenMPI
> binaries are typically built for broad compatibility, often targeting only
> baseline instruction sets.
>
> For x86, this makes sense: if OpenMPI is compiled with a target instruction
> set like `x86-64-v2` (no AVX), the `configure.m4` script for the AVX
> component first attempts to compile AVX code directly. If that fails, it
> retries with the necessary vectorization flags (e.g., `-mavx512f`). If that
> succeeds, these flags are applied, ensuring that the vectorized functions
> are included in the AVX component. At runtime, OpenMPI detects CPU
> capabilities (via CPUID) and uses the AVX functions when available, even if
> vectorization wasn't explicitly enabled by the package maintainers -
> assuming I have correctly understood how the OP components are built.
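>
> As a rough sketch of the runtime side as I picture it (the function name
> below is mine, not the actual symbol in the AVX component):
>
>     #include <stdbool.h>
>
>     /* Illustrative runtime guard: even if configure added -mavx512f so the
>      * component contains AVX-512 code, those functions should only be
>      * selected when the CPU actually reports the feature. */
>     static bool can_use_avx512f(void)
>     {
>         __builtin_cpu_init();                  /* GCC/clang, backed by CPUID */
>         return __builtin_cpu_supports("avx512f");
>     }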
>
> What I find unclear is why the AArch64 component follows a different
> approach. During configuration, it only checks whether the compiler can
> compile NEON or SVE intrinsics without any additional flags; if not, the
> corresponding intrinsic functions are omitted entirely. This means that if
> the distro's compilation settings don't allow NEON or SVE, OpenMPI won't
> include the optimized functions, and processors with these vector units
> won't benefit.
> Conversely, if NEON or SVE is allowed, the base OPs will likely be
> auto-vectorized, reducing the performance gap between the base and
> intrinsic implementations.
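>
> The kind of probe I mean is roughly the following; whether it compiles
> without any extra flags is what decides if the NEON functions are built at
> all (and presumably the SVE check is analogous):
>
>     /* Minimal NEON probe, roughly what I understand the configure check to
>      * compile. If this fails to build with the default flags, the intrinsic
>      * implementations are simply left out of the component. */
>     #include <arm_neon.h>
>
>     int main(void)
>     {
>         float32x4_t a = vdupq_n_f32(1.0f);
>         float32x4_t b = vdupq_n_f32(2.0f);
>         float32x4_t c = vaddq_f32(a, b);
>         return (int) vgetq_lane_f32(c, 0);
>     }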
>
> Is there a specific reason for this difference in handling SIMD support
> between x86 and AArch64 in OpenMPI, or am I wrong about the configuration
> process?
>
> Cheers,
>
> Marco
> On 31.01.25 11:47, Gilles Gouaillardet wrote:
>
> Marco,
>
> The compiler may auto-vectorize when generating code optimised for a given
> platform.
> A distro-provided Open MPI is likely to be optimised only for "common"
> architectures (e.g. no AVX512 on x86 - SSE only? - and no SVE on aarch64).
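>
> To make that concrete (the loop and the flags below are only an example):
>
>     /* A plain reduction loop like the base op implementations. Built with
>      * e.g. gcc -O3 -march=native the compiler will typically auto-vectorize
>      * this; a generic baseline x86-64 build is limited to SSE2, and
>      * -fno-tree-vectorize keeps it scalar. */
>     void sum_float(float *restrict inout, const float *restrict in, int n)
>     {
>         for (int i = 0; i < n; i++)
>             inout[i] += in[i];
>     }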
>
> Cheers,
>
> Gilles
>
> On Fri, Jan 31, 2025, 18:06 Marco Vogel <marco...@hotmail.de> wrote:
>
>> Hello,
>>
>> I implemented a new OP component for OpenMPI targeting the RISC-V vector
>> extension, following existing implementations for x86 (AVX) and ARM
>> (NEON). During testing, I aimed to reproduce results from a paper
>> discussing the AVX512 OP component, which stated that OpenMPI’s default
>> compiler did not generate auto-vectorized code
>> (https://icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf
>> Chapter 5, Experimental evaluation). However, on my Zen4 machine, I
>> observed no performance difference between the AVX OP component and the
>> base implementation (with --mca op ^avx) when running `MPI_Reduce_local`
>> on a 1MB array.
>> To investigate, I rebuilt OpenMPI with CFLAGS='-O3 -fno-tree-vectorize',
>> which then confirmed the paper’s findings. This behavior is consistent
>> across x86 (AVX), ARM (NEON) and RISC-V (RVV). My question: Did I
>> overlook something in my testing or setup? Why wouldn't the compiler used
>> in the paper auto-vectorize the base operations, when mine evidently does
>> unless auto-vectorization is explicitly disabled?
>>
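>> (For reference, a minimal sketch of such a test; the buffer size, datatype,
>> and iteration count are illustrative rather than my exact setup:)
>>
>>     #include <mpi.h>
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>
>>     /* Time repeated MPI_Reduce_local calls on a 1 MiB float buffer. */
>>     int main(int argc, char **argv)
>>     {
>>         MPI_Init(&argc, &argv);
>>
>>         const int count = 1 << 18;           /* 256K floats = 1 MiB */
>>         const int iters = 1000;
>>         float *in    = malloc(count * sizeof(float));
>>         float *inout = malloc(count * sizeof(float));
>>         for (int i = 0; i < count; i++) { in[i] = 1.0f; inout[i] = 2.0f; }
>>
>>         double t0 = MPI_Wtime();
>>         for (int i = 0; i < iters; i++)
>>             MPI_Reduce_local(in, inout, count, MPI_FLOAT, MPI_SUM);
>>         double t1 = MPI_Wtime();
>>
>>         printf("avg time per MPI_Reduce_local: %g s\n", (t1 - t0) / iters);
>>
>>         free(in);
>>         free(inout);
>>         MPI_Finalize();
>>         return 0;
>>     }
>>
>> Running it once as-is and once with `--mca op ^avx` exercises the two code
>> paths being compared.
>>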
>> Thank you!
>>
>> Marco
>>