Marco,

The compiler may auto-vectorize the base operations when it generates code
optimised for a given platform.
A distro-provided Open MPI is likely to be optimised only for "common"
architectures (e.g. no AVX512 on x86 - SSE only? - and no SVE on aarch64).
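
To make this concrete, here is a minimal sketch of the kind of loop a base
"sum" operation boils down to. It is not Open MPI's actual code: the function
name `sum_float` and its signature are hypothetical and only serve as an
illustration. With GCC, building it with something like
`-O3 -march=native -fopt-info-vec` should report whether the loop was
vectorized for the build machine, while `-O3 -fno-tree-vectorize` keeps it
scalar. Without any `-march` option the compiler targets the baseline
architecture, so on x86-64 it can use SSE2 at most.

```c
#include <stddef.h>

/* Hypothetical stand-in for a base "sum" reduction: inout[i] += in[i].
 * The restrict qualifiers promise the compiler that the two buffers do
 * not overlap, which makes the loop easier to auto-vectorize. */
void sum_float(const float *restrict in, float *restrict inout, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        inout[i] += in[i];
    }
}
```

Comparing the generated assembly for the different flag sets (e.g. with
`-S`) is a quick way to see which instruction set, if any, the compiler
picked for the loop.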

Cheers,

Gilles

On Fri, Jan 31, 2025, 18:06 Marco Vogel <marco...@hotmail.de> wrote:

> Hello,
>
> I implemented a new OP component for OpenMPI targeting the RISC-V vector
> extension, following existing implementations for x86 (AVX) and ARM
> (NEON). During testing, I aimed to reproduce results from a paper
> discussing the AVX512 OP component, which stated that OpenMPI’s default
> compiler did not generate auto-vectorized code
> (https://icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf
> Chapter 5 Experimental evaluation). However, on my Zen4 machine, I
> observed no performance difference between the AVX OP component and the
> base implementation (with --mca op ^avx) when running `MPI_Reduce_local`
> on a 1MB array.
> To investigate, I rebuilt OpenMPI with CFLAGS='-O3 -fno-tree-vectorize',
> which then confirmed the paper’s findings. This behavior is consistent
> across x86 (AVX), ARM (NEON) and RISC-V (RVV). My question: Did I
> overlook something in my testing or setup? Why wouldn’t the compiler in
> the paper auto-vectorize the base operations when mine apparently does
> unless explicitly disabled?
>
> Thank you!
>
> Marco
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to devel+unsubscr...@lists.open-mpi.org.
>
>
