On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi:
>    GCC 11 will be the system GCC 2 years from now, and the processors
> of that time shouldn't even need to split a 256-bit vector into two
> 128-bit vectors.
>    E.g., testing SPEC2017 with the 2 options below on Zen3/ICL shows
> that Option B is better than Option A.
> Option A:
> -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
>
> Option B:
> Option A + 
> -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

Given the explicit list for unaligned loads it's a no-brainer to change that
for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
we should try to benchmark the effect on ZNVER1 - Martin, do we still
have a znver1 machine around?

Note that if the settings differ such that stores are split but loads
are not, loading a just-stored value can defeat store-to-load
forwarding (STLF) and cause quite a performance hit. Since znver1 has
128-bit data paths, that shouldn't be an issue there, but it would be
an issue for actually aligned data on CPUs with 256-bit data paths.
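
To make the access pattern concrete, here is a minimal sketch in plain C.
It only models the shape of the problem: a 32-byte value written as two
16-byte halves and then read back as one 32-byte unit. The names
(vec256, store_split, load_whole, roundtrip_matches) are made up for
illustration and are not from the patch, and an optimizing compiler is
free to turn these memcpy calls into something else entirely. The result
is always correct; on real hardware only the speed of the reload would be
affected.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit vector: 32 raw bytes. */
typedef struct { uint8_t bytes[32]; } vec256;

/* Models a split store: the value is written as two 128-bit halves. */
static void store_split(uint8_t *dst, const vec256 *v)
{
    memcpy(dst, v->bytes, 16);            /* low 128 bits  */
    memcpy(dst + 16, v->bytes + 16, 16);  /* high 128 bits */
}

/* Models an unsplit load: one 256-bit read spanning both halves. */
static vec256 load_whole(const uint8_t *src)
{
    vec256 v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Store then immediately reload the same 32 bytes.
   Returns 1 if the reloaded value equals the stored one. */
int roundtrip_matches(void)
{
    uint8_t buf[32];
    vec256 in, out;

    for (int i = 0; i < 32; i++)
        in.bytes[i] = (uint8_t)(i * 7 + 1);

    store_split(buf, &in);
    out = load_whole(buf);

    return memcmp(in.bytes, out.bytes, sizeof in.bytes) == 0;
}
```

The point of the sketch is that the single wide load depends on two
separate narrower stores, which is the case where forwarding from the
store buffer may not be possible.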

Thanks,
Richard.

>   Ok for trunk?
>
>
>
>
> --
> BR,
> Hongtao
