After some architecture testing, it looks like we can drop the march
options and just enable the specific instruction sets via:

libswrAVX: -mavx
libswrAVX2: -mavx2 -mfma -mbmi2 -mf16c

The benefit of course is that with these flags swr works on AMD cpu's with
AVX and AVX2 instructions enabled.  The specific march flags can cause
instruction set compatibility mismatch since AMD CPUs that supported AVX
didn't necessarily support all of sandybridge instructions and similar for
AVX2 and haswell.  Regarding the extra options for fma, bmi2, and f16c, all
AMD CPUs that supported AVX2 also supported those as well.

- Chuck

On Mon, Jun 13, 2016 at 1:22 PM, Emil Velikov <emil.l.veli...@gmail.com>
wrote:

> On 13 June 2016 at 15:43, Rowley, Timothy O <timothy.o.row...@intel.com>
> wrote:
> >
> >> On Jun 12, 2016, at 3:56 AM, Steven Newbury <st...@snewbury.org.uk>
> wrote:
> >>
> >> On Fri, 2016-06-10 at 16:05 -0400, Ilia Mirkin wrote:
> >>> On Fri, Jun 10, 2016 at 3:43 PM, Tim Rowley <timothy.o.rowley@intel.c
> >>> om> wrote:
> >>>>
> >>>> Previously used core-avx-i was for ivybridge;
> >>>> corei7-avx allows sandybridge.
> >>>> ---
> >>>>  src/gallium/drivers/swr/Makefile.am | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/src/gallium/drivers/swr/Makefile.am
> >>>> b/src/gallium/drivers/swr/Makefile.am
> >>>> index d211f2e..8156cf2 100644
> >>>> --- a/src/gallium/drivers/swr/Makefile.am
> >>>> +++ b/src/gallium/drivers/swr/Makefile.am
> >>>> @@ -124,7 +124,7 @@ COMMON_LDFLAGS = \
> >>>>  lib_LTLIBRARIES = libswrAVX.la libswrAVX2.la
> >>>>
> >>>>  libswrAVX_la_CXXFLAGS = \
> >>>> -       -march=core-avx-i \
> >>>> +       -march=corei7-avx \
> >>> Just wondering if it'd be enough to say like
> >>>
> >>> -march=x86_64
> >>> -mavx
> >>>
> >>> and add -mavx2 for libswrAVX2.
> >>>
> >>> I suspect you've iterated through this 20 times and this has some
> >>> shortcoming I'm not thinking of, but figured I'd point it out just in
> >>> case it helps.
> >>>
> >>>   -ilia
> >>>
> >> Maybe I'm the only one who finds it horrible to override -march from
> >> within project build systems.
>
> >> It causes no end of problems with LTO,
> >> and results in objects being built inappropriately for the target as
> >> specified by the builder.
> Can you please elaborate on the exact problem ? Is there link where
> one can read more on the topic ?
>
> > I agree that the way swr is built can be improved and welcome
> ideas/discussion for work on the master branch, but for the moment I’m
> trying to get the 12.0 branch in good shape for swr with non-invasive
> changes (as we’ve tested the current setup pretty extensively).
> >
> Fully agree. Getting something more robust and less evasive for 12.0
> and work for 'the best' in master. Perhaps attribute((target)) perhaps
> folding the multiple binaries into one, or maybe something else ?
>
> >> Why not use function attributes?  Either compile a function
> >> specifically for a particular target, or specifically enable AVX though
> >> "__attribute__ ((__target__ ("avx")))"
> Is this going to produce correct binary if one provides -mno-avx
> and/or alike ? I believe for some CPUs one should (or the compiler
> does it for them) explicitly disable avx but honestly I've never
> looked if cpuid for the said CPUs shows avx as supported.
>
> In the latter case (cpuid listing AVX as supported, while one is
> recommended building with -mno-avx) how do you suggest we should
> handle things ?
>
> >> (ideally with a fall-back
> >> implementation detected at run-time when the instruction set isn't
> >> supported).  Even better, have it also protected at build-time by a
> >> simple check for __AVX__ so the code can be compiled out entirely where
> >> the package builder has specified -march(=native) as a build flag.
> The idea here is to have the binary build explicitly with avx/avx2 (if
> compiler is capable), even when doing generic builds. The picky part
> comes when one has to establish if the user provided one is 'more
> generic' or not (as per your example).
>
> Then again, based on my limited 'testing', building mesa with native
> and LTO brings us little to no improvements. 'Tested' running glxgears
> in small window.
>
> > One problem with the function attribute path is that as far as I know,
> MSVC does not have a similar feature.
> >
> Afaict MSVC has optimise pragma [1], just like GCC. Ideally they'll
> add an equivalent to attribute((target)) soon ?
>
> Even without that, swr (as seen in mesa) does not build MSVC. When
> that cases it should be a matter of factoring the attribute target as
> a macro and leaving it empty for the !GCC build ?
>
> TL;DR: IMHO fix is ok for now, but for the future we might want to
> explode other options.
>
> -Emil
> [1] https://msdn.microsoft.com/en-us/library/chh3fb0k.aspx
> _______________________________________________
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to