Tapani Pälli <[email protected]> writes:

> From: D Scott Phillips <[email protected]>
>
> The reference for MOVNTDQA says:
>
>     For WC memory type, the nontemporal hint may be implemented by
>     loading a temporary internal buffer with the equivalent of an
>     aligned cache line without filling this data to the cache.
>     [...] Subsequent MOVNTDQA reads to unread portions of the WC
>     cache line will receive data from the temporary internal
>     buffer if data is available.
>
> This hidden, cache-line-sized temporary buffer can improve the
> read performance from WC maps.
>
> v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
> v3: add Android build support (Tapani)
>
> Reviewed-by: Chris Wilson <[email protected]>
> Reviewed-by: Matt Turner <[email protected]>
> Acked-by: Kenneth Graunke <[email protected]>
> ---
>  src/mesa/drivers/dri/i965/Android.mk           | 22 +++++++++
>  src/mesa/drivers/dri/i965/Makefile.am          |  7 +++
>  src/mesa/drivers/dri/i965/Makefile.sources     |  6 ++-
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/meson.build          | 18 ++++++--
>  5 files changed, 110 insertions(+), 5 deletions(-)
>
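
A minimal sketch of a streaming-load copy built on the MOVNTDQA behavior quoted above. This is not the code from intel_tiled_memcpy.c. It assumes an x86 GCC or Clang toolchain, a 16-byte-aligned source and a length that is a multiple of 16, and the names stream_load_copy and copy_from_wc are hypothetical. The per-function target("sse4.1") attribute lets the file build without -msse4.1, the leading mfence mirrors the v2 change, and on ordinary write-back memory the processor may ignore the non-temporal hint and do a normal load, so the sketch also works on plain heap or stack buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h> /* SSE2: _mm_mfence, _mm_storeu_si128 */
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* SSE4.1 code generation is enabled for this one function only; the
 * caller must confirm at run time that the CPU supports SSE4.1. */
__attribute__((target("sse4.1")))
static void stream_load_copy(void *dst, const void *src, size_t n)
{
    /* The intrinsic takes a non-const pointer, so cast const away. */
    __m128i *s = (__m128i *)(uintptr_t)src;
    unsigned char *d = dst;

    /* Streaming loads from WC memory are weakly ordered: fence first so
     * they are not satisfied ahead of earlier memory operations. */
    _mm_mfence();

    /* Four 16-byte loads per iteration (one cache line when src is
     * 64-byte aligned); after the first load of a line, the rest may be
     * served from the internal streaming buffer. */
    for (; n >= 64; n -= 64, s += 4, d += 64) {
        __m128i x0 = _mm_stream_load_si128(s + 0);
        __m128i x1 = _mm_stream_load_si128(s + 1);
        __m128i x2 = _mm_stream_load_si128(s + 2);
        __m128i x3 = _mm_stream_load_si128(s + 3);
        _mm_storeu_si128((__m128i *)(d + 0), x0);
        _mm_storeu_si128((__m128i *)(d + 16), x1);
        _mm_storeu_si128((__m128i *)(d + 32), x2);
        _mm_storeu_si128((__m128i *)(d + 48), x3);
    }
    /* Remaining whole 16-byte chunks. */
    for (; n >= 16; n -= 16, s += 1, d += 16)
        _mm_storeu_si128((__m128i *)d, _mm_stream_load_si128(s));
}
#endif

/* Hypothetical wrapper: use streaming loads when the CPU supports
 * SSE4.1, plain memcpy otherwise. Requires a 16-byte-aligned src and
 * n a multiple of 16. */
static void copy_from_wc(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)src & 15) == 0 && n % 16 == 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1")) {
        stream_load_copy(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

Calling copy_from_wc(dst, src, 208) exercises both loops: three 64-byte iterations followed by one 16-byte chunk.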

.. snip ..

> diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
> index 0afa7a2f216..d9e06930d38 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.am
> +++ b/src/mesa/drivers/dri/i965/Makefile.am
> @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
>  
>  noinst_LTLIBRARIES = \
>       libi965_dri.la \
> +     libintel_tiled_memcpy.la \
>       $(I965_PERGEN_LIBS)
>  
> +libintel_tiled_memcpy_la_SOURCES = \
> +     $(intel_tiled_memcpy_FILES)
> +libintel_tiled_memcpy_la_CFLAGS = \
> +     $(AM_CFLAGS) $(SSE41_CFLAGS)
> +

The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
instructions in whatever way it wants.

(1) is the desired behavior here and (2) is an unfortunate side effect.
The intrinsics we use are properly guarded by runtime checks, so they
are only executed on systems that support them. Any other sse4.1
instructions the compiler emits outside the intrinsics are not guarded,
and where they show up is unpredictable; on a CPU without sse4.1 they
would fault as illegal instructions. When I made this change there
actually were none, which is why everything worked. But the compiler is
free to change its mind at any point later.

The sse4.1 code needs to be isolated from everything else somehow:
either split it into separate files, compile the same file multiple
times with different flags, or something similar.
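
With a split-file layout, only the kernel's translation unit would get $(SSE41_CFLAGS), and the rest of the driver would reach it through a function pointer chosen at run time. The single-file sketch below shows that dispatch shape under stated assumptions: GCC/Clang's per-function target("sse4.1") attribute stands in for the separately compiled file (a different isolation mechanism from the ones suggested above), and copy_fn, copy_generic, copy_sse41 and select_copy are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline path, built with the default flags: the compiler may not
 * emit SSE4.1 instructions here. */
static void copy_generic(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h> /* _mm_mfence, _mm_storeu_si128 */
#include <smmintrin.h> /* _mm_stream_load_si128 */

/* Stand-in for a file compiled with -msse4.1: only this function may
 * contain SSE4.1 code. Assumes a 16-byte-aligned src and n % 16 == 0. */
__attribute__((target("sse4.1")))
static void copy_sse41(void *dst, const void *src, size_t n)
{
    __m128i *s = (__m128i *)(uintptr_t)src;
    unsigned char *d = dst;

    _mm_mfence(); /* order the weakly ordered streaming loads */
    for (size_t i = 0; i < n / 16; i++)
        _mm_storeu_si128((__m128i *)(d + 16 * i), _mm_stream_load_si128(s + i));
}
#endif

/* Runtime check: only hand out the SSE4.1 path if the CPU has it. */
static copy_fn select_copy(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1"))
        return copy_sse41;
#endif
    return copy_generic;
}
```

The property that matters is that nothing reachable on a CPU without sse4.1 (the selector and the generic path) is compiled with sse4.1 enabled; the compiler can use sse4.1 freely only inside the kernel that the runtime check protects.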

>  libi965_dri_la_SOURCES = \
>       $(i965_FILES) \
>       $(i965_oa_GENERATED_FILES)
> @@ -104,6 +110,7 @@ libi965_dri_la_LIBADD = \
>       $(top_builddir)/src/intel/compiler/libintel_compiler.la \
>       $(top_builddir)/src/intel/blorp/libblorp.la \
>       $(I965_PERGEN_LIBS) \
> +     libintel_tiled_memcpy.la
>       $(LIBDRM_LIBS)
>  
>  BUILT_SOURCES = $(i965_oa_GENERATED_FILES)
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
