Tapani Pälli <[email protected]> writes:

> From: D Scott Phillips <[email protected]>
>
> The reference for MOVNTDQA says:
>
>     For WC memory type, the nontemporal hint may be implemented by
>     loading a temporary internal buffer with the equivalent of an
>     aligned cache line without filling this data to the cache.
>     [...] Subsequent MOVNTDQA reads to unread portions of the WC
>     cache line will receive data from the temporary internal
>     buffer if data is available.
>
> This hidden cache line sized temporary buffer can improve the
> read performance from wc maps.
>
> v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
> v3: add Android build support (Tapani)
>
> Reviewed-by: Chris Wilson <[email protected]>
> Reviewed-by: Matt Turner <[email protected]>
> Acked-by: Kenneth Graunke <[email protected]>
> ---
>  src/mesa/drivers/dri/i965/Android.mk           | 22 +++++++++
>  src/mesa/drivers/dri/i965/Makefile.am          |  7 +++
>  src/mesa/drivers/dri/i965/Makefile.sources     |  6 ++-
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 ++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/meson.build          | 18 ++++++--
>  5 files changed, 110 insertions(+), 5 deletions(-)
>
.. snip ..

> diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am
> index 0afa7a2f216..d9e06930d38 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.am
> +++ b/src/mesa/drivers/dri/i965/Makefile.am
> @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) -DGEN_VERSIONx10=110
>
>  noinst_LTLIBRARIES = \
>  	libi965_dri.la \
> +	libintel_tiled_memcpy.la \
>  	$(I965_PERGEN_LIBS)
>
> +libintel_tiled_memcpy_la_SOURCES = \
> +	$(intel_tiled_memcpy_FILES)
> +libintel_tiled_memcpy_la_CFLAGS = \
> +	$(AM_CFLAGS) $(SSE41_CFLAGS)
> +

The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows
us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1
instructions in whatever way it wants. (1) is the desired behavior here
and (2) is an unfortunate side effect.

The intrinsics we use are properly guarded by runtime checks, so they
are only exercised on systems with support. Any other use of sse4.1 by
the compiler outside the intrinsics is unpredictable. When I made this
change there actually were none, which is why everything worked, but the
compiler is free to change its mind at any point later.

The sse4.1 code needs to be isolated from everything else somehow:
either split it into separate files, compile the same file multiple
times with different flags, or something similar.

>  libi965_dri_la_SOURCES = \
>  	$(i965_FILES) \
>  	$(i965_oa_GENERATED_FILES)
> @@ -104,6 +110,7 @@ libi965_dri_la_LIBADD = \
>  	$(top_builddir)/src/intel/compiler/libintel_compiler.la \
>  	$(top_builddir)/src/intel/blorp/libblorp.la \
>  	$(I965_PERGEN_LIBS) \
> +	libintel_tiled_memcpy.la
>  	$(LIBDRM_LIBS)
>
>  BUILT_SOURCES = $(i965_oa_GENERATED_FILES)
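
For illustration only, here is a minimal sketch of one possible way to get
that isolation without passing -msse4.1 for a whole file. It was not proposed
in this thread; it assumes an x86 build with GCC or Clang, where a per-function
target attribute is available. The function names (copy_streaming_load,
copy_from_wc) are hypothetical and are not the ones in intel_tiled_memcpy.c,
and __builtin_cpu_supports stands in for whatever runtime check the driver
actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_stream_load_si128 */

/* Hypothetical helper: only this function is compiled with SSE4.1 enabled,
 * so the compiler cannot emit SSE4.1 instructions anywhere else in the file.
 * Requires a 16-byte-aligned source and a length that is a multiple of 16.
 */
__attribute__((target("sse4.1")))
static void
copy_streaming_load(void *dst, const void *src, size_t bytes)
{
   assert((uintptr_t)src % 16 == 0 && bytes % 16 == 0);

   __m128i *s = (__m128i *)src;
   char *d = dst;

   for (size_t i = 0; i < bytes / 16; i++) {
      /* MOVNTDQA: streaming load from the (assumed WC) source. */
      __m128i v = _mm_stream_load_si128(s + i);
      _mm_storeu_si128((__m128i *)(d + i * 16), v);
   }
}

/* Hypothetical entry point, built with the baseline flags. The runtime
 * check guards the only call into SSE4.1 code; everything else falls
 * back to plain memcpy.
 */
void
copy_from_wc(void *dst, const void *src, size_t bytes)
{
   int usable = (uintptr_t)src % 16 == 0 && bytes % 16 == 0;

   if (usable && __builtin_cpu_supports("sse4.1"))
      copy_streaming_load(dst, src, bytes);
   else
      memcpy(dst, src, bytes);
}
```

With this arrangement the rest of the translation unit is still built for the
baseline ISA, so case (2) above cannot occur outside copy_streaming_load.
Splitting the code into its own source file, as suggested above, achieves the
same thing at the build-system level and does not depend on compiler-specific
attributes.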
