Quoting D Scott Phillips (2018-09-13 08:28:29) > Tapani Pälli <[email protected]> writes: > > > From: D Scott Phillips <[email protected]> > > > > The reference for MOVNTDQA says: > > > > For WC memory type, the nontemporal hint may be implemented by > > loading a temporary internal buffer with the equivalent of an > > aligned cache line without filling this data to the cache. > > [...] Subsequent MOVNTDQA reads to unread portions of the WC > > cache line will receive data from the temporary internal > > buffer if data is available. > > > > This hidden cache line sized temporary buffer can improve the > > read performance from wc maps. > > > > v2: Add mfence at start of tiled_to_linear for streaming loads (Chris) > > v3: add Android build support (Tapani) > > > > Reviewed-by: Chris Wilson <[email protected]> > > Reviewed-by: Matt Turner <[email protected]> > > Acked-by: Kenneth Graunke <[email protected]> > > --- > > src/mesa/drivers/dri/i965/Android.mk | 22 +++++++++ > > src/mesa/drivers/dri/i965/Makefile.am | 7 +++ > > src/mesa/drivers/dri/i965/Makefile.sources | 6 ++- > > src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 62 > > ++++++++++++++++++++++++++ > > src/mesa/drivers/dri/i965/meson.build | 18 ++++++-- > > 5 files changed, 110 insertions(+), 5 deletions(-) > > > > .. snip .. > > > diff --git a/src/mesa/drivers/dri/i965/Makefile.am > > b/src/mesa/drivers/dri/i965/Makefile.am > > index 0afa7a2f216..d9e06930d38 100644 > > --- a/src/mesa/drivers/dri/i965/Makefile.am > > +++ b/src/mesa/drivers/dri/i965/Makefile.am > > @@ -92,8 +92,14 @@ libi965_gen11_la_CFLAGS = $(AM_CFLAGS) > > -DGEN_VERSIONx10=110 > > > > noinst_LTLIBRARIES = \ > > libi965_dri.la \ > > + libintel_tiled_memcpy.la \ > > $(I965_PERGEN_LIBS) > > > > +libintel_tiled_memcpy_la_SOURCES = \ > > + $(intel_tiled_memcpy_FILES) > > +libintel_tiled_memcpy_la_CFLAGS = \ > > + $(AM_CFLAGS) $(SSE41_CFLAGS) > > + > > The issue here is that SSE41_CFLAGS includes -msse4.1, which (1) allows > us to use sse4.1 intrinsics and (2) allows the compiler to use sse4.1 > instructions in whatever way it wants. > > 1 is the desired behavior here and 2 is an unfortunate side-effect. The > intrinsics we use are properly guarded by runtime checks so they are > only exercised on systems with support. The other uses of sse4.1 by the > compiler outside the intrinsics is unpredictable. When I made this > change there actually were none, which is why everything worked. But the > compiler has permission to change its mind at any point later. > > The sse4.1 code needs isolated from everything else somehow, either > split into separate files or compile the same file multiple times with > different flags, or something.
The way we use sse4.1 in mesa core (the only place I know of that we use it) is to compile a separate static library using the sse4.1 flags. Dylan
signature.asc
Description: signature
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
