Addressed review comments by Marek. As part of that, the maximum number of patches per threadgroup was reduced from 64 to 40. This reduced unigine-heaven performance from 43.1 fps to 42.5 fps, roughly 1.4% (the numbers vary a little between runs, but the magnitude of the difference is pretty constant).
However, it is likely that the optimal value differs between applications, and I don't have that many applications to check against. Any thoughts on the issue?

- Bas

Bas Nieuwenhuizen (14):
  radeonsi: Add buffer for offchip storage between TCS and TES.
  radeonsi: Add offchip tessellation parameters.
  radeonsi: Define build_tbuffer_store_dwords earlier to support new users.
  radeonsi: Add buffer load functions.
  radeonsi: Use correct parameter index for LS_OUT_LAYOUT.
  radeonsi: Add user SGPR for the layout of the offchip buffer.
  radeonsi: Add offchip buffer address calculation.
  radeonsi: Store inputs to memory when not using a TCS.
  radeonsi: Use buffer loads and stores for passing data from TCS to TES.
  radeonsi: Remove LDS layout user SGPR's from TES.
  radeonsi: Enable dynamic HS.
  radeonsi: Add barrier before writing the tess factors.
  radeonsi: Process multiple patches per threadgroup.
  radeonsi: Allow TES distribution between shader engines.

 src/gallium/drivers/radeonsi/si_pipe.c          |   1 +
 src/gallium/drivers/radeonsi/si_pipe.h          |   1 +
 src/gallium/drivers/radeonsi/si_shader.c        | 547 +++++++++++++++++++-----
 src/gallium/drivers/radeonsi/si_shader.h        |  32 +-
 src/gallium/drivers/radeonsi/si_state.c         |   5 +
 src/gallium/drivers/radeonsi/si_state.h         |   3 +
 src/gallium/drivers/radeonsi/si_state_draw.c    |  67 ++-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  71 ++-
 src/gallium/drivers/radeonsi/sid.h              |   3 +
 9 files changed, 589 insertions(+), 141 deletions(-)

--
2.8.3