On 26/11/15 16:14, Samuel Iglesias Gonsálvez wrote: > On 26/11/15 15:44, Francisco Jerez wrote: >> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >> >>> On 18/11/15 06:54, Jordan Justen wrote: >>>> From: Francisco Jerez <curroje...@riseup.net> >>>> >>>> Improves performance of the arb_shader_image_load_store-atomicity >>>> piglit test by over 25x (which isn't a real benchmark it's just heavy >>>> on atomics -- the improvement in a microbenchmark I wrote a while ago >>>> seemed to be even greater). The drawback is one needs to be >>>> extra-careful not to hang the GPU (in fact the whole system). A DC >>>> partition must have been allocated on L3, the "convert L3 cycle for DC >>>> to UC" bit may not be set, the MOCS L3 cacheability bit must be set >>>> for all surfaces accessed using DC atomics, and the SCRATCH1 and >>>> ROW_CHICKEN3 bits must be kept in sync. >>>> >>>> A fairly recent kernel is required for the command parser to allow >>>> writes to these registers. >>>> --- >>>> src/mesa/drivers/dri/i965/gen7_l3_state.c | 14 ++++++++++++++ >>>> 1 file changed, 14 insertions(+) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c >>>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c >>>> index 48bca29..c863b7f 100644 >>>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c >>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c >>>> @@ -254,5 +254,19 @@ setup_l3_config(struct brw_context *brw, const struct >>>> brw_l3_config *cfg) >>>> SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC)); >>>> >>>> ADVANCE_BATCH(); >>>> + >>>> + if (brw->is_haswell && brw->intelScreen->cmd_parser_version >= 4) { >>>> + /* Enable L3 atomics on HSW if we have a DC partition, otherwise >>>> keep >>>> + * them disabled to avoid crashing the system hard. >>>> + */ >>>> + BEGIN_BATCH(5); >>>> + OUT_BATCH(MI_LOAD_REGISTER_IMM | (5 - 2)); >>>> + OUT_BATCH(HSW_SCRATCH1); >>>> + OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE); >>>> + OUT_BATCH(HSW_ROW_CHICKEN3); >>>> + OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 | >>>> + (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE)); >>> >>> >>> I have not found references to ROW_CHICKEN3 nor register with 0xe49c >>> address offset in HSW PRMs, so these could be stupid questions: >>> >> >> Yeah, these chicken registers are not part of the public PRMs. >> >>> Why you need to set the L3 atomic disable flag in two different places >>> in ROW_CHICKEN3 register? Also, why the first flag is set >>> unconditionally while the second one only if we don't have a DC >>> partition? This is what you want? >> >> It's a masked register as many other MMIO registers on Gen hardware: the >> upper 16 bits control which of the lower 16 bits are modified in the >> register. >> >>> >>> >>> Also, if the "HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16" is really >>> needed, it could be defined as a constant in the first patch of the series. >>> >> If you think that would be more readable I guess we could define a >> REG_MASK() macro and use it wherever we write a masked register. See >> attachment (applies on top of this patch). >> >
Forgot to say my R-b to this patch is with the REG_MASK() macro change. > OK, thanks for the explanation. Please add my R-b to this patch and to > the attached one: > > Reviewed-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > > Sam > >>> Sam >>> >>>> + ADVANCE_BATCH(); >>>> + } >>>> } >>>> } >>>> >> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev