On Sun, Mar 19, 2017 at 10:45:59AM +0200, Pohjolainen, Topi wrote: > On Thu, Mar 16, 2017 at 01:02:00PM -0700, Francisco Jerez wrote: > > Jason Ekstrand <ja...@jlekstrand.net> writes: > > > > > Thanks for tracking this down! > > > > > > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > > > > > > I'm not sure how we missed this when CS was brought up. Oh well. > > > > > > > I don't think we did. The hardware docs are somewhat contradictory > > about whether this is required: > > > > | Render CS Only: SW must always program PIPE_CONTROL with CS Stall and > > | Render Target Cache Flush Enable set prior to programming > > | PIPELINE_SELECT command for GPGPU workloads i.e when pipeline mode is > > | set to GPGPU. This is required to achieve better GPGPU preemption > > | latencies for certain programming sequences. If programming > > | PIPE_CONTROL has performance implications then preemption latencies > > | can be trade off against performance by not implementing this > > | programming note. > > > > This sounds like more of a suggestion (which potentially trades off > > throughput for GPGPU latency due to the synchronous execution of the > > compute shader) than a requirement, so the stall may not be required and > > is likely to impact performance. > > > > I wonder if the following workaround would be as effective fixing the > > bug linked below: > > > > - Speculatively flush the render and depth caches with a pipelined > > PIPE_CONTROL right before PIPELINE_SELECT (note that this still > > allows some parallelism between 3D and GPGPU workloads). > > > > - On brw_render_cache_set_check_flush(), if the GPGPU pipeline is > > selected, emit an HDC flush+CS stall instead of the depth/render > > target flush+CS stall. > > This looked to be sufficient also on my SKL at least, note that this leaves > the CS stall bit out from compute tex cache invalidate. > > While I'm comfortable with the fix lining gen8+ with earlier gens (which is > according to the spec), this here looks more an optimization which should be > applied to all gens. It also requires more in-depth understanding of the > pipeline switch (which I don't think I have). > > I have pushed my patch already before I actually saw your response. This here > in my opinion requires some sort of comment telling why we did things > differently then described in the spec. I'd be happier if you followed up on > this instead of me. > > diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c > b/src/mesa/drivers/dri/i965/brw_misc_state.c > index 84f0c18..9704a9f 100644 > --- a/src/mesa/drivers/dri/i965/brw_misc_state.c > +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c > @@ -844,6 +844,13 @@ brw_emit_select_pipeline(struct brw_context *brw, enum > brw_pipeline pipeline) > brw->ctx.NewDriverState |= BRW_NEW_CC_STATE; > } > > + const unsigned dc_flush = brw->last_pipeline == BRW_COMPUTE_PIPELINE ? > + PIPE_CONTROL_DATA_CACHE_FLUSH : > + PIPE_CONTROL_RENDER_TARGET_FLUSH | > + PIPE_CONTROL_DEPTH_CACHE_FLUSH; > + > + brw_emit_pipe_control_flush(brw, dc_flush | PIPE_CONTROL_CS_STALL); > + > } else if (brw->gen >= 6) { > /* From "BXML <C2><BB> GT <C2><BB> MI <C2><BB> vol1a GPU Overview <C2> > <BB> [Instruction] > * PIPELINE_SELECT [DevBWR+]": > diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c > b/src/mesa/drivers/dri/i965/intel_fbo.c > index de0cd6a..3126be6 100644 > --- a/src/mesa/drivers/dri/i965/intel_fbo.c > +++ b/src/mesa/drivers/dri/i965/intel_fbo.c > @@ -1082,10 +1082,16 @@ brw_render_cache_set_check_flush(struct brw_context > *brw, drm_intel_bo *bo) > return; > > if (brw->gen >= 6) { > - brw_emit_pipe_control_flush(brw, > - PIPE_CONTROL_DEPTH_CACHE_FLUSH | > - PIPE_CONTROL_RENDER_TARGET_FLUSH | > - PIPE_CONTROL_CS_STALL); > + if (brw->last_pipeline == BRW_COMPUTE_PIPELINE) { > + brw_emit_pipe_control_flush(brw, > + PIPE_CONTROL_DATA_CACHE_FLUSH | > + PIPE_CONTROL_CS_STALL); > + } else { > + brw_emit_pipe_control_flush(brw, > + PIPE_CONTROL_DEPTH_CACHE_FLUSH | > + PIPE_CONTROL_RENDER_TARGET_FLUSH | > + PIPE_CONTROL_CS_STALL); > + }
Flushing depth and rt in compute is of course wrong and this bit is genuine fix. I'll write a patch for it. I'll just need to study a little what the render cache means in compute context - initially I feel we shouldn't actually regard that at all in compute. > > brw_emit_pipe_control_flush(brw, > PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | > > > > > > On Thu, Mar 16, 2017 at 1:24 AM, Topi Pohjolainen < > > > topi.pohjolai...@gmail.com> wrote: > > > > > >> just as earlier gens do. > > >> > > >> CC: Jason Ekstrand <ja...@jlekstrand.net> > > >> Cc: "17.0 13.0" <mesa-sta...@lists.freedesktop.org> > > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96743 > > >> > > >> Signed-off-by: Topi Pohjolainen <topi.pohjolai...@intel.com> > > >> --- > > >> src/mesa/drivers/dri/i965/brw_misc_state.c | 3 ++- > > >> 1 file changed, 2 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c > > >> b/src/mesa/drivers/dri/i965/brw_misc_state.c > > >> index 84f0c18..1cf6b04 100644 > > >> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c > > >> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c > > >> @@ -843,8 +843,9 @@ brw_emit_select_pipeline(struct brw_context *brw, > > >> enum brw_pipeline pipeline) > > >> > > >> brw->ctx.NewDriverState |= BRW_NEW_CC_STATE; > > >> } > > >> + } > > >> > > >> - } else if (brw->gen >= 6) { > > >> + if (brw->gen >= 6) { > > >> /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction] > > >> * PIPELINE_SELECT [DevBWR+]": > > >> * > > >> -- > > >> 2.9.3 > > >> > > >> > > > _______________________________________________ > > > mesa-stable mailing list > > > mesa-sta...@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/mesa-stable > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev