On Thursday, July 13, 2017 9:09:09 PM PDT aravindan.muthuku...@intel.com wrote:
> From: Aravindan M <aravindan.muthuku...@intel.com>

The commit title should be something like, "i965: Optimize atom state
flag checks" rather than a generic "Performance Improvement"

> This patch improves CPI Rate(Cycles per Instruction)
> and CPU time utilization for i965. The functions
> check_state and brw_pipeline_state_finished was found
> poor CPU utilization from performance analysis.

Need actual data here, or show assembly quality improvements.

> Change-Id: I17c7e719a16e222764217a0e67b4482748537b67
> Signed-off-by: Aravindan M <aravindan.muthuku...@intel.com>
> Reviewed-by: Yogesh M <yogesh.mara...@intel.com>
> Tested-by: Asish <as...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h      |  3 +++
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 14 +++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index a4794c6..60f88ca 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1681,3 +1681,6 @@ enum brw_pixel_shader_coverage_mask_mode {
>  # define GEN8_L3CNTLREG_ALL_ALLOC_MASK     INTEL_MASK(31, 25)
>  
>  #endif
> +
> +/* Checking the state of mesa and brw before emitting atoms */
> +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index 5e82c1b..434decf 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -515,7 +515,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
>        const struct brw_tracked_state *atom = &atoms[i];
>        struct brw_state_flags generated;
>  
> -         check_and_emit_atom(brw, &state, atom);
> +         /* Checking the state and emitting the atoms */
> +         if (CHECK_BRW_STATE(state, atom->dirty)) {
> +            check_and_emit_atom(brw, &state, atom);
> +         }
>  
>        accumulate_state(&examined, &atom->dirty);
>  
> @@ -532,7 +535,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
>        for (i = 0; i < num_atoms; i++) {
>        const struct brw_tracked_state *atom = &atoms[i];
>  
> -         check_and_emit_atom(brw, &state, atom);
> +         /* Checking the state and emitting the atoms */
> +         if (CHECK_BRW_STATE(state, atom->dirty)) {
> +            check_and_emit_atom(brw, &state, atom);
> +         }

This doesn't make any sense...the very first thing check_and_emit_atom()
does is call check_state(), which does the exact thing your CHECK_BRW_STATE
macro does.  So you're just needlessly checking the same thing twice.

The only reason I could see this helping is if check_state() wasn't inlined,
but a release build with -O2 definitely inlines both check_and_emit_atom()
and check_state().

Are you using GCC?  What are your CFLAGS?  -O2?  I hope you're not trying
to optimize a debug build...

>        }
>     }
>  
> @@ -567,7 +573,9 @@ brw_pipeline_state_finished(struct brw_context *brw,
>           brw->state.pipelines[i].mesa |= brw->NewGLState;
>           brw->state.pipelines[i].brw |= brw->ctx.NewDriverState;
>        } else {
> -         memset(&brw->state.pipelines[i], 0, sizeof(struct brw_state_flags));
> +         /* Avoiding the memset with initialization */
> +         brw->state.pipelines[i].mesa = 0;
> +         brw->state.pipelines[i].brw = 0ull;
>        }
>     }

This is a separate change.

I'm also not seeing why it's useful:

Assembly before (GCC 7.1.1, x86_64, -O2 -fno-omit-frame-pointer):

00000000003e0380 <brw_render_state_finished>:
  3e0380:       66 0f ef c0             pxor   %xmm0,%xmm0
  3e0384:       55                      push   %rbp
  3e0385:       8b 87 20 52 02 00       mov    0x25220(%rdi),%eax
  3e038b:       c7 87 20 52 02 00 00    movl   $0x0,0x25220(%rdi)
  3e0392:       00 00 00 
  3e0395:       48 89 e5                mov    %rsp,%rbp
  3e0398:       09 87 38 52 02 00       or     %eax,0x25238(%rdi)
  3e039e:       48 8b 87 38 4d 02 00    mov    0x24d38(%rdi),%rax
  3e03a5:       5d                      pop    %rbp
  3e03a6:       48 09 87 40 52 02 00    or     %rax,0x25240(%rdi)
  3e03ad:       48 c7 87 38 4d 02 00    movq   $0x0,0x24d38(%rdi)
  3e03b4:       00 00 00 00 
  3e03b8:       0f 11 87 28 52 02 00    movups %xmm0,0x25228(%rdi)
  3e03bf:       c3                      retq 

Assembly after:

00000000003e0380 <brw_render_state_finished>:
  3e0380:       55                      push   %rbp
  3e0381:       8b 87 20 52 02 00       mov    0x25220(%rdi),%eax
  3e0387:       c7 87 28 52 02 00 00    movl   $0x0,0x25228(%rdi)
  3e038e:       00 00 00 
  3e0391:       09 87 38 52 02 00       or     %eax,0x25238(%rdi)
  3e0397:       48 89 e5                mov    %rsp,%rbp
  3e039a:       48 8b 87 38 4d 02 00    mov    0x24d38(%rdi),%rax
  3e03a1:       48 c7 87 30 52 02 00    movq   $0x0,0x25230(%rdi)
  3e03a8:       00 00 00 00 
  3e03ac:       48 09 87 40 52 02 00    or     %rax,0x25240(%rdi)
  3e03b3:       c7 87 20 52 02 00 00    movl   $0x0,0x25220(%rdi)
  3e03ba:       00 00 00 
  3e03bd:       48 c7 87 38 4d 02 00    movq   $0x0,0x24d38(%rdi)
  3e03c4:       00 00 00 00 
  3e03c8:       5d                      pop    %rbp
  3e03c9:       c3                      retq   
  3e03ca:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

That looks like more instructions to me...

Thanks for looking at ways to optimize our CPU overhead.  I'm sure there
are some improvements we can make...

Attachment: signature.asc
Description: This is a digitally signed message part.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to