Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

2017-08-14 Thread Scott D Phillips
Tapani Pälli <tapani.pa...@intel.com> writes:

> Hello;
>
> On 07/25/2017 05:17 PM, Marathe, Yogesh wrote:
>> Hi Matt, Sorry for late reply, please see below.
>> 
>>> -Original Message-
>>> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf
>>> Of Matt Turner
>>> Sent: Saturday, July 22, 2017 12:12 AM
>>> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>
>>> Cc: mesa-dev@lists.freedesktop.org; Marathe, Yogesh
>>> <yogesh.mara...@intel.com>
>>> Subject: Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks
>>>
>>> On 07/21, aravindan.muthuku...@intel.com wrote:
>>>> From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
>>>>
>>>> This patch improves CPI Rate(Cycles per Instruction) and branch miss
>>>> predict for i965. The function check_state() was showing CPI retired rate.
>>>>
>>>> Performance stats with android:
>>>> - CPI retired lowered by 28% (lower is better)
>>>> - Branch missprediction lowered by 13% (lower is better)
>>>> - 3DMark improved by 2%
>>>>
>>>> The dissassembly doesn't show difference, although above results were
>>>> observed with patch.
>>>
>> 
>> Yes this is true for V3 where we removed the function based on a review 
>> comment.
>> 
>>> If there's no difference in the assembly then whatever measurements you have
>>> taken must be noise.
>>>
>> 
>> No that's not guaranteed either. Lot of things still depend on how 
>> instructions are
>> aligned, sometimes even changing linking order gives different results where
>> disassemblies of individual functions remain same.
>>   
>>> I applied the patch and inspected brw_state_upload.o. There are assembly
>>> differences. I can produce the same assembly as this patch by just pulling 
>>> the if
>>> statement out of check_and_emit_atom() and into the caller. The replacement
>>> of check_state() with a macro is completely unnecessary.
>>>
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
>>> b/src/mesa/drivers/dri/i965/brw_state_upload.c
>>> index acaa97ee7d..b163e1490d 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>>> @@ -439,14 +439,12 @@ merge_ctx_state(struct brw_context *brw,  }
>>>
>>>   static inline void
>>> -check_and_emit_atom(struct brw_context *brw,
>>> -struct brw_state_flags *state,
>>> -const struct brw_tracked_state *atom)
>>> +emit_atom(struct brw_context *brw,
>>> +  struct brw_state_flags *state,
>>> +  const struct brw_tracked_state *atom)
>>>   {
>>> -   if (check_state(state, >dirty)) {
>>> -  atom->emit(brw);
>>> -  merge_ctx_state(brw, state);
>>> -   }
>>> +   atom->emit(brw);
>>> +   merge_ctx_state(brw, state);
>>>   }
>>>
>>>   static inline void
>>> @@ -541,7 +539,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
>>>  const struct brw_tracked_state *atom = [i];
>>>  struct brw_state_flags generated;
>>>
>>> - check_and_emit_atom(brw, , atom);
>>> + if (check_state(, >dirty)) {
>>> +emit_atom(brw, , atom);
>>> + }
>>>
>>>  accumulate_state(, >dirty);
>>>
>>> @@ -558,7 +558,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
>>> for (i = 0; i < num_atoms; i++) {
>>>  const struct brw_tracked_state *atom = [i];
>>>
>>> - check_and_emit_atom(brw, , atom);
>>> + if (check_state(, >dirty)) {
>>> +emit_atom(brw, , atom);
>>> + }
>>> }
>>>  }
>>>
>>>
>>> With that said, the assembly differences are not understandable to me.
>>> Why should extracting an if statement from a static inline function into 
>>> the caller
>>> of that function cause any difference whatsoever?
>> 
>> Agreed, it shouldn't in case of static inline.
>> 
>>>
>>> I'm viewing the assembly differences with:
>>>
>>> wdiff -n \
>>>  -w $'\033[30;41m' -x $'\033[0m' \
>>>  -y $'\033[30;42m' -z $'\033[0m' \
>>>  <(objdump -d brw_state_upload.

Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

2017-08-14 Thread Tapani Pälli

Hello;

On 07/25/2017 05:17 PM, Marathe, Yogesh wrote:

Hi Matt, Sorry for late reply, please see below.


-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf
Of Matt Turner
Sent: Saturday, July 22, 2017 12:12 AM
To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>
Cc: mesa-dev@lists.freedesktop.org; Marathe, Yogesh
<yogesh.mara...@intel.com>
Subject: Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

On 07/21, aravindan.muthuku...@intel.com wrote:

From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>

This patch improves CPI Rate(Cycles per Instruction) and branch miss
predict for i965. The function check_state() was showing CPI retired rate.

Performance stats with android:
- CPI retired lowered by 28% (lower is better)
- Branch missprediction lowered by 13% (lower is better)
- 3DMark improved by 2%

The dissassembly doesn't show difference, although above results were
observed with patch.




Yes this is true for V3 where we removed the function based on a review comment.


If there's no difference in the assembly then whatever measurements you have
taken must be noise.



No that's not guaranteed either. Lot of things still depend on how instructions 
are
aligned, sometimes even changing linking order gives different results where
disassemblies of individual functions remain same.
  

I applied the patch and inspected brw_state_upload.o. There are assembly
differences. I can produce the same assembly as this patch by just pulling the 
if
statement out of check_and_emit_atom() and into the caller. The replacement
of check_state() with a macro is completely unnecessary.


diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index acaa97ee7d..b163e1490d 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -439,14 +439,12 @@ merge_ctx_state(struct brw_context *brw,  }

  static inline void
-check_and_emit_atom(struct brw_context *brw,
-struct brw_state_flags *state,
-const struct brw_tracked_state *atom)
+emit_atom(struct brw_context *brw,
+  struct brw_state_flags *state,
+  const struct brw_tracked_state *atom)
  {
-   if (check_state(state, >dirty)) {
-  atom->emit(brw);
-  merge_ctx_state(brw, state);
-   }
+   atom->emit(brw);
+   merge_ctx_state(brw, state);
  }

  static inline void
@@ -541,7 +539,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
 const struct brw_tracked_state *atom = [i];
 struct brw_state_flags generated;

- check_and_emit_atom(brw, , atom);
+ if (check_state(, >dirty)) {
+emit_atom(brw, , atom);
+ }

 accumulate_state(, >dirty);

@@ -558,7 +558,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
for (i = 0; i < num_atoms; i++) {
 const struct brw_tracked_state *atom = [i];

- check_and_emit_atom(brw, , atom);
+ if (check_state(, >dirty)) {
+emit_atom(brw, , atom);
+ }
}
 }


With that said, the assembly differences are not understandable to me.
Why should extracting an if statement from a static inline function into the 
caller
of that function cause any difference whatsoever?


Agreed, it shouldn't in case of static inline.



I'm viewing the assembly differences with:

wdiff -n \
 -w $'\033[30;41m' -x $'\033[0m' \
 -y $'\033[30;42m' -z $'\033[0m' \
 <(objdump -d brw_state_upload.o.before | sed -e 's/^.*\t//') \
 <(objdump -d brw_state_upload.o.wtf| sed -e 's/^.*\t//') | less -R

and the only real difference is the movement of some call and jmp instructions.

I cannot take the patch without some reasonable explanation for the change.


Ok I think this has been discussed already and we agree that there is no big 
visible difference
in disassembly which can be pointed out for improvement. Although, as you said 
this movement
of instructions can cause this. If with this patch instructions get cache 
aligned that too can show
improvement.  This is a busy function with bad CPI. Hence chosen for 
optimization. Branch miss
predict is another metric.  Do we want to consider all these or just 
disassembly?


I don't have a reasonable explanation to give but I made some benchmark 
runs today (3DMark "Ice Storm Unlimited" test ) and I'm getting 
consistently 1-2% better results with the patch. I also tried the 
mentioned modification "pulling the if statement out of 
check_and_emit_atom() and into the caller" and it has same performance 
benefits. What I can tell from assembly dump is that 
brw_upload_render_state function becomes slightly shorter, there's 
movement with call and jmp and dump with the patch has less mov and nopl 
calls in total.




Let me make one more attempt, we clearly see icache misses for 
brw_upload

Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

2017-07-25 Thread Marathe, Yogesh
Hi Matt, Sorry for late reply, please see below.

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf
> Of Matt Turner
> Sent: Saturday, July 22, 2017 12:12 AM
> To: Muthukumar, Aravindan <aravindan.muthuku...@intel.com>
> Cc: mesa-dev@lists.freedesktop.org; Marathe, Yogesh
> <yogesh.mara...@intel.com>
> Subject: Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks
> 
> On 07/21, aravindan.muthuku...@intel.com wrote:
> >From: Aravindan Muthukumar <aravindan.muthuku...@intel.com>
> >
> >This patch improves CPI Rate(Cycles per Instruction) and branch miss
> >predict for i965. The function check_state() was showing CPI retired rate.
> >
> >Performance stats with android:
> >- CPI retired lowered by 28% (lower is better)
> >- Branch missprediction lowered by 13% (lower is better)
> >- 3DMark improved by 2%
> >
> >The dissassembly doesn't show difference, although above results were
> >observed with patch.
> 

Yes this is true for V3 where we removed the function based on a review comment.

> If there's no difference in the assembly then whatever measurements you have
> taken must be noise.
>

No that's not guaranteed either. Lot of things still depend on how instructions 
are
aligned, sometimes even changing linking order gives different results where
disassemblies of individual functions remain same.
 
> I applied the patch and inspected brw_state_upload.o. There are assembly
> differences. I can produce the same assembly as this patch by just pulling 
> the if
> statement out of check_and_emit_atom() and into the caller. The replacement
> of check_state() with a macro is completely unnecessary.
> 
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index acaa97ee7d..b163e1490d 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -439,14 +439,12 @@ merge_ctx_state(struct brw_context *brw,  }
> 
>  static inline void
> -check_and_emit_atom(struct brw_context *brw,
> -struct brw_state_flags *state,
> -const struct brw_tracked_state *atom)
> +emit_atom(struct brw_context *brw,
> +  struct brw_state_flags *state,
> +  const struct brw_tracked_state *atom)
>  {
> -   if (check_state(state, >dirty)) {
> -  atom->emit(brw);
> -  merge_ctx_state(brw, state);
> -   }
> +   atom->emit(brw);
> +   merge_ctx_state(brw, state);
>  }
> 
>  static inline void
> @@ -541,7 +539,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
>const struct brw_tracked_state *atom = [i];
>struct brw_state_flags generated;
> 
> - check_and_emit_atom(brw, , atom);
> + if (check_state(, >dirty)) {
> +emit_atom(brw, , atom);
> + }
> 
>accumulate_state(, >dirty);
> 
> @@ -558,7 +558,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
>for (i = 0; i < num_atoms; i++) {
>const struct brw_tracked_state *atom = [i];
> 
> - check_and_emit_atom(brw, , atom);
> + if (check_state(, >dirty)) {
> +emit_atom(brw, , atom);
> + }
>}
> }
> 
> 
> With that said, the assembly differences are not understandable to me.
> Why should extracting an if statement from a static inline function into the 
> caller
> of that function cause any difference whatsoever?

Agreed, it shouldn't in case of static inline.

> 
> I'm viewing the assembly differences with:
> 
> wdiff -n \
> -w $'\033[30;41m' -x $'\033[0m' \
> -y $'\033[30;42m' -z $'\033[0m' \
> <(objdump -d brw_state_upload.o.before | sed -e 's/^.*\t//') \
> <(objdump -d brw_state_upload.o.wtf| sed -e 's/^.*\t//') | less -R
> 
> and the only real difference is the movement of some call and jmp 
> instructions.
> 
> I cannot take the patch without some reasonable explanation for the change.

Ok I think this has been discussed already and we agree that there is no big 
visible difference
in disassembly which can be pointed out for improvement. Although, as you said 
this movement
of instructions can cause this. If with this patch instructions get cache 
aligned that too can show
improvement.  This is a busy function with bad CPI. Hence chosen for 
optimization. Branch miss
predict is another metric.  Do we want to consider all these or just 
disassembly?

Let me make one more attempt, we clearly see icache misses for 
brw_upload_pipeline_state also
reduced with patch against without patch. Do you think that too is noise with 
*.o.wtf?
I would be happy t

Re: [Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

2017-07-21 Thread Matt Turner

On 07/21, aravindan.muthuku...@intel.com wrote:

From: Aravindan Muthukumar 

This patch improves CPI Rate(Cycles per Instruction) and branch miss predict
for i965. The function check_state() was showing CPI retired rate.

Performance stats with android:
- CPI retired lowered by 28% (lower is better)
- Branch missprediction lowered by 13% (lower is better)
- 3DMark improved by 2%

The dissassembly doesn't show difference, although above results were observed
with patch.


If there's no difference in the assembly then whatever measurements you
have taken must be noise.

I applied the patch and inspected brw_state_upload.o. There are assembly
differences. I can produce the same assembly as this patch by just
pulling the if statement out of check_and_emit_atom() and into the
caller. The replacement of check_state() with a macro is completely
unnecessary.


diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index acaa97ee7d..b163e1490d 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -439,14 +439,12 @@ merge_ctx_state(struct brw_context *brw,
}

static inline void
-check_and_emit_atom(struct brw_context *brw,
-struct brw_state_flags *state,
-const struct brw_tracked_state *atom)
+emit_atom(struct brw_context *brw,
+  struct brw_state_flags *state,
+  const struct brw_tracked_state *atom)
{
-   if (check_state(state, >dirty)) {
-  atom->emit(brw);
-  merge_ctx_state(brw, state);
-   }
+   atom->emit(brw);
+   merge_ctx_state(brw, state);
}

static inline void
@@ -541,7 +539,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
 const struct brw_tracked_state *atom = [i];
 struct brw_state_flags generated;

- check_and_emit_atom(brw, , atom);
+ if (check_state(, >dirty)) {
+emit_atom(brw, , atom);
+ }

 accumulate_state(, >dirty);

@@ -558,7 +558,9 @@ brw_upload_pipeline_state(struct brw_context *brw,
  for (i = 0; i < num_atoms; i++) {
 const struct brw_tracked_state *atom = [i];

- check_and_emit_atom(brw, , atom);
+ if (check_state(, >dirty)) {
+emit_atom(brw, , atom);
+ }
  }
   }


With that said, the assembly differences are not understandable to me.
Why should extracting an if statement from a static inline function into
the caller of that function cause any difference whatsoever?

I'm viewing the assembly differences with:

wdiff -n \
   -w $'\033[30;41m' -x $'\033[0m' \
   -y $'\033[30;42m' -z $'\033[0m' \
   <(objdump -d brw_state_upload.o.before | sed -e 's/^.*\t//') \
   <(objdump -d brw_state_upload.o.wtf| sed -e 's/^.*\t//') | less -R

and the only real difference is the movement of some call and jmp
instructions.

I cannot take the patch without some reasonable explanation for the
change.


signature.asc
Description: Digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

2017-07-21 Thread aravindan . muthukumar
From: Aravindan Muthukumar 

This patch improves CPI Rate(Cycles per Instruction) and branch miss predict
for i965. The function check_state() was showing CPI retired rate.

Performance stats with android:
- CPI retired lowered by 28% (lower is better)
- Branch missprediction lowered by 13% (lower is better)
- 3DMark improved by 2%

The dissassembly doesn't show difference, although above results were observed
with patch.

V2:
- Removed memset() change
- Changed commit message as per review comments

V3:
- Indentation and changes to remove check_state as function

Signed-off-by: Aravindan Muthukumar 
Signed-off-by: Yogesh Marathe 
Tested-by: Asish 
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  4 
 src/mesa/drivers/dri/i965/brw_state_upload.c | 24 +++-
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 2a8dbf8..8c9a510 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1687,3 +1687,7 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define CSDBG2_CONSTANT_BUFFER_ADDRESS_OFFSET_DISABLE (1 << 4)
 
 #endif
+
+/* Checking the state of mesa and brw before emitting atoms */
+#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
+
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index acaa97e..1846624 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -217,12 +217,6 @@ void brw_destroy_state( struct brw_context *brw )
 /***
  */
 
-static bool
-check_state(const struct brw_state_flags *a, const struct brw_state_flags *b)
-{
-   return ((a->mesa & b->mesa) | (a->brw & b->brw)) != 0;
-}
-
 static void accumulate_state( struct brw_state_flags *a,
  const struct brw_state_flags *b )
 {
@@ -443,10 +437,8 @@ check_and_emit_atom(struct brw_context *brw,
 struct brw_state_flags *state,
 const struct brw_tracked_state *atom)
 {
-   if (check_state(state, >dirty)) {
-  atom->emit(brw);
-  merge_ctx_state(brw, state);
-   }
+   atom->emit(brw);
+   merge_ctx_state(brw, state);
 }
 
 static inline void
@@ -541,7 +533,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
 const struct brw_tracked_state *atom = [i];
 struct brw_state_flags generated;
 
- check_and_emit_atom(brw, , atom);
+ /* Checking the state and emitting atoms */
+ if (CHECK_BRW_STATE(state, atom->dirty)) {
+check_and_emit_atom(brw, , atom);
+ }
 
 accumulate_state(, >dirty);
 
@@ -550,7 +545,7 @@ brw_upload_pipeline_state(struct brw_context *brw,
  * fail;
  */
 xor_states(, , );
-assert(!check_state(, ));
+assert(!CHECK_BRW_STATE(examined, generated));
 prev = state;
   }
}
@@ -558,7 +553,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
   for (i = 0; i < num_atoms; i++) {
 const struct brw_tracked_state *atom = [i];
 
- check_and_emit_atom(brw, , atom);
+ /* Checking the state and emitting atoms */
+ if (CHECK_BRW_STATE(state, atom->dirty)) {
+check_and_emit_atom(brw, , atom);
+ }
   }
}
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V3] i965 : Optimize atom state flag checks

2017-07-20 Thread Yogesh Marathe
From: Aravindan Muthukumar 

This patch improves CPI Rate(Cycles per Instruction) and branch miss 
predict for i965. The function check_state() was showing CPI retired
rate.

Performance stats with android:
- CPI retired lowered by 28% (lower is better)
- Branch missprediction lowered by 13% (lower is better)
- 3DMark improved by 2%

The dissassembly doesn't show difference, although above results were
observed with patch.

V2:
- Removed memset() change
- Changed commit message as per review comments

V3:
- Indentation in check_and_emit_atom and commit msg corrected

Signed-off-by: Aravindan Muthukumar 
Signed-off-by: Yogesh Marathe 
Tested-by: Asish 
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  4 
 src/mesa/drivers/dri/i965/brw_state_upload.c | 16 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 2a8dbf8..8c9a510 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1687,3 +1687,7 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define CSDBG2_CONSTANT_BUFFER_ADDRESS_OFFSET_DISABLE (1 << 4)
 
 #endif
+
+/* Checking the state of mesa and brw before emitting atoms */
+#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
+
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index acaa97e..57ac394 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -443,10 +443,8 @@ check_and_emit_atom(struct brw_context *brw,
 struct brw_state_flags *state,
 const struct brw_tracked_state *atom)
 {
-   if (check_state(state, >dirty)) {
-  atom->emit(brw);
-  merge_ctx_state(brw, state);
-   }
+   atom->emit(brw);
+   merge_ctx_state(brw, state);
 }
 
 static inline void
@@ -541,7 +539,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
 const struct brw_tracked_state *atom = [i];
 struct brw_state_flags generated;
 
- check_and_emit_atom(brw, , atom);
+ /* Checking the state and emitting atoms */
+ if (CHECK_BRW_STATE(state, atom->dirty)) {
+check_and_emit_atom(brw, , atom);
+ }
 
 accumulate_state(, >dirty);
 
@@ -558,7 +559,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
   for (i = 0; i < num_atoms; i++) {
 const struct brw_tracked_state *atom = [i];
 
- check_and_emit_atom(brw, , atom);
+ /* Checking the state and emitting atoms */
+ if (CHECK_BRW_STATE(state, atom->dirty)) {
+check_and_emit_atom(brw, , atom);
+ }
   }
}
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev