Re: [Mesa-dev] [OT] some contribution statistics

2015-12-17 Thread Giuseppe Bilotta
On Thu, Dec 17, 2015 at 4:57 AM, Kenneth Graunke  wrote:
>
> Absolutely.  Feel free to send them to me personally.  Or, I could
> include the files upstream if that's of interest to people, and then
> people could just update stuff themselves.
>
> I just threw together a best-effort set of configuration files so I
> could run statistics on Mesa and Piglit over the last few years.

As a lover of statistics, I think it would be worth to merge it
upstream. Does git dm make use of the .mailmap if present? If so, some
of the overlapping information could be removed (since .mailmap is
used by other git tools too, but the dm stuff is not).


-- 
Giuseppe "Oblomov" Bilotta
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Giuseppe Bilotta
This adds a first tentative .mailmap file, to canonicize contributor
name/emails in shortlogs and other statistical endeavours.

Signed-off-by: Giuseppe Bilotta 
---
 .mailmap | 462 +++
 1 file changed, 462 insertions(+)
 create mode 100644 .mailmap

diff --git a/.mailmap b/.mailmap
new file mode 100644
index 000..827bfde
--- /dev/null
+++ b/.mailmap
@@ -0,0 +1,462 @@
+Aapo Tahkola  
+
+Adam Jackson  
+Adam Jackson  
+
+Adrian Marius Negreanu  Adrian Negreanu 

+Adrian Marius Negreanu  Negreanu Marius Adrian 

+
+Dave Airlie  
+Dave Airlie  airlied 
+Dave Airlie  
+Dave Airlie  
+Dave Airlie  
+Dave Airlie  
+Dave Airlie  
+Dave Airlie  
+Dave Airlie  
+
+Alan Coopersmith  
+
+Alan Hourihane  
+Alan Hourihane  
+Alan Hourihane  
+
+Alexander Monakov  
+
+Alexander von Gluck IV  Alexander von Gluck 

+
+Alex Corscadden  
+Alex Corscadden  
+
+Alex Deucher  
+Alex Deucher  
+Alex Deucher  
+Alex Deucher  
+Alex Deucher  
+Alex Deucher  
+
+Andreas Fänger  
+
+Andreas Hartmetz  
+
+Andre Heider 
+Andreas Heider 
+
+Andreas Pokorny  

+
+Andrew Randrianasulu  
+Andrew Randrianasulu  
+
+Arthur Huillet  Arthur HUILLET 
+
+Benjamin Franzke  ben 

+
+Ben Skeggs  
+Ben Skeggs  
+Ben Skeggs  
+Ben Skeggs  
+Ben Skeggs  
+Ben Skeggs  
+Ben Skeggs  
+
+Ben Widawsky  Ben Widawsky 
+
+Blair Sadewitz  Blair Sadewitz 

+
+Brian Paul  Brian 
+Brian Paul  
+Brian Paul  
+Brian Paul  
+Brian Paul  brian 
+Brian Paul  Brian 
+Brian Paul  Brian 
+Brian Paul  Brian 
+Brian Paul  Brian 
+Brian Paul  Brian 
+Brian Paul  Brian 
+Brian Paul  root 
+# The next ones are not 100% sure
+Brian Paul  root 
+Brian Paul  root 
+Brian Paul  root 
+
+Bruce Merry  
+
+Carl-Philip Hänsch  Carl-Philip Haensch 

+Carl-Philip Hänsch  Carl-Philip Haensch 

+Carl-Philip Hänsch  Carl-Philip Haensch 

+
+Chad Versace  
+Chad Versace  
+Chad Versace  

[Mesa-dev] [Bug 80183] [llvmpipe] triangles with vertices that map to raster positions > viewport width/height are not displayed

2015-12-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=80183

--- Comment #18 from cgerlac...@gmail.com ---
You were right, the latest mesa version fixes this problem :-)

Latest git hash from my checkout is:
e97b207654a1c0b2c27b6ad6579b5570a83ce8cd

I also tried to replay an apitrace generated with hardware rendering, but I
can't get the glretrace to use software rendering. It always uses the systems
opengl32.dll and therefore the clipping problem is not reproducible.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data

2015-12-17 Thread Iago Toral
On Wed, 2015-12-16 at 11:39 -0800, Kenneth Graunke wrote:
> On Wednesday, December 16, 2015 10:02:16 AM Iago Toral Quiroga wrote:
> > The BDW PRM Vol2a: Command Reference: Instructions, section 
> > MEDIA_CURBE_LOAD,
> > says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are
> > 64-byte aligned. This is different from previous gens, that were 32-byte
> > aligned.
> > 
> > v2 (Jordan):
> >   - CURBE Data Start Address is also 64-byte aligned.
> >   - The call to brw_state_batch should also use 64-byte alignment.
> >   - Improve PRM reference.
> > 
> > Fixes the following SSBO CTS tests on BDW:
> > ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs
> > ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs
> > ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs
> > ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
> > ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs
> > ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
> > ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs
> > 
> > And many other CS CTS tests as reported by Marta Lofstedt.
> > ---
> >  src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c 
> > b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > index 1fde69c..df0f301 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > @@ -68,7 +68,7 @@ brw_upload_cs_state(struct brw_context *brw)
> >  
> > uint32_t *bind = (uint32_t*) brw_state_batch(brw, 
> > AUB_TRACE_BINDING_TABLE,
> >  
> > prog_data->binding_table.size_bytes,
> > -32, 
> > _state->bind_bo_offset);
> > +64, 
> > _state->bind_bo_offset);
> 
> I don't understand this hunk - binding tables don't have anything to do
> with push constants.  These are for pull constants and UBOs.  At least
> in the 3D pipeline, we only align these to 32B, not 64.
> 
> > unsigned local_id_dwords = 0;
> >  
> > @@ -77,7 +77,8 @@ brw_upload_cs_state(struct brw_context *brw)
> >  
> > unsigned push_constant_data_size =
> >(prog_data->nr_params + local_id_dwords) * sizeof(gl_constant_value);
> > -   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
> > +   unsigned reg_aligned_constant_size =
> > +  ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> > unsigned push_constant_regs = reg_aligned_constant_size / 32;
> > unsigned threads = get_cs_thread_count(cs_prog_data);
> >  
> > @@ -138,11 +139,13 @@ brw_upload_cs_state(struct brw_context *brw)
> > ADVANCE_BATCH();
> >  
> > if (reg_aligned_constant_size > 0) {
> > +  const unsigned aligned_push_const_offset =
> > + ALIGN(stage_state->push_const_offset, brw->gen < 8 ? 32 : 64);
> 
> This is wrong.  What you want is to change:
> 
>   param = (gl_constant_value*)
>  brw_state_batch(brw, type,
>  reg_aligned_constant_size * threads,
>  32, _state->push_const_offset);
> 
> to use an alignment of 64 instead of 32 on Gen8+.  That way, it'll
> actually upload the data to a portion of the buffer that starts on
> a 64B aligned boundary.
> 
> As is, you're uploading the data to a 32B aligned section and then
> just fudging the pointer to be 64B aligned, possibly skipping over
> the first 32B.  Probably not what you wanted :)
> 
> Maybe you accidentally changed the wrong brw_state_batch call?

Ouch! yeah, I meant to change the other call, sorry about that. Anyway,
I think the patch proposed by Jordan is good so I won't send another
version.

Thanks for the review Ken!

Iago

> >BEGIN_BATCH(4);
> >OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
> >OUT_BATCH(0);
> >OUT_BATCH(reg_aligned_constant_size * threads);
> > -  OUT_BATCH(stage_state->push_const_offset);
> > +  OUT_BATCH(aligned_push_const_offset);
> >ADVANCE_BATCH();
> > }
> >  
> > @@ -241,7 +244,8 @@ brw_upload_cs_push_constants(struct brw_context *brw,
> >  
> >const unsigned push_constant_data_size =
> >   (local_id_dwords + prog_data->nr_params) * 
> > sizeof(gl_constant_value);
> > -  const unsigned reg_aligned_constant_size = 
> > ALIGN(push_constant_data_size, 32);
> > +  const unsigned reg_aligned_constant_size =
> > + ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> >const unsigned param_aligned_count =
> >   reg_aligned_constant_size / sizeof(*param);
> >  
> > 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data

2015-12-17 Thread Iago Toral
On Wed, 2015-12-16 at 14:48 -0800, Jordan Justen wrote:
> On 2015-12-16 11:39:00, Kenneth Graunke wrote:
> > On Wednesday, December 16, 2015 10:02:16 AM Iago Toral Quiroga wrote:
> > > The BDW PRM Vol2a: Command Reference: Instructions, section 
> > > MEDIA_CURBE_LOAD,
> > > says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are
> > > 64-byte aligned. This is different from previous gens, that were 32-byte
> > > aligned.
> > > 
> > > v2 (Jordan):
> > >   - CURBE Data Start Address is also 64-byte aligned.
> > >   - The call to brw_state_batch should also use 64-byte alignment.
> > >   - Improve PRM reference.
> > > 
> > > Fixes the following SSBO CTS tests on BDW:
> > > ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs
> > > ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs
> > > ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs
> > > ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
> > > ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs
> > > ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
> > > ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs
> > > 
> > > And many other CS CTS tests as reported by Marta Lofstedt.
> > > ---
> > >  src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c 
> > > b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > index 1fde69c..df0f301 100644
> > > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > @@ -68,7 +68,7 @@ brw_upload_cs_state(struct brw_context *brw)
> > >  
> > > uint32_t *bind = (uint32_t*) brw_state_batch(brw, 
> > > AUB_TRACE_BINDING_TABLE,
> > >  
> > > prog_data->binding_table.size_bytes,
> > > -32, 
> > > _state->bind_bo_offset);
> > > +64, 
> > > _state->bind_bo_offset);
> > 
> > I don't understand this hunk - binding tables don't have anything to do
> > with push constants.  These are for pull constants and UBOs.  At least
> > in the 3D pipeline, we only align these to 32B, not 64.
> 
> Yeah. I think he wants to update the call you pointed out below in
> brw_upload_cs_push_constants.
> 
> Also, how about consistently applying the alignment change? Either,
> just bump the base and size alignment to 64, or also check the gen to
> align the base to 32 on gen7.
> 
> How about the attached patch?

Yeah, that looks simpler. The patch is:

Tested-by: Iago Toral Quiroga 
Reviewed-by: Iago Toral Quiroga 

Thanks Jordan!

> -Jordan
> 
> > > unsigned local_id_dwords = 0;
> > >  
> > > @@ -77,7 +77,8 @@ brw_upload_cs_state(struct brw_context *brw)
> > >  
> > > unsigned push_constant_data_size =
> > >(prog_data->nr_params + local_id_dwords) * 
> > > sizeof(gl_constant_value);
> > > -   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 
> > > 32);
> > > +   unsigned reg_aligned_constant_size =
> > > +  ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> > > unsigned push_constant_regs = reg_aligned_constant_size / 32;
> > > unsigned threads = get_cs_thread_count(cs_prog_data);
> > >  
> > > @@ -138,11 +139,13 @@ brw_upload_cs_state(struct brw_context *brw)
> > > ADVANCE_BATCH();
> > >  
> > > if (reg_aligned_constant_size > 0) {
> > > +  const unsigned aligned_push_const_offset =
> > > + ALIGN(stage_state->push_const_offset, brw->gen < 8 ? 32 : 64);
> > 
> > This is wrong.  What you want is to change:
> > 
> >   param = (gl_constant_value*)
> >  brw_state_batch(brw, type,
> >  reg_aligned_constant_size * threads,
> >  32, _state->push_const_offset);
> > 
> > to use an alignment of 64 instead of 32 on Gen8+.  That way, it'll
> > actually upload the data to a portion of the buffer that starts on
> > a 64B aligned boundary.
> > 
> > As is, you're uploading the data to a 32B aligned section and then
> > just fudging the pointer to be 64B aligned, possibly skipping over
> > the first 32B.  Probably not what you wanted :)
> > 
> > Maybe you accidentally changed the wrong brw_state_batch call?
> > 
> > >BEGIN_BATCH(4);
> > >OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
> > >OUT_BATCH(0);
> > >OUT_BATCH(reg_aligned_constant_size * threads);
> > > -  OUT_BATCH(stage_state->push_const_offset);
> > > +  OUT_BATCH(aligned_push_const_offset);
> > >ADVANCE_BATCH();
> > > }
> > >  
> > > @@ -241,7 +244,8 @@ brw_upload_cs_push_constants(struct brw_context *brw,
> > >  
> > >const unsigned push_constant_data_size =
> > >   (local_id_dwords + prog_data->nr_params) * 
> > > sizeof(gl_constant_value);
> > > -  

Re: [Mesa-dev] [PATCH 6/7] nir: Teach nir_opt_algebraic about adding and subtracting the same thing

2015-12-17 Thread Eero Tamminen

Hi,

On 12/17/2015 01:52 AM, Matt Turner wrote:

On Tue, Dec 15, 2015 at 1:16 AM, Eduardo Lima Mitev  wrote:

On 12/15/2015 09:28 AM, Kristian Høgsberg Kristensen wrote:

This optimizes a + b - b to just a. Modest shader-db results (BDW):

   total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
   instructions in affected programs: 61938 -> 61348 (-0.95%)
   total loops in shared programs:2131 -> 2131 (0.00%)
   helped:263
   HURT:  0
   GAINED:0
   LOST:  0


In HSW, I get these shader-db results:

total instructions in shared programs: 6257265 -> 6256788 (-0.01%)
instructions in affected programs: 46601 -> 46124 (-1.02%)
helped: 218
HURT: 0

total cycles in shared programs: 56010026 -> 56007760 (-0.00%)
cycles in affected programs: 1048392 -> 1046126 (-0.22%)
helped: 199
HURT: 154

total loops in shared programs: 1979 -> 1979 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

I wonder where those cycle HURTs come from. In any case, the net result
is positive.


I haven't confirmed, but I've seen cases that seem like the cycle
counts are wrong.


I have doubts about the correctness of latency values set in 
brw_schedule_instructions.cpp.


They were added mostly by Eric on 2012 & 2013.  You added mad & lrp data 
in 2013 and Curro untyped atomics & surface reads in 2013.  Both of them 
have is_haswell check, but don't say anything about newer generations.


It seems that some of the values are from spec and some from tests. 
However, for the test data, the code doesn't say on what exact HW and 
stepping the tests were run on.  Or where the sources for those tests 
are so that one could try to reproduce the results, verify (with perf 
counters) that they actually are bound by what the test says, and update 
data gotten from them for newer generations (i.e. GEN8+).


In addition to this, Mesa is lacking at least stall cycles for 3src 
register bank conflicts.



- Eero

PS. cycle values are anyway going to be off, code doesn't know memory 
latencies as that depends on locality & cache utilization, and it 
doesn't take threading into account.  But it only tries to schedule 
things so that HW is able to better compensate latency, so it doesn't 
need to know how much cycles take, just have good enough estimate. :-)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Erik Faye-Lund
On Thu, Dec 17, 2015 at 10:09 AM, Giuseppe Bilotta
 wrote:
> +# missing svn authors:

I'm not sure this is useful, but just for kicks, I decided to try to
track down these contributors, and this is what I came up with
(suspected contributors CC'ed, so they can confirm or deny if they
want):

> +pesco 

Probably "Sven M. Hallberg "
(Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and
https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17)

> +tanner 

Probably "Thomas Tanner "
(Source: https://www.mail-archive.com/mesa-dev@mesa3d.org/msg00826.html)

> +hmarson 

Probably "Hamish Marson "
(Sources: http://sourceforge.net/p/r300/mailman/message/2172921/ and
http://comments.gmane.org/gmane.comp.video.dri.devel/19600)

> +reist 

Probably "Boris Peterbarg "
(Sources: 
http://sourceforge.net/p/r300/mailman/r300-commit/?style=threaded=200502=16
and 
http://www.bugzilla.icculus.org/projects/beagle-avahi/Filters/entagged-sharp/Tracker/Util/TrackerTagReader.cs)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Erik Faye-Lund
On Thu, Dec 17, 2015 at 10:39 AM, Erik Faye-Lund  wrote:
> On Thu, Dec 17, 2015 at 10:09 AM, Giuseppe Bilotta
>  wrote:
>> +# missing svn authors:
>
> I'm not sure this is useful, but just for kicks, I decided to try to
> track down these contributors, and this is what I came up with
> (suspected contributors CC'ed, so they can confirm or deny if they
> want):
>
>> +pesco 
>
> Probably "Sven M. Hallberg "
> (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and
> https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17)

OK, this address bounced (or rather, mailer-dae...@kundenserver.de
buonced pe...@gmx.de, so I guess it's forwarded to a dead address)

However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M.
Hallberg " might be an up-to-date e-mail address.
CC'ed.

>> +reist 
>
> Probably "Boris Peterbarg "
> (Sources: 
> http://sourceforge.net/p/r300/mailman/r300-commit/?style=threaded=200502=16
> and 
> http://www.bugzilla.icculus.org/projects/beagle-avahi/Filters/entagged-sharp/Tracker/Util/TrackerTagReader.cs)

This one also bounced, but it seems he committed to Rails
(https://github.com/rails/rails/pull/19351) as "Boris Peterbarg
". Again, CC'ed.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 80183] [llvmpipe] triangles with vertices that map to raster positions > viewport width/height are not displayed

2015-12-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=80183

--- Comment #19 from Jose Fonseca  ---
(In reply to cgerlach42 from comment #18)
> I also tried to replay an apitrace generated with hardware rendering, but I
> can't get the glretrace to use software rendering. It always uses the
> systems opengl32.dll and therefore the clipping problem is not reproducible.

If you put Mesa opengl32.dll into the same dir as glretrace.exe it should work.

Alternatively, you can do


  set TRACE_LIBGL=C:\path\to\desired\opengl32.dll
  glretrace foo.trace

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/9] i965/gs/gen6: fix execsize for instructions with width of 4 in gen6_sol_program()

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Samuel Iglesias Gonsalvez 

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/mesa/drivers/dri/i965/brw_ff_gs_emit.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c 
b/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c
index 8589dab..3ac1c48 100644
--- a/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c
@@ -406,9 +406,11 @@ gen6_sol_program(struct brw_ff_gs_compile *c, struct 
brw_ff_gs_prog_key *key,
 : 0x00020001)); /* (1, 0, 2) */
  brw_inst_set_pred_control(p->devinfo, inst, BRW_PREDICATE_NORMAL);
   }
+  brw_push_insn_state(p);
+  brw_set_default_exec_size(p, BRW_EXECUTE_4);
   brw_ADD(p, c->reg.destination_indices,
   c->reg.destination_indices, get_element_ud(c->reg.SVBI, 0));
-
+  brw_pop_insn_state(p);
   /* For each vertex, generate code to output each varying using the
* appropriate binding table entry.
*/
@@ -438,8 +440,13 @@ gen6_sol_program(struct brw_ff_gs_compile *c, struct 
brw_ff_gs_prog_key *key,
 vertex_slot.swizzle = varying == VARYING_SLOT_PSIZ
? BRW_SWIZZLE_ : key->transform_feedback_swizzles[binding];
 brw_set_default_access_mode(p, BRW_ALIGN_16);
+brw_push_insn_state(p);
+brw_set_default_exec_size(p, BRW_EXECUTE_4);
+
 brw_MOV(p, stride(c->reg.header, 4, 4, 1),
 retype(vertex_slot, BRW_REGISTER_TYPE_UD));
+brw_pop_insn_state(p);
+
 brw_set_default_access_mode(p, BRW_ALIGN_1);
 brw_svb_write(p,
   final_write ? c->reg.temp : brw_null_reg(), /* dest 
*/
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/9] i965/eu: set execution size for SEND message in brw_send_indirect_message

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Iago Toral Quiroga 

---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 9a3b0a2..3d24010 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2562,6 +2562,9 @@ brw_send_indirect_message(struct brw_codegen *p,
   brw_set_src1(p, send, addr);
}
 
+   if (dst.width < BRW_EXECUTE_8)
+  brw_inst_set_exec_size(devinfo, send, dst.width);
+
brw_set_dest(p, send, dst);
brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD));
brw_inst_set_sfid(devinfo, send, sfid);
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/9] i965/eu: set correct execution size in brw_NOP

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Iago Toral Quiroga 

v2: NOP should have an execsize of 1 (Matt)
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 5fb9662..9a3b0a2 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1230,8 +1230,9 @@ brw_F16TO32(struct brw_codegen *p, struct brw_reg dst, 
struct brw_reg src)
 void brw_NOP(struct brw_codegen *p)
 {
brw_inst *insn = next_insn(p, BRW_OPCODE_NOP);
-   brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
-   brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD));
+   brw_inst_set_exec_size(p->devinfo, insn, BRW_EXECUTE_1);
+   brw_set_dest(p, insn, retype(brw_vec1_grf(0,0), BRW_REGISTER_TYPE_UD));
+   brw_set_src0(p, insn, retype(brw_vec1_grf(0,0), BRW_REGISTER_TYPE_UD));
brw_set_src1(p, insn, brw_imm_ud(0x0));
 }
 
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] i965/vec4/gen6: fix exec_size for instructions with destination width of 4

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Samuel Iglesias Gonsalvez 

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index e3f9eea..e2a203a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1092,6 +1092,7 @@ generate_code(struct brw_codegen *p,
   assert(inst->mlen <= BRW_MAX_MSG_LENGTH);
 
   unsigned pre_emit_nr_insn = p->nr_insn;
+  bool fix_exec_size = false;
 
   if (dst.width == BRW_WIDTH_4) {
  /* This happens in attribute fixups for "dual instanced" geometry
@@ -1116,6 +1117,8 @@ generate_code(struct brw_codegen *p,
 if (src[i].file == BRW_GENERAL_REGISTER_FILE)
src[i] = stride(src[i], 4, 4, 1);
  }
+ brw_set_default_exec_size(p, BRW_EXECUTE_4);
+ fix_exec_size = true;
   }
 
   switch (inst->opcode) {
@@ -1545,6 +1548,9 @@ generate_code(struct brw_codegen *p,
  unreachable("Unsupported opcode");
   }
 
+  if (fix_exec_size)
+ brw_set_default_exec_size(p, BRW_EXECUTE_8);
+
   if (inst->opcode == VEC4_OPCODE_PACK_BYTES) {
  /* Handled dependency hints in the generator. */
 
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/9] i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channel

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Iago Toral Quiroga 

---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3d24010..88e2f71 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3332,11 +3332,14 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst)
  /* Overwrite the destination without and with execution masking to
   * find out which of the channels is active.
   */
+ brw_push_insn_state(p);
+ brw_set_default_exec_size(p, BRW_EXECUTE_4);
  brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X),
  brw_imm_ud(1));
 
  inst = brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X),
 brw_imm_ud(0));
+ brw_pop_insn_state(p);
  brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE);
   }
}
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 9/9] i965: Skip execution size adjustment for instructions of width 4

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Iago Toral Quiroga 

This code in brw_set_dest adjusts the execution size of any instruction
with a dst.width < 8. However, we don't want to do this with instructions
operating on doubles, since these will have a width of 4, but still
need an execution size of 8 (for SIMD8). Unfortunately, we can't just check
the size of the operands involved to detect if we are doing an operation on
doubles, because we can have instructions that do operations on double
operands interpreted as UD, operating on any of its 2 32-bit components.

Previous commits have made it so we never emit instructions with a horizontal
width of 4 that don't have the correct execution size set for gen6+, so
we can skip it in this case, avoiding the conflicts with fp64 requirements.

Expanding the same fix to other hardware generations requires many more
changes but since we are not targetting fp64 support on them
wer don't really care for now.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 88e2f71..f03776d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, struct 
brw_reg dest)
/* Generators should set a default exec_size of either 8 (SIMD4x2 or SIMD8)
 * or 16 (SIMD16), as that's normally correct.  However, when dealing with
 * small registers, we automatically reduce it to match the register size.
+*
+* In platforms that support fp64 we can emit instructions with a width of
+* 4 that need two SIMD8 registers and an exec_size of 8 or 16. In these
+* cases we need to make sure that these instructions have their exec sizes
+* set properly when they are emitted and we can't rely on this code to fix
+* it.
 */
-   if (dest.width < BRW_EXECUTE_8)
+   bool fix_exec_size;
+   if (devinfo->gen >= 6)
+  fix_exec_size = dest.width < BRW_EXECUTE_4;
+   else
+  fix_exec_size = dest.width < BRW_EXECUTE_8;
+
+   if (fix_exec_size)
   brw_inst_set_exec_size(devinfo, inst, dest.width);
 }
 
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Samuel Iglesias Gonsálvez
Hello,

This patch series is a updated version of the one Iago sent last
week [0] that includes patches for gen6 too, as suggested by Jason.

We checked the gen9 code paths that work with a horizontal width of 4
and we think there won't be any regression on gen9... but we don't
have any gen9 machine to run piglit with these patches. Can someone
check it?

Please read the original cover letter [0] for more information.

Sam

[0] http://lists.freedesktop.org/archives/mesa-dev/2015-December/102746.html

Iago Toral Quiroga (5):
  i965/eu: set correct execution size in brw_NOP
  i965/fs: set execution size for SEND messages in
generate_uniform_pull_constant_load_gen7
  i965/eu: set execution size for SEND message in
brw_send_indirect_message
  i965: set correct execsize for MOVS with a width of 4 in
brw_find_live_channel
  i965: Skip execution size adjustment for instructions of width 4

Samuel Iglesias Gonsálvez (4):
  i965/gs/gen6: fix execsize for instructions with width of 4 in
gen6_sol_program()
  i965/vec4/gen6: fix exec_size for instructions with width of 4 in
generate_gs_svb_write()
  i965/vec4/gen6: fix exec_size for instructions with destination width
of 4
  i965/vec4/gen6: fix exec_size for MOV with a width of 4 in
generate_gs_ff_sync()

 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 25 +---
 src/mesa/drivers/dri/i965/brw_ff_gs_emit.c   |  9 -
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 13 +++-
 4 files changed, 44 insertions(+), 5 deletions(-)

-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/9] i965/vec4/gen6: fix exec_size for MOV with a width of 4 in generate_gs_ff_sync()

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Samuel Iglesias Gonsalvez 

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index e2a203a..3a14685 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -698,8 +698,10 @@ generate_gs_ff_sync(struct brw_codegen *p,
brw_MOV(p, get_element_ud(header, 0), get_element_ud(dst, 0));
 
/* src1 is not an immediate when we use transform feedback */
-   if (src1.file != BRW_IMMEDIATE_VALUE)
+   if (src1.file != BRW_IMMEDIATE_VALUE) {
+  brw_set_default_exec_size(p, BRW_EXECUTE_4);
   brw_MOV(p, brw_vec4_grf(src1.nr, 0), brw_vec4_grf(dst.nr, 1));
+   }
 
brw_pop_insn_state(p);
 }
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/9] i965/vec4/gen6: fix exec_size for instructions with width of 4 in generate_gs_svb_write()

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Samuel Iglesias Gonsalvez 

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index c3426dd..e3f9eea 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -478,10 +478,13 @@ generate_gs_svb_write(struct brw_codegen *p,
bool final_write = inst->sol_final_write;
 
brw_push_insn_state(p);
+   brw_set_default_exec_size(p, BRW_EXECUTE_4);
/* Copy Vertex data into M0.x */
brw_MOV(p, stride(dst, 4, 4, 1),
stride(retype(src0, BRW_REGISTER_TYPE_UD), 4, 4, 1));
+   brw_pop_insn_state(p);
 
+   brw_push_insn_state(p);
/* Send SVB Write */
brw_svb_write(p,
  final_write ? src1 : brw_null_reg(), /* dest == src1 */
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] i965/fs: set execution size for SEND messages in generate_uniform_pull_constant_load_gen7

2015-12-17 Thread Samuel Iglesias Gonsálvez
From: Iago Toral Quiroga 

---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c25da07..42b5e86 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1248,6 +1248,7 @@ 
fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
   brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
   brw_inst *send = brw_next_insn(p, BRW_OPCODE_SEND);
+  brw_inst_set_exec_size(devinfo, send, dst.width);
   brw_pop_insn_state(p);
 
   brw_set_dest(p, send, dst);
@@ -1279,6 +1280,7 @@ 
fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst,
   /* dst = send(payload, a0.0 | ) */
   brw_inst *insn = brw_send_indirect_message(
  p, BRW_SFID_SAMPLER, dst, src, addr);
+  brw_inst_set_exec_size(devinfo, insn, dst.width);
   brw_set_sampler_message(p, insn,
   0,
   0, /* LD message ignores sampler unit */
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/2] mesa: Add a _mesa_active_fragment_shader_has_side_effects helper

2015-12-17 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> Some drivers can disable the FS unit if there is nothing in the shader code
> that writes to an output (i.e. color, depth, etc). Right now, mesa has
> a function to check for atomic buffers and the i965 driver also checks for
> images. Refactor this logic into a generic function that we can use for
> any source of side effects in a fragment shader. Sugested by Jason.
> ---
>  src/mesa/drivers/dri/i965/gen7_wm_state.c |  6 +-
>  src/mesa/drivers/dri/i965/gen8_ps_state.c |  3 +--
>  src/mesa/main/mtypes.h| 15 ---
>  3 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
> b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> index 06d5e65..a6d1028 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> @@ -77,13 +77,9 @@ upload_wm_state(struct brw_context *brw)
>dw1 |= GEN7_WM_KILL_ENABLE;
> }
>  
> -   if (_mesa_active_fragment_shader_has_atomic_ops(>ctx)) {
> -  dw1 |= GEN7_WM_DISPATCH_ENABLE;
> -   }
> -
> /* _NEW_BUFFERS | _NEW_COLOR */
> if (brw_color_buffer_write_enabled(brw) || writes_depth ||
> -   prog_data->base.nr_image_params ||
> +   _mesa_active_fragment_shader_has_side_effects(>ctx) ||
> dw1 & GEN7_WM_KILL_ENABLE) {
>dw1 |= GEN7_WM_DISPATCH_ENABLE;
> }

Hey, it looks like SSBOs are still missing a couple of things that could
make their side effects rather non-deterministic on i965 hardware: On
HSW you should probably set the UAV_ONLY WM state bit when there are no
colour or depth buffer writes as is done for images below in this same
function, and on all hardware you should set the early depth/stencil
control field to PSEXEC unless early fragment tests are enabled to make
sure that the fragment shader is executed regardless of whether
per-fragment tests pass or not as the spec requires.

> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
> b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> index 945f710..3cc8c68 100644
> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> @@ -90,8 +90,7 @@ gen8_upload_ps_extra(struct brw_context *brw,
>  *
>  * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | 
> _NEW_COLOR
>  */
> -   if ((_mesa_active_fragment_shader_has_atomic_ops(>ctx) ||
> -prog_data->base.nr_image_params) &&
> +   if (_mesa_active_fragment_shader_has_side_effects(>ctx) &&
> !brw_color_buffer_write_enabled(brw))
>dw1 |= GEN8_PSX_SHADER_HAS_UAV;
>  
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 191a9ea..834ba59 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -4538,11 +4538,20 @@ enum _debug
> DEBUG_INCOMPLETE_FBO = (1 << 3)
>  };
>  
> +/**
> + * Checks if the active fragment shader program can have side effects due
> + * to use of things like atomic buffers or images
> + */
>  static inline bool
> -_mesa_active_fragment_shader_has_atomic_ops(const struct gl_context *ctx)
> +_mesa_active_fragment_shader_has_side_effects(const struct gl_context *ctx)
>  {
> -   return ctx->Shader._CurrentFragmentProgram != NULL &&
> -  
> ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT]->NumAtomicBuffers
>  > 0;
> +   const struct gl_shader *sh;
> +
> +   if (!ctx->Shader._CurrentFragmentProgram)
> +  return false;
> +
> +   sh = 
> ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT];
> +   return sh->NumAtomicBuffers > 0 || sh->NumImages > 0;
>  }
>  
>  #ifdef __cplusplus
> -- 
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/9] i965/eu: set correct execution size in brw_NOP

2015-12-17 Thread Matt Turner
Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Matt Turner
On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga  wrote:
> Right now we rely on the code at the bottom of brw_set_dest to set the 
> correct execution size for anything that does not operate on a full SIMD 
> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, 
> where operands are twice as big and we see instructions with a horizontal 
> width of 4 that still require an execution size of 8. We cannot fix this by 
> simply checking the type of the operands involved and skip the automatic 
> execsize adjustment when they are doubles because we can also operate on 
> doubles as integers (for pack and unpack operations for example).

Can you give an example of when checking the type wouldn't be
sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do
they generate? Could we look at the destination's stride as well?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] draw: fix pstipple and aaline stages wrt sampler_views/samplers

2015-12-17 Thread Jose Fonseca

On 17/12/15 04:59, srol...@vmware.com wrote:

From: Roland Scheidegger 

Those stages only really work for OGL-style texturing (so number of samplers
and views mostly the same, certainly for the max values).
These get often set up all at once, thus there might be max number of both
even if all of them are just NULL. We must not set the max number of samplers
and views to the same value since that will lead to terrible things if a driver
supports more views than samplers (and the state tracker set up all the views).
(This will not make these stages magically work if a shader uses dx10-style
texturing, they might still replace an actually used sview in that case.)
---
  src/gallium/auxiliary/draw/draw_pipe_aaline.c   | 9 +
  src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 7 ---
  2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c 
b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
index 85d24b7..85ae84c 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
@@ -646,6 +646,7 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
 struct pipe_context *pipe = draw->pipe;
 const struct pipe_rasterizer_state *rast = draw->rasterizer;
 uint num_samplers;
+   uint num_sampler_views;
 void *r;

 assert(draw->rasterizer->line_smooth);
@@ -667,9 +668,9 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
 draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);

 /* how many samplers? */
-   /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */
-   num_samplers = MAX2(aaline->num_sampler_views, aaline->num_samplers);
-   num_samplers = MAX2(num_samplers, aaline->fs->sampler_unit + 1);
+   /* we'll use sampler/texture[aaline->sampler_unit] for the alpha texture */
+   num_samplers = MAX2(aaline->num_samplers, aaline->fs->sampler_unit + 1);
+   num_sampler_views = MAX2(num_samplers, aaline->num_sampler_views);

 aaline->state.sampler[aaline->fs->sampler_unit] = aaline->sampler_cso;
 
pipe_sampler_view_reference(>state.sampler_views[aaline->fs->sampler_unit],
@@ -681,7 +682,7 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
num_samplers, aaline->state.sampler);

 aaline->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0,
-num_samplers, aaline->state.sampler_views);
+num_sampler_views, 
aaline->state.sampler_views);

 /* Disable triangle culling, stippling, unfilled mode etc. */
 r = draw_get_rasterizer_no_cull(draw, rast->scissor, rast->flatshade);
diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c 
b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
index a51e91f..3bfd414 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
@@ -477,6 +477,7 @@ pstip_first_tri(struct draw_stage *stage, struct 
prim_header *header)
 struct pipe_context *pipe = pstip->pipe;
 struct draw_context *draw = stage->draw;
 uint num_samplers;
+   uint num_sampler_views;

 assert(stage->draw->rasterizer->poly_stipple_enable);

@@ -490,8 +491,8 @@ pstip_first_tri(struct draw_stage *stage, struct 
prim_header *header)

 /* how many samplers? */
 /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */
-   num_samplers = MAX2(pstip->num_sampler_views, pstip->num_samplers);
-   num_samplers = MAX2(num_samplers, pstip->fs->sampler_unit + 1);
+   num_samplers = MAX2(pstip->num_samplers, pstip->fs->sampler_unit + 1);
+   num_sampler_views = MAX2(pstip->num_sampler_views, num_samplers);

 /* plug in our sampler, texture */
 pstip->state.samplers[pstip->fs->sampler_unit] = pstip->sampler_cso;
@@ -506,7 +507,7 @@ pstip_first_tri(struct draw_stage *stage, struct 
prim_header *header)
   num_samplers, pstip->state.samplers);

 pstip->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0,
-   num_samplers, pstip->state.sampler_views);
+   num_sampler_views, 
pstip->state.sampler_views);

 draw->suspend_flushing = FALSE;




Reviewed-by: Jose Fonseca 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 80183] [llvmpipe] triangles with vertices that map to raster positions > viewport width/height are not displayed

2015-12-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=80183

Roland Scheidegger  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #20 from Roland Scheidegger  ---
Ok let's close it then.
FWIW I suspect 9e3f2af3c3732bd618308ddeffb017966a4fc93e fixed it (this applies
if you were writing gl_ClipVertex and have enabled at least one clip plane).

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Matt Turner
On Thu, Dec 17, 2015 at 1:09 AM, Giuseppe Bilotta
 wrote:
> This adds a first tentative .mailmap file, to canonicize contributor
> name/emails in shortlogs and other statistical endeavours.
>
> Signed-off-by: Giuseppe Bilotta 
> ---

If we want this kind of information, should we just go with Ken's
gitdm branch? That'll provide information about per-company
contribution stats as well.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] draw: fix pstipple and aaline stages wrt sampler_views/samplers

2015-12-17 Thread Brian Paul

On 12/16/2015 09:59 PM, srol...@vmware.com wrote:

From: Roland Scheidegger 

Those stages only really work for OGL-style texturing (so number of samplers
and views mostly the same, certainly for the max values).
These get often set up all at once, thus there might be max number of both
even if all of them are just NULL. We must not set the max number of samplers
and views to the same value since that will lead to terrible things if a driver
supports more views than samplers (and the state tracker set up all the views).
(This will not make these stages magically work if a shader uses dx10-style
texturing, they might still replace an actually used sview in that case.)
---
  src/gallium/auxiliary/draw/draw_pipe_aaline.c   | 9 +
  src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 7 ---
  2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c 
b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
index 85d24b7..85ae84c 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
@@ -646,6 +646,7 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
 struct pipe_context *pipe = draw->pipe;
 const struct pipe_rasterizer_state *rast = draw->rasterizer;
 uint num_samplers;
+   uint num_sampler_views;
 void *r;

 assert(draw->rasterizer->line_smooth);
@@ -667,9 +668,9 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
 draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);

 /* how many samplers? */
-   /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */
-   num_samplers = MAX2(aaline->num_sampler_views, aaline->num_samplers);
-   num_samplers = MAX2(num_samplers, aaline->fs->sampler_unit + 1);
+   /* we'll use sampler/texture[aaline->sampler_unit] for the alpha texture */
+   num_samplers = MAX2(aaline->num_samplers, aaline->fs->sampler_unit + 1);
+   num_sampler_views = MAX2(num_samplers, aaline->num_sampler_views);

 aaline->state.sampler[aaline->fs->sampler_unit] = aaline->sampler_cso;
 
pipe_sampler_view_reference(>state.sampler_views[aaline->fs->sampler_unit],
@@ -681,7 +682,7 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
num_samplers, aaline->state.sampler);

 aaline->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0,
-num_samplers, aaline->state.sampler_views);
+num_sampler_views, 
aaline->state.sampler_views);

 /* Disable triangle culling, stippling, unfilled mode etc. */
 r = draw_get_rasterizer_no_cull(draw, rast->scissor, rast->flatshade);
diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c 
b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
index a51e91f..3bfd414 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
@@ -477,6 +477,7 @@ pstip_first_tri(struct draw_stage *stage, struct 
prim_header *header)
 struct pipe_context *pipe = pstip->pipe;
 struct draw_context *draw = stage->draw;
 uint num_samplers;
+   uint num_sampler_views;

 assert(stage->draw->rasterizer->poly_stipple_enable);

@@ -490,8 +491,8 @@ pstip_first_tri(struct draw_stage *stage, struct 
prim_header *header)

 /* how many samplers? */
 /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */
-   num_samplers = MAX2(pstip->num_sampler_views, pstip->num_samplers);
-   num_samplers = MAX2(num_samplers, pstip->fs->sampler_unit + 1);
+   num_samplers = MAX2(pstip->num_samplers, pstip->fs->sampler_unit + 1);
+   num_sampler_views = MAX2(pstip->num_sampler_views, num_samplers);

 /* plug in our sampler, texture */
 pstip->state.samplers[pstip->fs->sampler_unit] = pstip->sampler_cso;
@@ -506,7 +507,7 @@ pstip_first_tri(struct draw_stage *stage, struct 
prim_header *header)
   num_samplers, pstip->state.samplers);

 pstip->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0,
-   num_samplers, pstip->state.sampler_views);
+   num_sampler_views, 
pstip->state.sampler_views);

 draw->suspend_flushing = FALSE;




Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Ilia Mirkin
On Thu, Dec 17, 2015 at 12:31 PM, Connor Abbott  wrote:
> On Thu, Dec 17, 2015 at 11:44 AM, Matt Turner  wrote:
>> On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga  wrote:
>>> Right now we rely on the code at the bottom of brw_set_dest to set the 
>>> correct execution size for anything that does not operate on a full SIMD 
>>> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, 
>>> where operands are twice as big and we see instructions with a horizontal 
>>> width of 4 that still require an execution size of 8. We cannot fix this by 
>>> simply checking the type of the operands involved and skip the automatic 
>>> execsize adjustment when they are doubles because we can also operate on 
>>> doubles as integers (for pack and unpack operations for example).
>>
>> Can you give an example of when checking the type wouldn't be
>> sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do
>> they generate? Could we look at the destination's stride as well?
>
> Yes, packDouble2x32 is what would cause the same issue, since it turns
> into two 32-bit MOV's with a destination stride of 2 (and offset of 0
> and 1, ofc). We could check for the stride, but if we really don't
> want to change this code, the better way to go would be to avoid the
> width adjustment that causes this whole thing to happen for
> destination registers, since everything other than this piece of code
> ignores the destination width and vstride. But that being said, this
> code is a hack since it breaks the assumption that fs_inst::exec_size
> always equals the final ExecSize of the instruction, and Iago and I
> agreed to fix it (or at least for now, do the portion of the fix
> needed for fp64) rather than work around it.

Not sure if you guys are thinking about it, but a lot of these things
that apply to doubles also apply to ARB_gpu_shader_int64. While you
obviously don't need to support it now, you may want to keep it in
mind as you make design decisions.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/15] i965/fs: Get rid of reladdr

2015-12-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Dec 14, 2015 6:24 AM, "Francisco Jerez"  wrote:
>>
>> Jason Ekstrand  writes:
>>
>> > On Dec 11, 2015 5:44 AM, "Francisco Jerez" 
> wrote:
>> >>
>> >> Jason Ekstrand  writes:
>> >>
>> >> > On Dec 10, 2015 6:58 AM, "Francisco Jerez" 
>> > wrote:
>> >> >>
>> >> >> Jason Ekstrand  writes:
>> >> >>
>> >> >> > We aren't using it anymore.
>> >> >>
>> >> >> It seems useful to me to be able to represent indirect access as
> part
>> > of
>> >> >> any instruction source or destination register.
>> >> >>
>> >> >> The following:
>> >> >>
>> >> >> | mov_indirect g0, g1, a0
>> >> >> | foo g2, g0
>> >> >>
>> >> >> and the converse case with indirect destination offset (which you
> don't
>> >> >> seem to represent currently) can be implemented by the hardware more
>> >> >> efficiently using a single instruction in certain cases.  The
> current
>> > IR
>> >> >> is able to represent what the hardware can do, but supporting the
>> >> >> MOV_INDIRECT instruction only would force us to keep the indirection
>> >> >> separate from the instruction that uses it, so it seems like a less
>> >> >> expressive representation to me than the current approach, unless
>> > you're
>> >> >> willing to add _INDIRECT variants of most hardware opcodes.
>> >> >
>> >> > Yes and, mostly, no.  Yes, you can put an indirect on almost anything
>> > but
>> >> > it has substantial restrictions:
>> >> >
>> >> Yes, I'm aware of the restrictions of indirect addressing on Gen
>> >> hardware.  The fact that reladdr can represent addressing modes which
>> >> aren't directly implemented in the hardware doesn't invalidate the
>> >> abstraction, nor implies that optimization and translation passes must
>> >> be made aware of such restrictions.  It just means that our abstraction
>> >> is a superset of the hardware indirect addressing scheme, which is fine
>> >> given a lowering pass to convert indirect addressing into a legal form.
>> >>
>> >> Your proposal instead goes the opposite direction and replaces the
>> >> preexisting abstraction with a new abstraction which can only represent
>> >> a tiny subset of the functionality of hardware instructions, which I
>> >> don't think is acceptable for a low-level IR.
>> >>
>> >> > 1) Destination indirects must be uniform (I'm only 95% sure this is
> the
>> >> > case)
>> >> >
>> >> > 2) We only have 8 address subregisters on HSW and prior and 16 on BDW
>> > which
>> >> > means:
>> >> >
>> >>
>> >> That's the kind of thing that SIMD lowering would be able to take care
>> >> of easily.
>> >
>> > Yes, and we use it for MOV_INDIRECT.
>> >
>> Right, and there's no reason why it couldn't be used for reladdr in the
>> same way to handle the above restrictions.
>>
>> >> > a) All indirects must be uniform OR
>> >> >
>> >> > b) We must be on Broadwell, have only two indirects, and split
> the
>> >> > instruction OR
>> >> >
>> >> > c) We can only have one indirect (and still split the
> instruction on
>> >> > HSW)
>> >> >
>> >> > So, yes, "everthing can be uniform", but not in real life.
>> >>
>> >> > The reladdr abstraction, while maybe ok for a high-level IR, doesn't
>> >> > accurately represent the hardware at all because it can't represent
>> >> > any of the restrictions without piles of helper functions and checks.
>> >>
>> >> The reladdr abstraction can represent the full flexibility of the
>> >> hardware, while MOV_INDIRECT cannot, that makes reladdr a more accurate
>> >> low-level representation of the program.  For that reason I'd argue the
>> >> exact opposite: MOV_INDIRECT would be a great abstraction and a welcome
>> >> simplification for a high-level IR (e.g. NIR), in which nothing is lost
>> >> by sacrificing the expressiveness of individual instructions in favour
>> >> of instruction composition, because the back-end can always combine
>> >> multiple high-level instructions into a single low-level instruction
>> >> where the hardware ISA allows, as long as the low-level IR is able to
>> >> represent the full functionality of hardware instructions (IOW the
>> >> mapping between low-level IR instructions and hardware instructions is
>> >> surjective), which is further from being the case after this series.
>> >
>> > Flexibility and expressiveness isn't always a good thing.
>>
>> Expressiveness is a good thing as long as it's required to represent the
>> capabilities of the hardware.
>
> As long as it's required to represent those capabilities that we care
> about, yes.  However, the hardware can do many things that either aren't
> useful for us or take so much effort for the little benefit you gain, that
> it's not worth it.  I'm arguing that saving a single MOV in the few places
> that we do in directs isn't worth the complexity.
>
>> > Yes, reladdr is more "expressive", but it's not getting used because
>> > that 

Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4

2015-12-17 Thread Connor Abbott
On Thu, Dec 17, 2015 at 11:44 AM, Matt Turner  wrote:
> On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga  wrote:
>> Right now we rely on the code at the bottom of brw_set_dest to set the 
>> correct execution size for anything that does not operate on a full SIMD 
>> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, 
>> where operands are twice as big and we see instructions with a horizontal 
>> width of 4 that still require an execution size of 8. We cannot fix this by 
>> simply checking the type of the operands involved and skip the automatic 
>> execsize adjustment when they are doubles because we can also operate on 
>> doubles as integers (for pack and unpack operations for example).
>
> Can you give an example of when checking the type wouldn't be
> sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do
> they generate? Could we look at the destination's stride as well?

Yes, packDouble2x32 is what would cause the same issue, since it turns
into two 32-bit MOV's with a destination stride of 2 (and offset of 0
and 1, ofc). We could check for the stride, but if we really don't
want to change this code, the better way to go would be to avoid the
width adjustment that causes this whole thing to happen for
destination registers, since everything other than this piece of code
ignores the destination width and vstride. But that being said, this
code is a hack since it breaks the assumption that fs_inst::exec_size
always equals the final ExecSize of the instruction, and Iago and I
agreed to fix it (or at least for now, do the portion of the fix
needed for fp64) rather than work around it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs

2015-12-17 Thread Brian Paul

On 12/17/2015 09:58 AM, srol...@vmware.com wrote:

From: Roland Scheidegger 

NaNs mean it should be clipped, otherwise the NaNs might get passed to the
next stages (if clipping didn't happen for another reason already), which
might cause all kind of problems.
The llvm path got this right already (possibly by luck), but this isn't used
when there's a gs active.
Found by code inspection, verified with some hacked piglit test and some more
hacked debug output.
(Note the clipper can still itself incorrectly generate NaN and INF position
values in its output prims (at least after w divide / viewport transform) even
if the inputs weren't NaNs, if the position data of the vertices is
"sufficiently bad".)
---
  src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 +-
  src/gallium/auxiliary/draw/draw_llvm.c |  4 
  2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h 
b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
index 779b237..d8866cd 100644
--- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
@@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
  out->pre_clip_pos[i] = position[i];
   }

+ /* Be careful with NaNs. Comparisons must be true for them. */
   /* Do the hardwired planes first:
*/
   if (flags & DO_CLIP_XY_GUARD_BAND) {
-if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0);
-if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1);
-if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2);
-if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3);
+if (!(-0.50 * position[0] + position[3] >= 0)) mask |= (1<<0);
+if (!( 0.50 * position[0] + position[3] >= 0)) mask |= (1<<1);
+if (!(-0.50 * position[1] + position[3] >= 0)) mask |= (1<<2);
+if (!( 0.50 * position[1] + position[3] >= 0)) mask |= (1<<3);
   }
   else if (flags & DO_CLIP_XY) {
-if (-position[0] + position[3] < 0) mask |= (1<<0);
-if ( position[0] + position[3] < 0) mask |= (1<<1);
-if (-position[1] + position[3] < 0) mask |= (1<<2);
-if ( position[1] + position[3] < 0) mask |= (1<<3);
+if (!(-position[0] + position[3] >= 0)) mask |= (1<<0);
+if (!( position[0] + position[3] >= 0)) mask |= (1<<1);
+if (!(-position[1] + position[3] >= 0)) mask |= (1<<2);
+if (!( position[1] + position[3] >= 0)) mask |= (1<<3);
   }

   /* Clip Z planes according to full cube, half cube or none.
*/
   if (flags & DO_CLIP_FULL_Z) {
-if ( position[2] + position[3] < 0) mask |= (1<<4);
-if (-position[2] + position[3] < 0) mask |= (1<<5);
+if (!( position[2] + position[3] >= 0)) mask |= (1<<4);
+if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
   }
   else if (flags & DO_CLIP_HALF_Z) {
-if ( position[2]   < 0) mask |= (1<<4);
-if (-position[2] + position[3] < 0) mask |= (1<<5);
+if (!( position[2]   >= 0)) mask |= (1<<4);
+if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
   }

   if (flags & DO_CLIP_USER) {
@@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
if (clipdist < 0 || util_is_inf_or_nan(clipdist))
   mask |= 1 << plane_idx;
 } else {
-  if (dot4(clipvertex, plane[plane_idx]) < 0)
+  if (!(dot4(clipvertex, plane[plane_idx]) >= 0))
   mask |= 1 << plane_idx;
 }
  }
@@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,

out = (struct vertex_header *)( (char *)out + info->stride );
 }
-
 return need_pipeline != 0;
  }

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 8435991..dad523a 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm,
cv_w = pos_w;
 }

+   /*
+* Be careful with the comparisons and NaNs (using llvm's unordered
+* comparisons here).
+*/
 /* Cliptest, for hardwired planes */
 if (clip_xy) {
/* plane 1 */



Looks OK to me, but I'm not sure I understand what'll happen when we 
find a vertex with a NaN coordinate.  Does the whole triangle get 
culled?  Otherwise, what would be the result of computing the 
intersection point on the clip plane?


Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH] draw: fix clip test with NaNs

2015-12-17 Thread sroland
From: Roland Scheidegger 

NaNs mean it should be clipped, otherwise the NaNs might get passed to the
next stages (if clipping didn't happen for another reason already), which
might cause all kind of problems.
The llvm path got this right already (possibly by luck), but this isn't used
when there's a gs active.
Found by code inspection, verified with some hacked piglit test and some more
hacked debug output.
(Note the clipper can still itself incorrectly generate NaN and INF position
values in its output prims (at least after w divide / viewport transform) even
if the inputs weren't NaNs, if the position data of the vertices is
"sufficiently bad".)
---
 src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 +-
 src/gallium/auxiliary/draw/draw_llvm.c |  4 
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h 
b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
index 779b237..d8866cd 100644
--- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
@@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
 out->pre_clip_pos[i] = position[i];
  }
 
+ /* Be careful with NaNs. Comparisons must be true for them. */
  /* Do the hardwired planes first:
   */
  if (flags & DO_CLIP_XY_GUARD_BAND) {
-if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0);
-if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1);
-if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2);
-if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3);
+if (!(-0.50 * position[0] + position[3] >= 0)) mask |= (1<<0);
+if (!( 0.50 * position[0] + position[3] >= 0)) mask |= (1<<1);
+if (!(-0.50 * position[1] + position[3] >= 0)) mask |= (1<<2);
+if (!( 0.50 * position[1] + position[3] >= 0)) mask |= (1<<3);
  }
  else if (flags & DO_CLIP_XY) {
-if (-position[0] + position[3] < 0) mask |= (1<<0);
-if ( position[0] + position[3] < 0) mask |= (1<<1);
-if (-position[1] + position[3] < 0) mask |= (1<<2);
-if ( position[1] + position[3] < 0) mask |= (1<<3);
+if (!(-position[0] + position[3] >= 0)) mask |= (1<<0);
+if (!( position[0] + position[3] >= 0)) mask |= (1<<1);
+if (!(-position[1] + position[3] >= 0)) mask |= (1<<2);
+if (!( position[1] + position[3] >= 0)) mask |= (1<<3);
  }
 
  /* Clip Z planes according to full cube, half cube or none.
   */
  if (flags & DO_CLIP_FULL_Z) {
-if ( position[2] + position[3] < 0) mask |= (1<<4);
-if (-position[2] + position[3] < 0) mask |= (1<<5);
+if (!( position[2] + position[3] >= 0)) mask |= (1<<4);
+if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
  }
  else if (flags & DO_CLIP_HALF_Z) {
-if ( position[2]   < 0) mask |= (1<<4);
-if (-position[2] + position[3] < 0) mask |= (1<<5);
+if (!( position[2]   >= 0)) mask |= (1<<4);
+if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
  }
 
  if (flags & DO_CLIP_USER) {
@@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
   if (clipdist < 0 || util_is_inf_or_nan(clipdist))
  mask |= 1 << plane_idx;
} else {
-  if (dot4(clipvertex, plane[plane_idx]) < 0)
+  if (!(dot4(clipvertex, plane[plane_idx]) >= 0))
  mask |= 1 << plane_idx;
}
 }
@@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
 
   out = (struct vertex_header *)( (char *)out + info->stride );
}
-
return need_pipeline != 0;
 }
 
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 8435991..dad523a 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm,
   cv_w = pos_w;
}
 
+   /*
+* Be careful with the comparisons and NaNs (using llvm's unordered
+* comparisons here).
+*/
/* Cliptest, for hardwired planes */
if (clip_xy) {
   /* plane 1 */
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/15] i965/fs: Get rid of reladdr

2015-12-17 Thread Jason Ekstrand
On Dec 17, 2015 9:10 AM, "Francisco Jerez"  wrote:
>
> Jason Ekstrand  writes:
>
> > On Dec 14, 2015 6:24 AM, "Francisco Jerez" 
wrote:
> >>
> >> Jason Ekstrand  writes:
> >>
> >> > On Dec 11, 2015 5:44 AM, "Francisco Jerez" 
> > wrote:
> >> >>
> >> >> Jason Ekstrand  writes:
> >> >>
> >> >> > On Dec 10, 2015 6:58 AM, "Francisco Jerez" 
> >> > wrote:
> >> >> >>
> >> >> >> Jason Ekstrand  writes:
> >> >> >>
> >> >> >> > We aren't using it anymore.
> >> >> >>
> >> >> >> It seems useful to me to be able to represent indirect access as
> > part
> >> > of
> >> >> >> any instruction source or destination register.
> >> >> >>
> >> >> >> The following:
> >> >> >>
> >> >> >> | mov_indirect g0, g1, a0
> >> >> >> | foo g2, g0
> >> >> >>
> >> >> >> and the converse case with indirect destination offset (which you
> > don't
> >> >> >> seem to represent currently) can be implemented by the hardware
more
> >> >> >> efficiently using a single instruction in certain cases.  The
> > current
> >> > IR
> >> >> >> is able to represent what the hardware can do, but supporting the
> >> >> >> MOV_INDIRECT instruction only would force us to keep the
indirection
> >> >> >> separate from the instruction that uses it, so it seems like a
less
> >> >> >> expressive representation to me than the current approach, unless
> >> > you're
> >> >> >> willing to add _INDIRECT variants of most hardware opcodes.
> >> >> >
> >> >> > Yes and, mostly, no.  Yes, you can put an indirect on almost
anything
> >> > but
> >> >> > it has substantial restrictions:
> >> >> >
> >> >> Yes, I'm aware of the restrictions of indirect addressing on Gen
> >> >> hardware.  The fact that reladdr can represent addressing modes
which
> >> >> aren't directly implemented in the hardware doesn't invalidate the
> >> >> abstraction, nor implies that optimization and translation passes
must
> >> >> be made aware of such restrictions.  It just means that our
abstraction
> >> >> is a superset of the hardware indirect addressing scheme, which is
fine
> >> >> given a lowering pass to convert indirect addressing into a legal
form.
> >> >>
> >> >> Your proposal instead goes the opposite direction and replaces the
> >> >> preexisting abstraction with a new abstraction which can only
represent
> >> >> a tiny subset of the functionality of hardware instructions, which I
> >> >> don't think is acceptable for a low-level IR.
> >> >>
> >> >> > 1) Destination indirects must be uniform (I'm only 95% sure this
is
> > the
> >> >> > case)
> >> >> >
> >> >> > 2) We only have 8 address subregisters on HSW and prior and 16 on
BDW
> >> > which
> >> >> > means:
> >> >> >
> >> >>
> >> >> That's the kind of thing that SIMD lowering would be able to take
care
> >> >> of easily.
> >> >
> >> > Yes, and we use it for MOV_INDIRECT.
> >> >
> >> Right, and there's no reason why it couldn't be used for reladdr in the
> >> same way to handle the above restrictions.
> >>
> >> >> > a) All indirects must be uniform OR
> >> >> >
> >> >> > b) We must be on Broadwell, have only two indirects, and split
> > the
> >> >> > instruction OR
> >> >> >
> >> >> > c) We can only have one indirect (and still split the
> > instruction on
> >> >> > HSW)
> >> >> >
> >> >> > So, yes, "everthing can be uniform", but not in real life.
> >> >>
> >> >> > The reladdr abstraction, while maybe ok for a high-level IR,
doesn't
> >> >> > accurately represent the hardware at all because it can't
represent
> >> >> > any of the restrictions without piles of helper functions and
checks.
> >> >>
> >> >> The reladdr abstraction can represent the full flexibility of the
> >> >> hardware, while MOV_INDIRECT cannot, that makes reladdr a more
accurate
> >> >> low-level representation of the program.  For that reason I'd argue
the
> >> >> exact opposite: MOV_INDIRECT would be a great abstraction and a
welcome
> >> >> simplification for a high-level IR (e.g. NIR), in which nothing is
lost
> >> >> by sacrificing the expressiveness of individual instructions in
favour
> >> >> of instruction composition, because the back-end can always combine
> >> >> multiple high-level instructions into a single low-level instruction
> >> >> where the hardware ISA allows, as long as the low-level IR is able
to
> >> >> represent the full functionality of hardware instructions (IOW the
> >> >> mapping between low-level IR instructions and hardware instructions
is
> >> >> surjective), which is further from being the case after this series.
> >> >
> >> > Flexibility and expressiveness isn't always a good thing.
> >>
> >> Expressiveness is a good thing as long as it's required to represent
the
> >> capabilities of the hardware.
> >
> > As long as it's required to represent those capabilities that we care
> > about, yes.  However, the hardware can do many things that either aren't
> 

Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs

2015-12-17 Thread Roland Scheidegger
Am 17.12.2015 um 19:13 schrieb Brian Paul:
> On 12/17/2015 09:58 AM, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> NaNs mean it should be clipped, otherwise the NaNs might get passed to
>> the
>> next stages (if clipping didn't happen for another reason already), which
>> might cause all kind of problems.
>> The llvm path got this right already (possibly by luck), but this
>> isn't used
>> when there's a gs active.
>> Found by code inspection, verified with some hacked piglit test and
>> some more
>> hacked debug output.
>> (Note the clipper can still itself incorrectly generate NaN and INF
>> position
>> values in its output prims (at least after w divide / viewport
>> transform) even
>> if the inputs weren't NaNs, if the position data of the vertices is
>> "sufficiently bad".)
>> ---
>>   src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28
>> +-
>>   src/gallium/auxiliary/draw/draw_llvm.c |  4 
>>   2 files changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>> b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>> index 779b237..d8866cd 100644
>> --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>> +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>> @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
>> *pvs,
>>   out->pre_clip_pos[i] = position[i];
>>}
>>
>> + /* Be careful with NaNs. Comparisons must be true for them. */
>>/* Do the hardwired planes first:
>> */
>>if (flags & DO_CLIP_XY_GUARD_BAND) {
>> -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0);
>> -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1);
>> -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2);
>> -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3);
>> +if (!(-0.50 * position[0] + position[3] >= 0)) mask |=
>> (1<<0);
>> +if (!( 0.50 * position[0] + position[3] >= 0)) mask |=
>> (1<<1);
>> +if (!(-0.50 * position[1] + position[3] >= 0)) mask |=
>> (1<<2);
>> +if (!( 0.50 * position[1] + position[3] >= 0)) mask |=
>> (1<<3);
>>}
>>else if (flags & DO_CLIP_XY) {
>> -if (-position[0] + position[3] < 0) mask |= (1<<0);
>> -if ( position[0] + position[3] < 0) mask |= (1<<1);
>> -if (-position[1] + position[3] < 0) mask |= (1<<2);
>> -if ( position[1] + position[3] < 0) mask |= (1<<3);
>> +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0);
>> +if (!( position[0] + position[3] >= 0)) mask |= (1<<1);
>> +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2);
>> +if (!( position[1] + position[3] >= 0)) mask |= (1<<3);
>>}
>>
>>/* Clip Z planes according to full cube, half cube or none.
>> */
>>if (flags & DO_CLIP_FULL_Z) {
>> -if ( position[2] + position[3] < 0) mask |= (1<<4);
>> -if (-position[2] + position[3] < 0) mask |= (1<<5);
>> +if (!( position[2] + position[3] >= 0)) mask |= (1<<4);
>> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
>>}
>>else if (flags & DO_CLIP_HALF_Z) {
>> -if ( position[2]   < 0) mask |= (1<<4);
>> -if (-position[2] + position[3] < 0) mask |= (1<<5);
>> +if (!( position[2]   >= 0)) mask |= (1<<4);
>> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
>>}
>>
>>if (flags & DO_CLIP_USER) {
>> @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
>> *pvs,
>> if (clipdist < 0 || util_is_inf_or_nan(clipdist))
>>mask |= 1 << plane_idx;
>>  } else {
>> -  if (dot4(clipvertex, plane[plane_idx]) < 0)
>> +  if (!(dot4(clipvertex, plane[plane_idx]) >= 0))
>>mask |= 1 << plane_idx;
>>  }
>>   }
>> @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
>> *pvs,
>>
>> out = (struct vertex_header *)( (char *)out + info->stride );
>>  }
>> -
>>  return need_pipeline != 0;
>>   }
>>
>> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c
>> b/src/gallium/auxiliary/draw/draw_llvm.c
>> index 8435991..dad523a 100644
>> --- a/src/gallium/auxiliary/draw/draw_llvm.c
>> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
>> @@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm,
>> cv_w = pos_w;
>>  }
>>
>> +   /*
>> +* Be careful with the comparisons and NaNs (using llvm's unordered
>> +* comparisons here).
>> +*/
>>  /* Cliptest, for hardwired planes */
>>  if (clip_xy) {
>> /* plane 1 */
>>
> 
> Looks OK to me, but 

Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition

2015-12-17 Thread Matt Turner
On Thu, Dec 17, 2015 at 11:04 AM, Nanley Chery  wrote:
> On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote:
>> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery  
>> wrote:
>>
>> >From: Nanley Chery 
>> >
>> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
>> >DIV_ROUND_UP() definitions to one, but chose the more
>> >compute-intensive version in the process. Use the simpler version
>> >instead. Reduces .text size by 1360 bytes.
>> >
>> >Output of `size lib/i965_dri.so`:
>> >  textdata bss dec hex filename
>> >   7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
>> >   7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)
>> >
>> >Cc: Axel Davy 
>> >Signed-off-by: Nanley Chery 
>> >---
>> > src/util/macros.h | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> >diff --git a/src/util/macros.h b/src/util/macros.h
>> >index 0c8958f..53a98a0 100644
>> >--- a/src/util/macros.h
>> >+++ b/src/util/macros.h
>> >@@ -211,6 +211,6 @@ do {   \
>> > #endif
>> >/** Compute ceiling of integer quotient of A divided by B. */
>> >-#define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
>> >+#define DIV_ROUND_UP(A, B)  (((A) + (B) - 1) / (B))
>> >#endif /* UTIL_MACROS_H */
>>
>> I'll point out that these are not equivalent, one can overflow and the other 
>> doesn't. You
>> probably want to check if the call sites have sufficient checks for that 
>> before
>> substituting one for the other.
>>
>
> Good point. As I mentioned in another email, I'll leave the current
> macro untouched.

I think the chances of us relying on overflow behavior are exceedingly
small, and nearly all uses of DIV_ROUND_UP are in the i965 driver. I
think it's sufficiently safe to go ahead with the patch (but I am
still interested to know about your compiler and CFLAGS).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs

2015-12-17 Thread Matt Turner
On Thu, Dec 17, 2015 at 11:19 AM, Roland Scheidegger  wrote:
> Am 17.12.2015 um 19:13 schrieb Brian Paul:
>> On 12/17/2015 09:58 AM, srol...@vmware.com wrote:
>>> From: Roland Scheidegger 
>>>
>>> NaNs mean it should be clipped, otherwise the NaNs might get passed to
>>> the
>>> next stages (if clipping didn't happen for another reason already), which
>>> might cause all kind of problems.
>>> The llvm path got this right already (possibly by luck), but this
>>> isn't used
>>> when there's a gs active.
>>> Found by code inspection, verified with some hacked piglit test and
>>> some more
>>> hacked debug output.
>>> (Note the clipper can still itself incorrectly generate NaN and INF
>>> position
>>> values in its output prims (at least after w divide / viewport
>>> transform) even
>>> if the inputs weren't NaNs, if the position data of the vertices is
>>> "sufficiently bad".)
>>> ---
>>>   src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28
>>> +-
>>>   src/gallium/auxiliary/draw/draw_llvm.c |  4 
>>>   2 files changed, 18 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>>> b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>>> index 779b237..d8866cd 100644
>>> --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>>> +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
>>> @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
>>> *pvs,
>>>   out->pre_clip_pos[i] = position[i];
>>>}
>>>
>>> + /* Be careful with NaNs. Comparisons must be true for them. */
>>>/* Do the hardwired planes first:
>>> */
>>>if (flags & DO_CLIP_XY_GUARD_BAND) {
>>> -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0);
>>> -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1);
>>> -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2);
>>> -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3);
>>> +if (!(-0.50 * position[0] + position[3] >= 0)) mask |=
>>> (1<<0);
>>> +if (!( 0.50 * position[0] + position[3] >= 0)) mask |=
>>> (1<<1);
>>> +if (!(-0.50 * position[1] + position[3] >= 0)) mask |=
>>> (1<<2);
>>> +if (!( 0.50 * position[1] + position[3] >= 0)) mask |=
>>> (1<<3);
>>>}
>>>else if (flags & DO_CLIP_XY) {
>>> -if (-position[0] + position[3] < 0) mask |= (1<<0);
>>> -if ( position[0] + position[3] < 0) mask |= (1<<1);
>>> -if (-position[1] + position[3] < 0) mask |= (1<<2);
>>> -if ( position[1] + position[3] < 0) mask |= (1<<3);
>>> +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0);
>>> +if (!( position[0] + position[3] >= 0)) mask |= (1<<1);
>>> +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2);
>>> +if (!( position[1] + position[3] >= 0)) mask |= (1<<3);
>>>}
>>>
>>>/* Clip Z planes according to full cube, half cube or none.
>>> */
>>>if (flags & DO_CLIP_FULL_Z) {
>>> -if ( position[2] + position[3] < 0) mask |= (1<<4);
>>> -if (-position[2] + position[3] < 0) mask |= (1<<5);
>>> +if (!( position[2] + position[3] >= 0)) mask |= (1<<4);
>>> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
>>>}
>>>else if (flags & DO_CLIP_HALF_Z) {
>>> -if ( position[2]   < 0) mask |= (1<<4);
>>> -if (-position[2] + position[3] < 0) mask |= (1<<5);
>>> +if (!( position[2]   >= 0)) mask |= (1<<4);
>>> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
>>>}
>>>
>>>if (flags & DO_CLIP_USER) {
>>> @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
>>> *pvs,
>>> if (clipdist < 0 || util_is_inf_or_nan(clipdist))
>>>mask |= 1 << plane_idx;
>>>  } else {
>>> -  if (dot4(clipvertex, plane[plane_idx]) < 0)
>>> +  if (!(dot4(clipvertex, plane[plane_idx]) >= 0))
>>>mask |= 1 << plane_idx;
>>>  }
>>>   }
>>> @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
>>> *pvs,
>>>
>>> out = (struct vertex_header *)( (char *)out + info->stride );
>>>  }
>>> -
>>>  return need_pipeline != 0;
>>>   }
>>>
>>> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c
>>> b/src/gallium/auxiliary/draw/draw_llvm.c
>>> index 8435991..dad523a 100644
>>> --- a/src/gallium/auxiliary/draw/draw_llvm.c
>>> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
>>> @@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm,
>>> cv_w = pos_w;
>>>  }
>>>
>>> +   /*
>>> +* Be careful with the comparisons 

Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs

2015-12-17 Thread Roland Scheidegger
Am 17.12.2015 um 20:22 schrieb Matt Turner:
> On Thu, Dec 17, 2015 at 11:19 AM, Roland Scheidegger  
> wrote:
>> Am 17.12.2015 um 19:13 schrieb Brian Paul:
>>> On 12/17/2015 09:58 AM, srol...@vmware.com wrote:
 From: Roland Scheidegger 

 NaNs mean it should be clipped, otherwise the NaNs might get passed to
 the
 next stages (if clipping didn't happen for another reason already), which
 might cause all kind of problems.
 The llvm path got this right already (possibly by luck), but this
 isn't used
 when there's a gs active.
 Found by code inspection, verified with some hacked piglit test and
 some more
 hacked debug output.
 (Note the clipper can still itself incorrectly generate NaN and INF
 position
 values in its output prims (at least after w divide / viewport
 transform) even
 if the inputs weren't NaNs, if the position data of the vertices is
 "sufficiently bad".)
 ---
   src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28
 +-
   src/gallium/auxiliary/draw/draw_llvm.c |  4 
   2 files changed, 18 insertions(+), 14 deletions(-)

 diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 index 779b237..d8866cd 100644
 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
 @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
 *pvs,
   out->pre_clip_pos[i] = position[i];
}

 + /* Be careful with NaNs. Comparisons must be true for them. */
/* Do the hardwired planes first:
 */
if (flags & DO_CLIP_XY_GUARD_BAND) {
 -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0);
 -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1);
 -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2);
 -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3);
 +if (!(-0.50 * position[0] + position[3] >= 0)) mask |=
 (1<<0);
 +if (!( 0.50 * position[0] + position[3] >= 0)) mask |=
 (1<<1);
 +if (!(-0.50 * position[1] + position[3] >= 0)) mask |=
 (1<<2);
 +if (!( 0.50 * position[1] + position[3] >= 0)) mask |=
 (1<<3);
}
else if (flags & DO_CLIP_XY) {
 -if (-position[0] + position[3] < 0) mask |= (1<<0);
 -if ( position[0] + position[3] < 0) mask |= (1<<1);
 -if (-position[1] + position[3] < 0) mask |= (1<<2);
 -if ( position[1] + position[3] < 0) mask |= (1<<3);
 +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0);
 +if (!( position[0] + position[3] >= 0)) mask |= (1<<1);
 +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2);
 +if (!( position[1] + position[3] >= 0)) mask |= (1<<3);
}

/* Clip Z planes according to full cube, half cube or none.
 */
if (flags & DO_CLIP_FULL_Z) {
 -if ( position[2] + position[3] < 0) mask |= (1<<4);
 -if (-position[2] + position[3] < 0) mask |= (1<<5);
 +if (!( position[2] + position[3] >= 0)) mask |= (1<<4);
 +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
}
else if (flags & DO_CLIP_HALF_Z) {
 -if ( position[2]   < 0) mask |= (1<<4);
 -if (-position[2] + position[3] < 0) mask |= (1<<5);
 +if (!( position[2]   >= 0)) mask |= (1<<4);
 +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5);
}

if (flags & DO_CLIP_USER) {
 @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
 *pvs,
 if (clipdist < 0 || util_is_inf_or_nan(clipdist))
mask |= 1 << plane_idx;
  } else {
 -  if (dot4(clipvertex, plane[plane_idx]) < 0)
 +  if (!(dot4(clipvertex, plane[plane_idx]) >= 0))
mask |= 1 << plane_idx;
  }
   }
 @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs
 *pvs,

 out = (struct vertex_header *)( (char *)out + info->stride );
  }
 -
  return need_pipeline != 0;
   }

 diff --git a/src/gallium/auxiliary/draw/draw_llvm.c
 b/src/gallium/auxiliary/draw/draw_llvm.c
 index 8435991..dad523a 100644
 --- a/src/gallium/auxiliary/draw/draw_llvm.c
 +++ b/src/gallium/auxiliary/draw/draw_llvm.c
 @@ 

Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition

2015-12-17 Thread Nanley Chery
On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote:
> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery  
> wrote:
> 
> >From: Nanley Chery 
> >
> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
> >DIV_ROUND_UP() definitions to one, but chose the more
> >compute-intensive version in the process. Use the simpler version
> >instead. Reduces .text size by 1360 bytes.
> >
> >Output of `size lib/i965_dri.so`:
> >  textdata bss dec hex filename
> >   7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
> >   7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)
> >
> >Cc: Axel Davy 
> >Signed-off-by: Nanley Chery 
> >---
> > src/util/macros.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/src/util/macros.h b/src/util/macros.h
> >index 0c8958f..53a98a0 100644
> >--- a/src/util/macros.h
> >+++ b/src/util/macros.h
> >@@ -211,6 +211,6 @@ do {   \
> > #endif
> >/** Compute ceiling of integer quotient of A divided by B. */
> >-#define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
> >+#define DIV_ROUND_UP(A, B)  (((A) + (B) - 1) / (B))
> >#endif /* UTIL_MACROS_H */
> 
> I'll point out that these are not equivalent, one can overflow and the other 
> doesn't. You
> probably want to check if the call sites have sufficient checks for that 
> before
> substituting one for the other.
> 

Good point. As I mentioned in another email, I'll leave the current
macro untouched.

Thanks,
Nanley

> 
> /Glenn
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] virgl: enable building on Android

2015-12-17 Thread Rob Herring
This is just a copy-n-paste and rename of vc4 Android makefiles.

Signed-off-by: Rob Herring 
---
 Android.mk  |  4 ++--
 src/gallium/Android.mk  |  5 +
 src/gallium/drivers/virgl/Android.mk| 35 +
 src/gallium/targets/dri/Android.mk  |  4 
 src/gallium/winsys/virgl/drm/Android.mk | 34 
 5 files changed, 80 insertions(+), 2 deletions(-)
 create mode 100644 src/gallium/drivers/virgl/Android.mk
 create mode 100644 src/gallium/winsys/virgl/drm/Android.mk

diff --git a/Android.mk b/Android.mk
index ed160fb..1d76559 100644
--- a/Android.mk
+++ b/Android.mk
@@ -24,7 +24,7 @@
 # BOARD_GPU_DRIVERS should be defined.  The valid values are
 #
 #   classic drivers: i915 i965
-#   gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi 
vc4 vmwgfx
+#   gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi 
vc4 virgl vmwgfx
 #
 # The main target is libGLES_mesa.  For each classic driver enabled, a DRI
 # module will also be built.  DRI modules will be loaded by libGLES_mesa.
@@ -46,7 +46,7 @@ MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
 MESA_PYTHON2 := python
 
 classic_drivers := i915 i965
-gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi 
vmwgfx vc4
+gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi 
vmwgfx vc4 virgl
 
 MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS))
 
diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk
index b406d4a..749be7d 100644
--- a/src/gallium/Android.mk
+++ b/src/gallium/Android.mk
@@ -83,6 +83,11 @@ ifneq ($(filter vc4, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += winsys/vc4/drm drivers/vc4
 endif
 
+# virgl
+ifneq ($(filter virgl, $(MESA_GPU_DRIVERS)),)
+SUBDIRS += winsys/virgl/drm drivers/virgl
+endif
+
 # vmwgfx
 ifneq ($(filter vmwgfx, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += winsys/svga/drm drivers/svga
diff --git a/src/gallium/drivers/virgl/Android.mk 
b/src/gallium/drivers/virgl/Android.mk
new file mode 100644
index 000..b8309e4
--- /dev/null
+++ b/src/gallium/drivers/virgl/Android.mk
@@ -0,0 +1,35 @@
+# Copyright (C) 2014 Emil Velikov 
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+LOCAL_PATH := $(call my-dir)
+
+# get C_SOURCES
+include $(LOCAL_PATH)/Makefile.sources
+
+include $(CLEAR_VARS)
+
+LOCAL_SRC_FILES := \
+   $(C_SOURCES)
+
+LOCAL_SHARED_LIBRARIES := libdrm
+LOCAL_MODULE := libmesa_pipe_virgl
+
+include $(GALLIUM_COMMON_MK)
+include $(BUILD_STATIC_LIBRARY)
diff --git a/src/gallium/targets/dri/Android.mk 
b/src/gallium/targets/dri/Android.mk
index 2d9610e..371e68d 100644
--- a/src/gallium/targets/dri/Android.mk
+++ b/src/gallium/targets/dri/Android.mk
@@ -92,6 +92,10 @@ ifneq ($(filter vc4,$(MESA_GPU_DRIVERS)),)
 LOCAL_CFLAGS += -DGALLIUM_VC4
 gallium_DRIVERS += libmesa_winsys_vc4 libmesa_pipe_vc4
 endif
+ifneq ($(filter virgl,$(MESA_GPU_DRIVERS)),)
+LOCAL_CFLAGS += -DGALLIUM_VIRGL
+gallium_DRIVERS += libmesa_winsys_virgl libmesa_pipe_virgl
+endif
 ifneq ($(filter vmwgfx,$(MESA_GPU_DRIVERS)),)
 gallium_DRIVERS += libmesa_winsys_svga libmesa_pipe_svga
 LOCAL_CFLAGS += -DGALLIUM_VMWGFX
diff --git a/src/gallium/winsys/virgl/drm/Android.mk 
b/src/gallium/winsys/virgl/drm/Android.mk
new file mode 100644
index 000..8493503
--- /dev/null
+++ b/src/gallium/winsys/virgl/drm/Android.mk
@@ -0,0 +1,34 @@
+# Copyright (C) 2014 Emil Velikov 
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#

Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Boris Peterbarg
Woah, this brought back memories of fighting with X and mesa!

If you want to list me, I'd prefer you use re...@users.sourceforge.net

On Thu, Dec 17, 2015 at 12:27 PM, Erik Faye-Lund 
wrote:

> On Thu, Dec 17, 2015 at 10:39 AM, Erik Faye-Lund 
> wrote:
> > On Thu, Dec 17, 2015 at 10:09 AM, Giuseppe Bilotta
> >  wrote:
> >> +# missing svn authors:
> >
> > I'm not sure this is useful, but just for kicks, I decided to try to
> > track down these contributors, and this is what I came up with
> > (suspected contributors CC'ed, so they can confirm or deny if they
> > want):
> >
> >> +pesco 
> >
> > Probably "Sven M. Hallberg "
> > (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and
> > https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17)
>
> OK, this address bounced (or rather, mailer-dae...@kundenserver.de
> buonced pe...@gmx.de, so I guess it's forwarded to a dead address)
>
> However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M.
> Hallberg " might be an up-to-date e-mail address.
> CC'ed.
>
> >> +reist 
> >
> > Probably "Boris Peterbarg "
> > (Sources:
> http://sourceforge.net/p/r300/mailman/r300-commit/?style=threaded=200502=16
> > and
> http://www.bugzilla.icculus.org/projects/beagle-avahi/Filters/entagged-sharp/Tracker/Util/TrackerTagReader.cs
> )
>
> This one also bounced, but it seems he committed to Rails
> (https://github.com/rails/rails/pull/19351) as "Boris Peterbarg
> ". Again, CC'ed.
>



-- 
Boris Peterbarg
Seeking Alpha
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] virtio_gpu: Add PCI ID to driver map

2015-12-17 Thread Rob Herring
Add the virtio-gpu PCI ID so the driver probing works.

Signed-off-by: Rob Herring 
---
 include/pci_ids/virtio_gpu_pci_ids.h | 1 +
 src/loader/pci_id_driver_map.h   | 7 +++
 2 files changed, 8 insertions(+)
 create mode 100644 include/pci_ids/virtio_gpu_pci_ids.h

diff --git a/include/pci_ids/virtio_gpu_pci_ids.h 
b/include/pci_ids/virtio_gpu_pci_ids.h
new file mode 100644
index 000..2e6ecaf
--- /dev/null
+++ b/include/pci_ids/virtio_gpu_pci_ids.h
@@ -0,0 +1 @@
+CHIPSET(0x0010, VIRTGL, VIRTGL)
diff --git a/src/loader/pci_id_driver_map.h b/src/loader/pci_id_driver_map.h
index 11e39d3..cab69fb 100644
--- a/src/loader/pci_id_driver_map.h
+++ b/src/loader/pci_id_driver_map.h
@@ -53,6 +53,12 @@ static const int radeonsi_chip_ids[] = {
 #undef CHIPSET
 };
 
+static const int virtio_gpu_chip_ids[] = {
+#define CHIPSET(chip, name, family) chip,
+#include "pci_ids/virtio_gpu_pci_ids.h"
+#undef CHIPSET
+};
+
 static const int vmwgfx_chip_ids[] = {
 #define CHIPSET(chip, name, family) chip,
 #include "pci_ids/vmwgfx_pci_ids.h"
@@ -78,6 +84,7 @@ static const struct {
{ 0x1002, "radeonsi", radeonsi_chip_ids, ARRAY_SIZE(radeonsi_chip_ids), 
_LOADER_GALLIUM},
{ 0x10de, "nouveau_vieux", NULL, -1,  _LOADER_DRI, is_nouveau_vieux },
{ 0x10de, "nouveau", NULL, -1,  _LOADER_GALLIUM },
+   { 0x1af4, "virtio_gpu", virtio_gpu_chip_ids, 
ARRAY_SIZE(virtio_gpu_chip_ids), _LOADER_GALLIUM },
{ 0x15ad, "vmwgfx", vmwgfx_chip_ids, ARRAY_SIZE(vmwgfx_chip_ids), 
_LOADER_GALLIUM },
{ 0x, NULL, NULL, 0 },
 };
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled

2015-12-17 Thread Rob Herring
Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit
builds.

Signed-off-by: Rob Herring 
---
 src/gallium/drivers/freedreno/ir3/ir3_print.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/ir3/ir3_print.c 
b/src/gallium/drivers/freedreno/ir3/ir3_print.c
index 07e03d2..a84e798 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_print.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_print.c
@@ -143,7 +143,7 @@ block_id(struct ir3_block *block)
 #ifdef DEBUG
return block->serialno;
 #else
-   return (uint32_t)(uint64_t)block;
+   return (uint32_t)(unsigned long)block;
 #endif
 }
 
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Giuseppe Bilotta
On Thu, Dec 17, 2015 at 12:30 PM, Sven M. Hallberg  wrote:
> Erik Faye-Lund  writes:
>>> Probably "Sven M. Hallberg "
>>> (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and
>>> https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17)
>>
>> OK, this address bounced (or rather, mailer-dae...@kundenserver.de
>> buonced pe...@gmx.de, so I guess it's forwarded to a dead address)
>>
>> However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M.
>> Hallberg " might be an up-to-date e-mail address.
>> CC'ed.
>
> You are in fact correct. :)

Oh, nice, would it be OK for you if I add this email address to the
mesa .mailmap?


-- 
Giuseppe "Oblomov" Bilotta
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Only apply CS stall workaround pre-SKL

2015-12-17 Thread Ben Widawsky
As per the docs.

Cc: Kenneth Graunke 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index ae3d818..6c636d2 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -97,7 +97,8 @@ void
 brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
 {
if (brw->gen >= 8) {
-  gen8_add_cs_stall_workaround_bits();
+  if (brw->gen == 8)
+ gen8_add_cs_stall_workaround_bits();
 
   BEGIN_BATCH(6);
   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
@@ -141,7 +142,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, 
uint32_t flags,
 uint32_t imm_lower, uint32_t imm_upper)
 {
if (brw->gen >= 8) {
-  gen8_add_cs_stall_workaround_bits();
+  if (brw->gen == 8)
+ gen8_add_cs_stall_workaround_bits();
 
   BEGIN_BATCH(6);
   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
-- 
2.6.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radeonsi: add RADEON_REPLACE_SHADERS debug option

2015-12-17 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This option allows replacing a single shader by a pre-compiled ELF object
as generated by LLVM's llc, for example. This can be useful for debugging a
deterministically occuring error in shaders (and has in fact helped find
the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264).
---
 src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
 src/gallium/drivers/radeonsi/si_debug.c   | 94 +++
 src/gallium/drivers/radeonsi/si_pipe.c|  3 +
 src/gallium/drivers/radeonsi/si_pipe.h|  1 +
 src/gallium/drivers/radeonsi/si_shader.c  | 18 +++--
 5 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index c3933b1d..556c7cc 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -87,6 +87,7 @@
 #define DBG_NO_DCC (1llu << 43)
 #define DBG_NO_DCC_CLEAR   (1llu << 44)
 #define DBG_NO_RB_PLUS (1llu << 45)
+#define DBG_REPLACE_SHADERS(1llu << 46)
 
 #define R600_MAP_BUFFER_ALIGNMENT 64
 
diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index c45f8c0..f50d98c 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -28,7 +28,9 @@
 #include "si_shader.h"
 #include "sid.h"
 #include "sid_tables.h"
+#include "radeon/radeon_elf_util.h"
 #include "ddebug/dd_util.h"
+#include "util/u_memory.h"
 
 
 static void si_dump_shader(struct si_shader_ctx_state *state, const char *name,
@@ -42,6 +44,98 @@ static void si_dump_shader(struct si_shader_ctx_state 
*state, const char *name,
fprintf(f, "%s\n\n", state->current->binary.disasm_string);
 }
 
+/**
+ * Shader compiles can be overridden with arbitrary ELF objects by setting
+ * the environment variable 
RADEON_REPLACE_SHADERS=num1:filename1[;num2:filename2]
+ */
+bool si_replace_shader(unsigned num, struct radeon_shader_binary *binary)
+{
+   const char *p = debug_get_option("RADEON_REPLACE_SHADERS", NULL);
+   const char *semicolon;
+   char *copy = NULL;
+   FILE *f;
+   long filesize, nread;
+   char *buf = NULL;
+   bool replaced = false;
+
+   if (!p)
+   return false;
+
+   while (*p) {
+   unsigned long i;
+   char *endp;
+   i = strtoul(p, , 0);
+
+   p = endp;
+   if (*p != ':') {
+   fprintf(stderr, "RADEON_REPLACE_SHADERS formatted 
badly.\n");
+   exit(1);
+   }
+   ++p;
+
+   if (i == num)
+   break;
+
+   p = strchr(p, ';');
+   if (!p)
+   return false;
+   ++p;
+   }
+   if (!*p)
+   return false;
+
+   semicolon = strchr(p, ';');
+   if (semicolon) {
+   p = copy = strndup(p, semicolon - p);
+   if (!copy) {
+   fprintf(stderr, "out of memory\n");
+   return false;
+   }
+   }
+
+   fprintf(stderr, "radeonsi: replace shader %u by %s\n", num, p);
+
+   f = fopen(p, "r");
+   if (!f) {
+   perror("radeonsi: failed to open file");
+   goto out_free;
+   }
+
+   if (fseek(f, 0, SEEK_END) != 0)
+   goto file_error;
+
+   filesize = ftell(f);
+   if (filesize < 0)
+   goto file_error;
+
+   if (fseek(f, 0, SEEK_SET) != 0)
+   goto file_error;
+
+   buf = MALLOC(filesize);
+   if (!buf) {
+   fprintf(stderr, "out of memory\n");
+   goto out_close;
+   }
+
+   nread = fread(buf, 1, filesize, f);
+   if (nread != filesize)
+   goto file_error;
+
+   radeon_elf_read(buf, filesize, binary);
+   replaced = true;
+
+out_close:
+   fclose(f);
+out_free:
+   FREE(buf);
+   free(copy);
+   return replaced;
+
+file_error:
+   perror("radeonsi: reading shader");
+   goto out_close;
+}
+
 /* Parsed IBs are difficult to read without colors. Use "less -R file" to
  * read them, or use "aha -b -f file" to convert them to html.
  */
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index ac13407..6a1911f 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -639,6 +639,9 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws)
if (debug_get_bool_option("RADEON_DUMP_SHADERS", FALSE))
sscreen->b.debug_flags |= DBG_FS | DBG_VS | DBG_GS | DBG_PS | 
DBG_CS;
 
+   if (debug_get_option("RADEON_REPLACE_SHADERS", NULL))
+   sscreen->b.debug_flags |= DBG_REPLACE_SHADERS;
+
/* Create the auxiliary context. This must be done 

[Mesa-dev] [PATCH 1/2] radeonsi: count compilations in si_compile_llvm

2015-12-17 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This changes the count slightly (because of si_generate_gs_copy_shader), but
this is only relevant for the driver-specific num-compilations query. It sets
the stage for the next commit.
---
 src/gallium/drivers/radeonsi/si_shader.c| 2 ++
 src/gallium/drivers/radeonsi/si_state_shaders.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 4a67276..511ed88 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3885,6 +3885,8 @@ int si_compile_llvm(struct si_screen *sscreen, struct 
si_shader *shader,
shader->selector ? shader->selector->tokens : 
NULL);
bool dump_ir = dump_asm && !(sscreen->b.debug_flags & DBG_NO_IR);
 
+   p_atomic_inc(>b.num_compilations);
+
r = radeon_llvm_compile(mod, >binary,
r600_get_llvm_processor_name(sscreen->b.family), dump_ir, 
dump_asm, tm);
if (r)
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index f0147ce..8700590 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -634,7 +634,6 @@ static int si_shader_select(struct pipe_context *ctx,
sel->last_variant = shader;
}
state->current = shader;
-   p_atomic_inc(>screen->b.num_compilations);
pipe_mutex_unlock(sel->mutex);
return 0;
 }
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] gallium/radeon: only dispose locally created target machine in radeon_llvm_compile

2015-12-17 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Unify the cleanup paths of the function rather than duplicating code.
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 6b2ebde..61ed940 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -188,8 +188,8 @@ unsigned radeon_llvm_compile(LLVMModuleRef M, struct 
radeon_shader_binary *binar
if (mem_err) {
fprintf(stderr, "%s: %s", __FUNCTION__, err);
FREE(err);
-   LLVMDisposeTargetMachine(tm);
-   return 1;
+   rval = 1;
+   goto out;
}
 
if (0 != rval) {
@@ -205,6 +205,7 @@ unsigned radeon_llvm_compile(LLVMModuleRef M, struct 
radeon_shader_binary *binar
/* Clean up */
LLVMDisposeMemoryBuffer(out_buffer);
 
+out:
if (dispose_tm) {
LLVMDisposeTargetMachine(tm);
}
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Giuseppe Bilotta
Hello Boris,

On Thu, Dec 17, 2015 at 10:09 PM, Boris Peterbarg
 wrote:
> Woah, this brought back memories of fighting with X and mesa!
>
> If you want to list me, I'd prefer you use re...@users.sourceforge.net

Thanks, I've added the mapping. Unless of course you'd rather want to
remain anonymous 8-)

-- 
Giuseppe "Oblomov" Bilotta
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Giuseppe Bilotta
On Thu, Dec 17, 2015 at 10:09 PM, Boris Peterbarg
 wrote:
> Woah, this brought back memories of fighting with X and mesa!
>
> If you want to list me, I'd prefer you use re...@users.sourceforge.net

Actually, your comment just gave me a _brilliant_ idea. If mesa was
hosted on SF's svn, I'm ready to bet that all the missing ones can be
recovered from the SF user list, so for example the still unidentified
sio and sio2 are probably

http://sourceforge.net/u/sio/profile/
http://sourceforge.net/u/sio2/profile/

Was Mesa ever hosted on a different svn repo?

-- 
Giuseppe "Oblomov" Bilotta
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/2] mesa: Add a _mesa_active_fragment_shader_has_side_effects helper

2015-12-17 Thread Iago Toral
On Thu, 2015-12-17 at 16:29 +0200, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > Some drivers can disable the FS unit if there is nothing in the shader code
> > that writes to an output (i.e. color, depth, etc). Right now, mesa has
> > a function to check for atomic buffers and the i965 driver also checks for
> > images. Refactor this logic into a generic function that we can use for
> > any source of side effects in a fragment shader. Sugested by Jason.
> > ---
> >  src/mesa/drivers/dri/i965/gen7_wm_state.c |  6 +-
> >  src/mesa/drivers/dri/i965/gen8_ps_state.c |  3 +--
> >  src/mesa/main/mtypes.h| 15 ---
> >  3 files changed, 14 insertions(+), 10 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
> > b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> > index 06d5e65..a6d1028 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
> > @@ -77,13 +77,9 @@ upload_wm_state(struct brw_context *brw)
> >dw1 |= GEN7_WM_KILL_ENABLE;
> > }
> >  
> > -   if (_mesa_active_fragment_shader_has_atomic_ops(>ctx)) {
> > -  dw1 |= GEN7_WM_DISPATCH_ENABLE;
> > -   }
> > -
> > /* _NEW_BUFFERS | _NEW_COLOR */
> > if (brw_color_buffer_write_enabled(brw) || writes_depth ||
> > -   prog_data->base.nr_image_params ||
> > +   _mesa_active_fragment_shader_has_side_effects(>ctx) ||
> > dw1 & GEN7_WM_KILL_ENABLE) {
> >dw1 |= GEN7_WM_DISPATCH_ENABLE;
> > }
> 
> Hey, it looks like SSBOs are still missing a couple of things that could
> make their side effects rather non-deterministic on i965 hardware: On
> HSW you should probably set the UAV_ONLY WM state bit when there are no
> colour or depth buffer writes as is done for images below in this same
> function, and on all hardware you should set the early depth/stencil
> control field to PSEXEC unless early fragment tests are enabled to make
> sure that the fragment shader is executed regardless of whether
> per-fragment tests pass or not as the spec requires.

Sure, I'll add this and send a new version. Thanks Curro!

BTW, I see that we are doing these two things only for images at the
moment, I guess we should we do it for atomic buffers as well, right?

> > diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
> > b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > index 945f710..3cc8c68 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > @@ -90,8 +90,7 @@ gen8_upload_ps_extra(struct brw_context *brw,
> >  *
> >  * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | 
> > _NEW_COLOR
> >  */
> > -   if ((_mesa_active_fragment_shader_has_atomic_ops(>ctx) ||
> > -prog_data->base.nr_image_params) &&
> > +   if (_mesa_active_fragment_shader_has_side_effects(>ctx) &&
> > !brw_color_buffer_write_enabled(brw))
> >dw1 |= GEN8_PSX_SHADER_HAS_UAV;
> >  
> > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> > index 191a9ea..834ba59 100644
> > --- a/src/mesa/main/mtypes.h
> > +++ b/src/mesa/main/mtypes.h
> > @@ -4538,11 +4538,20 @@ enum _debug
> > DEBUG_INCOMPLETE_FBO = (1 << 3)
> >  };
> >  
> > +/**
> > + * Checks if the active fragment shader program can have side effects due
> > + * to use of things like atomic buffers or images
> > + */
> >  static inline bool
> > -_mesa_active_fragment_shader_has_atomic_ops(const struct gl_context *ctx)
> > +_mesa_active_fragment_shader_has_side_effects(const struct gl_context *ctx)
> >  {
> > -   return ctx->Shader._CurrentFragmentProgram != NULL &&
> > -  
> > ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT]->NumAtomicBuffers
> >  > 0;
> > +   const struct gl_shader *sh;
> > +
> > +   if (!ctx->Shader._CurrentFragmentProgram)
> > +  return false;
> > +
> > +   sh = 
> > ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT];
> > +   return sh->NumAtomicBuffers > 0 || sh->NumImages > 0;
> >  }
> >  
> >  #ifdef __cplusplus
> > -- 
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Sven M. Hallberg
Erik Faye-Lund  writes:
>> Probably "Sven M. Hallberg "
>> (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and
>> https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17)
>
> OK, this address bounced (or rather, mailer-dae...@kundenserver.de
> buonced pe...@gmx.de, so I guess it's forwarded to a dead address)
>
> However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M.
> Hallberg " might be an up-to-date e-mail address.
> CC'ed.

You are in fact correct. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/11] [RFC] mesa: allow binding framebuffer without depth

2015-12-17 Thread Marek Olšák
On Wed, Dec 16, 2015 at 11:30 PM, Miklós Máté  wrote:
> On 12/16/2015 05:27 PM, Marek Olšák wrote:
>>
>> What is this good for?
>>
>> Marek
>
> KotOR uses a series of scratch framebuffers for drawing the framebuffer
> effects. These have no depth and no stencil, so check_compatible() rejects
> them, subsequent GL calls are no-op, and the screen becomes garbage. I also
> experimented successfully with disabling the visuals that have no depth or
> no stencil in gallium/state_trackers/dri/dri_screen.c, but I concluded that
> this one was a smaller hack.

What kind of scratch buffer are we talking about? How is it created?

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium/radeon: only dispose locally created target machine in radeon_llvm_compile

2015-12-17 Thread Michel Dänzer
On 18.12.2015 07:00, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> Unify the cleanup paths of the function rather than duplicating code.

This should probably be backported to the stable branches? If so, add

Cc: "11.0 11.1" 

to the commit log, no need to actually send the patch to the mesa-stable
mailing list.


Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/6] gallium: Remove unnecessary semicolons

2015-12-17 Thread Marek Olšák
On Thu, Dec 17, 2015 at 5:13 AM, Matt Turner  wrote:
> On Wed, Dec 16, 2015 at 7:41 PM, Edward O'Callaghan
>  wrote:
>> Fix silly issue with MSVC case fall-though support to need
>> a extra 'break;'
>>
>> Found-by: Coccinelle
>> Signed-off-by: Edward O'Callaghan 
>> Reviewed-by: Brian Paul 
>> ---
>>  src/gallium/auxiliary/draw/draw_pipe_aaline.c  | 2 +-
>>  src/gallium/auxiliary/gallivm/lp_bld_swizzle.c | 2 +-
>>  src/gallium/auxiliary/nir/tgsi_to_nir.c| 2 +-
>>  src/gallium/auxiliary/util/u_surface.c | 3 ++-
>>  src/gallium/auxiliary/vl/vl_mpeg12_decoder.c   | 2 +-
>>  src/gallium/state_trackers/nine/swapchain9.c   | 2 +-
>>  src/gallium/state_trackers/omx/entrypoint.c| 2 +-
>>  src/gallium/state_trackers/vdpau/mixer.c   | 2 +-
>>  8 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c 
>> b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
>> index 877db59..4a676b7 100644
>> --- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
>> +++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
>> @@ -937,7 +937,7 @@ draw_aaline_prepare_outputs(struct draw_context *draw,
>> const struct pipe_rasterizer_state *rast = draw->rasterizer;
>>
>> /* update vertex attrib info */
>> -   aaline->pos_slot = draw_current_shader_position_output(draw);;
>> +   aaline->pos_slot = draw_current_shader_position_output(draw);
>>
>> if (!rast->line_smooth)
>>return;
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c 
>> b/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c
>> index b1aef71..f571838 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c
>> @@ -720,7 +720,7 @@ lp_build_transpose_aos_n(struct gallivm_state *gallivm,
>>
>>default:
>>   assert(0);
>> -   };
>> +   }
>>  }
>>
>>
>> diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
>> b/src/gallium/auxiliary/nir/tgsi_to_nir.c
>> index 5def6d3..5cbe8e9 100644
>> --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
>> +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
>> @@ -1951,7 +1951,7 @@ tgsi_processor_to_shader_stage(unsigned processor)
>> case TGSI_PROCESSOR_COMPUTE:   return MESA_SHADER_COMPUTE;
>> default:
>>unreachable("invalid TGSI processor");
>> -   };
>> +   }
>>  }
>>
>>  struct nir_shader *
>> diff --git a/src/gallium/auxiliary/util/u_surface.c 
>> b/src/gallium/auxiliary/util/u_surface.c
>> index 6aa44f9..c150d92 100644
>> --- a/src/gallium/auxiliary/util/u_surface.c
>> +++ b/src/gallium/auxiliary/util/u_surface.c
>> @@ -600,7 +600,8 @@ is_box_inside_resource(const struct pipe_resource *res,
>>depth = res->array_size;
>>assert(res->array_size % 6 == 0);
>>break;
>> -   case PIPE_MAX_TEXTURE_TYPES:;
>
> Yuck! I have never seen this before.
>
> Grepping for ':;' turns up a bunch more of these. They all seem to
> blame to Marek. Can we please not do this?

AFAIK, default:; is the simplest way to add an empty default statement
to hide warnings showing unhandled cases. break; isn't very useful if
it's the last statement.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition

2015-12-17 Thread Glenn Kennard

On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery  wrote:


From: Nanley Chery 

Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
DIV_ROUND_UP() definitions to one, but chose the more
compute-intensive version in the process. Use the simpler version
instead. Reduces .text size by 1360 bytes.

Output of `size lib/i965_dri.so`:
  textdata bss dec hex filename
   7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
   7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)

Cc: Axel Davy 
Signed-off-by: Nanley Chery 
---
 src/util/macros.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/macros.h b/src/util/macros.h
index 0c8958f..53a98a0 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -211,6 +211,6 @@ do {   \
 #endif
/** Compute ceiling of integer quotient of A divided by B. */
-#define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
+#define DIV_ROUND_UP(A, B)  (((A) + (B) - 1) / (B))
#endif /* UTIL_MACROS_H */


I'll point out that these are not equivalent, one can overflow and the other 
doesn't. You
probably want to check if the call sites have sufficient checks for that before
substituting one for the other.


/Glenn
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] Add .mailmap

2015-12-17 Thread Giuseppe Bilotta
On Thu, Dec 17, 2015 at 4:46 PM, Matt Turner  wrote:
> On Thu, Dec 17, 2015 at 1:09 AM, Giuseppe Bilotta
>  wrote:
>> This adds a first tentative .mailmap file, to canonicize contributor
>> name/emails in shortlogs and other statistical endeavours.
>
> If we want this kind of information, should we just go with Ken's
> gitdm branch? That'll provide information about per-company
> contribution stats as well.

git builtins can't use the gitdm stuff, AFAIK (so e.g. git shortlog
wouldn't coalesce authors). Having duplicate information _is_
annoying, but if git dm can use the .mailmap (at least partially),
then of course redundancy can be curbed.

-- 
Giuseppe "Oblomov" Bilotta
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition

2015-12-17 Thread Nanley Chery
On Thu, Dec 17, 2015 at 11:25 AM, Matt Turner  wrote:

> On Thu, Dec 17, 2015 at 11:04 AM, Nanley Chery 
> wrote:
> > On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote:
> >> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery 
> wrote:
> >>
> >> >From: Nanley Chery 
> >> >
> >> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
> >> >DIV_ROUND_UP() definitions to one, but chose the more
> >> >compute-intensive version in the process. Use the simpler version
> >> >instead. Reduces .text size by 1360 bytes.
> >> >
> >> >Output of `size lib/i965_dri.so`:
> >> >  textdata bss dec hex filename
> >> >   7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
> >> >   7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)
> >> >
> >> >Cc: Axel Davy 
> >> >Signed-off-by: Nanley Chery 
> >> >---
> >> > src/util/macros.h | 2 +-
> >> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> >diff --git a/src/util/macros.h b/src/util/macros.h
> >> >index 0c8958f..53a98a0 100644
> >> >--- a/src/util/macros.h
> >> >+++ b/src/util/macros.h
> >> >@@ -211,6 +211,6 @@ do {   \
> >> > #endif
> >> >/** Compute ceiling of integer quotient of A divided by B. */
> >> >-#define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
> >> >+#define DIV_ROUND_UP(A, B)  (((A) + (B) - 1) / (B))
> >> >#endif /* UTIL_MACROS_H */
> >>
> >> I'll point out that these are not equivalent, one can overflow and the
> other doesn't. You
> >> probably want to check if the call sites have sufficient checks for
> that before
> >> substituting one for the other.
> >>
> >
> > Good point. As I mentioned in another email, I'll leave the current
> > macro untouched.
>
>
The other email I referenced unfortunately didn't make it to the list due
email client
issues.


> I think the chances of us relying on overflow behavior are exceedingly
> small, and nearly all uses of DIV_ROUND_UP are in the i965 driver. I
> think it's sufficiently safe to go ahead with the patch (but I am
> still interested to know about your compiler and CFLAGS).
>

My error was that I forgot to remove --enable-debug and add -02. After
making those
changes, my binary size also increases but by ~1.1k. Another implementation
which
avoids overflow increases the binary size by 2 bytes.

Correct me if I'm wrong, but aligning w/ the kernel implementation (or
changing the
current definition for that matter) seems like a regression more than an
improvement.

Thanks,
Nanley
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] nir: Teach nir_opt_algebraic about adding and subtracting the same thing

2015-12-17 Thread Jason Ekstrand
On Dec 17, 2015 1:21 AM, "Eero Tamminen"  wrote:
>
> Hi,
>
>
> On 12/17/2015 01:52 AM, Matt Turner wrote:
>>
>> On Tue, Dec 15, 2015 at 1:16 AM, Eduardo Lima Mitev 
wrote:
>>>
>>> On 12/15/2015 09:28 AM, Kristian Høgsberg Kristensen wrote:

 This optimizes a + b - b to just a. Modest shader-db results (BDW):

total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
instructions in affected programs: 61938 -> 61348 (-0.95%)
total loops in shared programs:2131 -> 2131 (0.00%)
helped:263
HURT:  0
GAINED:0
LOST:  0
>>>
>>>
>>> In HSW, I get these shader-db results:
>>>
>>> total instructions in shared programs: 6257265 -> 6256788 (-0.01%)
>>> instructions in affected programs: 46601 -> 46124 (-1.02%)
>>> helped: 218
>>> HURT: 0
>>>
>>> total cycles in shared programs: 56010026 -> 56007760 (-0.00%)
>>> cycles in affected programs: 1048392 -> 1046126 (-0.22%)
>>> helped: 199
>>> HURT: 154
>>>
>>> total loops in shared programs: 1979 -> 1979 (0.00%)
>>> loops in affected programs: 0 -> 0
>>> helped: 0
>>> HURT: 0
>>>
>>> LOST:   0
>>> GAINED: 0
>>>
>>> I wonder where those cycle HURTs come from. In any case, the net result
>>> is positive.
>>
>>
>> I haven't confirmed, but I've seen cases that seem like the cycle
>> counts are wrong.
>
>
> I have doubts about the correctness of latency values set in
brw_schedule_instructions.cpp.
>
> They were added mostly by Eric on 2012 & 2013.  You added mad & lrp data
in 2013 and Curro untyped atomics & surface reads in 2013.  Both of them
have is_haswell check, but don't say anything about newer generations.
>
> It seems that some of the values are from spec and some from tests.
However, for the test data, the code doesn't say on what exact HW and
stepping the tests were run on.  Or where the sources for those tests are
so that one could try to reproduce the results, verify (with perf counters)
that they actually are bound by what the test says, and update data gotten
from them for newer generations (i.e. GEN8+).

It would be pretty fantastic to get updated or more reliable data.
However, I think what we have is probably good enough.  The important part
is that 3src are slightly more expensive and sends are hugely expensive and
I think the current numbers capture that well enough to be moderately
useful.  As far add register banks go, we've had trouble getting real data
or even documentation on that.

> In addition to this, Mesa is lacking at least stall cycles for 3src
register bank conflicts.
>
>
> - Eero
>
> PS. cycle values are anyway going to be off, code doesn't know memory
latencies as that depends on locality & cache utilization, and it doesn't
take threading into account.  But it only tries to schedule things so that
HW is able to better compensate latency, so it doesn't need to know how
much cycles take, just have good enough estimate. :-)

For that matter, it can depend on how many vertices you have in the pipe or
how big your polygon is.  It'll never be perfect; there's only so much you
can know from only the shader code.  :-).  However, a program with a
smaller theoretical cycle count will probably execute faster than one with
a higher cycle count, so it is useful.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition

2015-12-17 Thread Nanley Chery
On Thu, Dec 17, 2015 at 11:53:03AM -0800, Nanley Chery wrote:
> On Thu, Dec 17, 2015 at 11:25 AM, Matt Turner  wrote:
> 
> > On Thu, Dec 17, 2015 at 11:04 AM, Nanley Chery 
> > wrote:
> > > On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote:
> > >> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery 
> > wrote:
> > >>
> > >> >From: Nanley Chery 
> > >> >
> > >> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
> > >> >DIV_ROUND_UP() definitions to one, but chose the more
> > >> >compute-intensive version in the process. Use the simpler version
> > >> >instead. Reduces .text size by 1360 bytes.
> > >> >
> > >> >Output of `size lib/i965_dri.so`:
> > >> >  textdata bss dec hex filename
> > >> >   7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
> > >> >   7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)
> > >> >
> > >> >Cc: Axel Davy 
> > >> >Signed-off-by: Nanley Chery 
> > >> >---
> > >> > src/util/macros.h | 2 +-
> > >> > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >> >
> > >> >diff --git a/src/util/macros.h b/src/util/macros.h
> > >> >index 0c8958f..53a98a0 100644
> > >> >--- a/src/util/macros.h
> > >> >+++ b/src/util/macros.h
> > >> >@@ -211,6 +211,6 @@ do {   \
> > >> > #endif
> > >> >/** Compute ceiling of integer quotient of A divided by B. */
> > >> >-#define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
> > >> >+#define DIV_ROUND_UP(A, B)  (((A) + (B) - 1) / (B))
> > >> >#endif /* UTIL_MACROS_H */
> > >>
> > >> I'll point out that these are not equivalent, one can overflow and the
> > other doesn't. You
> > >> probably want to check if the call sites have sufficient checks for
> > that before
> > >> substituting one for the other.
> > >>
> > >
> > > Good point. As I mentioned in another email, I'll leave the current
> > > macro untouched.
> >
> >
> The other email I referenced unfortunately didn't make it to the list due
> email client
> issues.
> 
> 
> > I think the chances of us relying on overflow behavior are exceedingly
> > small, and nearly all uses of DIV_ROUND_UP are in the i965 driver. I
> > think it's sufficiently safe to go ahead with the patch (but I am
> > still interested to know about your compiler and CFLAGS).
> >
> 
> My error was that I forgot to remove --enable-debug and add -02. After
> making those
> changes, my binary size also increases but by ~1.1k. Another implementation
> which
> avoids overflow increases the binary size by 2 bytes.
> 
> Correct me if I'm wrong, but aligning w/ the kernel implementation (or
> changing the
> current definition for that matter) seems like a regression more than an
> improvement.

Sorry for the previously oddly-wrapped message. I looked into this with 
Kristian, and the kernel's version does seem to produce better assembly.
It's not yet clear why the .text size increases. My current hypothesis
is that the number of no-ops increase, but that's yet to be proven.

Thanks,
Nanley
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Only apply CS stall workaround pre-SKL

2015-12-17 Thread Kenneth Graunke
On Thursday, December 17, 2015 01:21:26 PM Ben Widawsky wrote:
> As per the docs.
> 
> Cc: Kenneth Graunke 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index ae3d818..6c636d2 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -97,7 +97,8 @@ void
>  brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
>  {
> if (brw->gen >= 8) {
> -  gen8_add_cs_stall_workaround_bits();
> +  if (brw->gen == 8)
> + gen8_add_cs_stall_workaround_bits();
>  
>BEGIN_BATCH(6);
>OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
> @@ -141,7 +142,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, 
> uint32_t flags,
>  uint32_t imm_lower, uint32_t imm_upper)
>  {
> if (brw->gen >= 8) {
> -  gen8_add_cs_stall_workaround_bits();
> +  if (brw->gen == 8)
> + gen8_add_cs_stall_workaround_bits();
>  
>BEGIN_BATCH(6);
>OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
> 

"No restrictions."  Hard to believe, but that's what it says :)

Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev