Re: [Mesa-dev] [OT] some contribution statistics
On Thu, Dec 17, 2015 at 4:57 AM, Kenneth Graunkewrote: > > Absolutely. Feel free to send them to me personally. Or, I could > include the files upstream if that's of interest to people, and then > people could just update stuff themselves. > > I just threw together a best-effort set of configuration files so I > could run statistics on Mesa and Piglit over the last few years. As a lover of statistics, I think it would be worth to merge it upstream. Does git dm make use of the .mailmap if present? If so, some of the overlapping information could be removed (since .mailmap is used by other git tools too, but the dm stuff is not). -- Giuseppe "Oblomov" Bilotta ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] Add .mailmap
This adds a first tentative .mailmap file, to canonicize contributor name/emails in shortlogs and other statistical endeavours. Signed-off-by: Giuseppe Bilotta--- .mailmap | 462 +++ 1 file changed, 462 insertions(+) create mode 100644 .mailmap diff --git a/.mailmap b/.mailmap new file mode 100644 index 000..827bfde --- /dev/null +++ b/.mailmap @@ -0,0 +1,462 @@ +Aapo Tahkola + +Adam Jackson +Adam Jackson + +Adrian Marius Negreanu Adrian Negreanu +Adrian Marius Negreanu Negreanu Marius Adrian + +Dave Airlie +Dave Airlie airlied +Dave Airlie +Dave Airlie +Dave Airlie +Dave Airlie +Dave Airlie +Dave Airlie +Dave Airlie + +Alan Coopersmith + +Alan Hourihane +Alan Hourihane +Alan Hourihane + +Alexander Monakov + +Alexander von Gluck IV Alexander von Gluck + +Alex Corscadden +Alex Corscadden + +Alex Deucher +Alex Deucher +Alex Deucher +Alex Deucher +Alex Deucher +Alex Deucher + +Andreas Fänger + +Andreas Hartmetz + +Andre Heider +Andreas Heider + +Andreas Pokorny + +Andrew Randrianasulu +Andrew Randrianasulu + +Arthur Huillet Arthur HUILLET + +Benjamin Franzke ben + +Ben Skeggs +Ben Skeggs +Ben Skeggs +Ben Skeggs +Ben Skeggs +Ben Skeggs +Ben Skeggs + +Ben Widawsky Ben Widawsky + +Blair Sadewitz Blair Sadewitz + +Brian Paul Brian +Brian Paul +Brian Paul +Brian Paul +Brian Paul brian +Brian Paul Brian +Brian Paul Brian +Brian Paul Brian +Brian Paul Brian +Brian Paul Brian +Brian Paul Brian +Brian Paul root +# The next ones are not 100% sure +Brian Paul root +Brian Paul root +Brian Paul root + +Bruce Merry + +Carl-Philip Hänsch Carl-Philip Haensch +Carl-Philip Hänsch Carl-Philip Haensch +Carl-Philip Hänsch Carl-Philip Haensch + +Chad Versace +Chad Versace +Chad Versace
[Mesa-dev] [Bug 80183] [llvmpipe] triangles with vertices that map to raster positions > viewport width/height are not displayed
https://bugs.freedesktop.org/show_bug.cgi?id=80183 --- Comment #18 from cgerlac...@gmail.com --- You were right, the latest mesa version fixes this problem :-) Latest git hash from my checkout is: e97b207654a1c0b2c27b6ad6579b5570a83ce8cd I also tried to replay an apitrace generated with hardware rendering, but I can't get the glretrace to use software rendering. It always uses the systems opengl32.dll and therefore the clipping problem is not reproducible. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data
On Wed, 2015-12-16 at 11:39 -0800, Kenneth Graunke wrote: > On Wednesday, December 16, 2015 10:02:16 AM Iago Toral Quiroga wrote: > > The BDW PRM Vol2a: Command Reference: Instructions, section > > MEDIA_CURBE_LOAD, > > says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are > > 64-byte aligned. This is different from previous gens, that were 32-byte > > aligned. > > > > v2 (Jordan): > > - CURBE Data Start Address is also 64-byte aligned. > > - The call to brw_state_batch should also use 64-byte alignment. > > - Improve PRM reference. > > > > Fixes the following SSBO CTS tests on BDW: > > ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs > > ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs > > ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs > > ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs > > ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs > > ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs > > ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs > > > > And many other CS CTS tests as reported by Marta Lofstedt. > > --- > > src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c > > b/src/mesa/drivers/dri/i965/gen7_cs_state.c > > index 1fde69c..df0f301 100644 > > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c > > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c > > @@ -68,7 +68,7 @@ brw_upload_cs_state(struct brw_context *brw) > > > > uint32_t *bind = (uint32_t*) brw_state_batch(brw, > > AUB_TRACE_BINDING_TABLE, > > > > prog_data->binding_table.size_bytes, > > -32, > > _state->bind_bo_offset); > > +64, > > _state->bind_bo_offset); > > I don't understand this hunk - binding tables don't have anything to do > with push constants. These are for pull constants and UBOs. At least > in the 3D pipeline, we only align these to 32B, not 64. > > > unsigned local_id_dwords = 0; > > > > @@ -77,7 +77,8 @@ brw_upload_cs_state(struct brw_context *brw) > > > > unsigned push_constant_data_size = > >(prog_data->nr_params + local_id_dwords) * sizeof(gl_constant_value); > > - unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32); > > + unsigned reg_aligned_constant_size = > > + ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64); > > unsigned push_constant_regs = reg_aligned_constant_size / 32; > > unsigned threads = get_cs_thread_count(cs_prog_data); > > > > @@ -138,11 +139,13 @@ brw_upload_cs_state(struct brw_context *brw) > > ADVANCE_BATCH(); > > > > if (reg_aligned_constant_size > 0) { > > + const unsigned aligned_push_const_offset = > > + ALIGN(stage_state->push_const_offset, brw->gen < 8 ? 32 : 64); > > This is wrong. What you want is to change: > > param = (gl_constant_value*) > brw_state_batch(brw, type, > reg_aligned_constant_size * threads, > 32, _state->push_const_offset); > > to use an alignment of 64 instead of 32 on Gen8+. That way, it'll > actually upload the data to a portion of the buffer that starts on > a 64B aligned boundary. > > As is, you're uploading the data to a 32B aligned section and then > just fudging the pointer to be 64B aligned, possibly skipping over > the first 32B. Probably not what you wanted :) > > Maybe you accidentally changed the wrong brw_state_batch call? Ouch! yeah, I meant to change the other call, sorry about that. Anyway, I think the patch proposed by Jordan is good so I won't send another version. Thanks for the review Ken! Iago > >BEGIN_BATCH(4); > >OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2)); > >OUT_BATCH(0); > >OUT_BATCH(reg_aligned_constant_size * threads); > > - OUT_BATCH(stage_state->push_const_offset); > > + OUT_BATCH(aligned_push_const_offset); > >ADVANCE_BATCH(); > > } > > > > @@ -241,7 +244,8 @@ brw_upload_cs_push_constants(struct brw_context *brw, > > > >const unsigned push_constant_data_size = > > (local_id_dwords + prog_data->nr_params) * > > sizeof(gl_constant_value); > > - const unsigned reg_aligned_constant_size = > > ALIGN(push_constant_data_size, 32); > > + const unsigned reg_aligned_constant_size = > > + ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64); > >const unsigned param_aligned_count = > > reg_aligned_constant_size / sizeof(*param); > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data
On Wed, 2015-12-16 at 14:48 -0800, Jordan Justen wrote: > On 2015-12-16 11:39:00, Kenneth Graunke wrote: > > On Wednesday, December 16, 2015 10:02:16 AM Iago Toral Quiroga wrote: > > > The BDW PRM Vol2a: Command Reference: Instructions, section > > > MEDIA_CURBE_LOAD, > > > says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are > > > 64-byte aligned. This is different from previous gens, that were 32-byte > > > aligned. > > > > > > v2 (Jordan): > > > - CURBE Data Start Address is also 64-byte aligned. > > > - The call to brw_state_batch should also use 64-byte alignment. > > > - Improve PRM reference. > > > > > > Fixes the following SSBO CTS tests on BDW: > > > ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs > > > ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs > > > ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs > > > ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs > > > ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs > > > ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs > > > ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs > > > > > > And many other CS CTS tests as reported by Marta Lofstedt. > > > --- > > > src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c > > > b/src/mesa/drivers/dri/i965/gen7_cs_state.c > > > index 1fde69c..df0f301 100644 > > > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c > > > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c > > > @@ -68,7 +68,7 @@ brw_upload_cs_state(struct brw_context *brw) > > > > > > uint32_t *bind = (uint32_t*) brw_state_batch(brw, > > > AUB_TRACE_BINDING_TABLE, > > > > > > prog_data->binding_table.size_bytes, > > > -32, > > > _state->bind_bo_offset); > > > +64, > > > _state->bind_bo_offset); > > > > I don't understand this hunk - binding tables don't have anything to do > > with push constants. These are for pull constants and UBOs. At least > > in the 3D pipeline, we only align these to 32B, not 64. > > Yeah. I think he wants to update the call you pointed out below in > brw_upload_cs_push_constants. > > Also, how about consistently applying the alignment change? Either, > just bump the base and size alignment to 64, or also check the gen to > align the base to 32 on gen7. > > How about the attached patch? Yeah, that looks simpler. The patch is: Tested-by: Iago Toral QuirogaReviewed-by: Iago Toral Quiroga Thanks Jordan! > -Jordan > > > > unsigned local_id_dwords = 0; > > > > > > @@ -77,7 +77,8 @@ brw_upload_cs_state(struct brw_context *brw) > > > > > > unsigned push_constant_data_size = > > >(prog_data->nr_params + local_id_dwords) * > > > sizeof(gl_constant_value); > > > - unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, > > > 32); > > > + unsigned reg_aligned_constant_size = > > > + ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64); > > > unsigned push_constant_regs = reg_aligned_constant_size / 32; > > > unsigned threads = get_cs_thread_count(cs_prog_data); > > > > > > @@ -138,11 +139,13 @@ brw_upload_cs_state(struct brw_context *brw) > > > ADVANCE_BATCH(); > > > > > > if (reg_aligned_constant_size > 0) { > > > + const unsigned aligned_push_const_offset = > > > + ALIGN(stage_state->push_const_offset, brw->gen < 8 ? 32 : 64); > > > > This is wrong. What you want is to change: > > > > param = (gl_constant_value*) > > brw_state_batch(brw, type, > > reg_aligned_constant_size * threads, > > 32, _state->push_const_offset); > > > > to use an alignment of 64 instead of 32 on Gen8+. That way, it'll > > actually upload the data to a portion of the buffer that starts on > > a 64B aligned boundary. > > > > As is, you're uploading the data to a 32B aligned section and then > > just fudging the pointer to be 64B aligned, possibly skipping over > > the first 32B. Probably not what you wanted :) > > > > Maybe you accidentally changed the wrong brw_state_batch call? > > > > >BEGIN_BATCH(4); > > >OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2)); > > >OUT_BATCH(0); > > >OUT_BATCH(reg_aligned_constant_size * threads); > > > - OUT_BATCH(stage_state->push_const_offset); > > > + OUT_BATCH(aligned_push_const_offset); > > >ADVANCE_BATCH(); > > > } > > > > > > @@ -241,7 +244,8 @@ brw_upload_cs_push_constants(struct brw_context *brw, > > > > > >const unsigned push_constant_data_size = > > > (local_id_dwords + prog_data->nr_params) * > > > sizeof(gl_constant_value); > > > -
Re: [Mesa-dev] [PATCH 6/7] nir: Teach nir_opt_algebraic about adding and subtracting the same thing
Hi, On 12/17/2015 01:52 AM, Matt Turner wrote: On Tue, Dec 15, 2015 at 1:16 AM, Eduardo Lima Mitevwrote: On 12/15/2015 09:28 AM, Kristian Høgsberg Kristensen wrote: This optimizes a + b - b to just a. Modest shader-db results (BDW): total instructions in shared programs: 7842452 -> 7841862 (-0.01%) instructions in affected programs: 61938 -> 61348 (-0.95%) total loops in shared programs:2131 -> 2131 (0.00%) helped:263 HURT: 0 GAINED:0 LOST: 0 In HSW, I get these shader-db results: total instructions in shared programs: 6257265 -> 6256788 (-0.01%) instructions in affected programs: 46601 -> 46124 (-1.02%) helped: 218 HURT: 0 total cycles in shared programs: 56010026 -> 56007760 (-0.00%) cycles in affected programs: 1048392 -> 1046126 (-0.22%) helped: 199 HURT: 154 total loops in shared programs: 1979 -> 1979 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 I wonder where those cycle HURTs come from. In any case, the net result is positive. I haven't confirmed, but I've seen cases that seem like the cycle counts are wrong. I have doubts about the correctness of latency values set in brw_schedule_instructions.cpp. They were added mostly by Eric on 2012 & 2013. You added mad & lrp data in 2013 and Curro untyped atomics & surface reads in 2013. Both of them have is_haswell check, but don't say anything about newer generations. It seems that some of the values are from spec and some from tests. However, for the test data, the code doesn't say on what exact HW and stepping the tests were run on. Or where the sources for those tests are so that one could try to reproduce the results, verify (with perf counters) that they actually are bound by what the test says, and update data gotten from them for newer generations (i.e. GEN8+). In addition to this, Mesa is lacking at least stall cycles for 3src register bank conflicts. - Eero PS. cycle values are anyway going to be off, code doesn't know memory latencies as that depends on locality & cache utilization, and it doesn't take threading into account. But it only tries to schedule things so that HW is able to better compensate latency, so it doesn't need to know how much cycles take, just have good enough estimate. :-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
On Thu, Dec 17, 2015 at 10:09 AM, Giuseppe Bilottawrote: > +# missing svn authors: I'm not sure this is useful, but just for kicks, I decided to try to track down these contributors, and this is what I came up with (suspected contributors CC'ed, so they can confirm or deny if they want): > +pesco Probably "Sven M. Hallberg " (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17) > +tanner Probably "Thomas Tanner " (Source: https://www.mail-archive.com/mesa-dev@mesa3d.org/msg00826.html) > +hmarson Probably "Hamish Marson " (Sources: http://sourceforge.net/p/r300/mailman/message/2172921/ and http://comments.gmane.org/gmane.comp.video.dri.devel/19600) > +reist Probably "Boris Peterbarg " (Sources: http://sourceforge.net/p/r300/mailman/r300-commit/?style=threaded=200502=16 and http://www.bugzilla.icculus.org/projects/beagle-avahi/Filters/entagged-sharp/Tracker/Util/TrackerTagReader.cs) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
On Thu, Dec 17, 2015 at 10:39 AM, Erik Faye-Lundwrote: > On Thu, Dec 17, 2015 at 10:09 AM, Giuseppe Bilotta > wrote: >> +# missing svn authors: > > I'm not sure this is useful, but just for kicks, I decided to try to > track down these contributors, and this is what I came up with > (suspected contributors CC'ed, so they can confirm or deny if they > want): > >> +pesco > > Probably "Sven M. Hallberg " > (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and > https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17) OK, this address bounced (or rather, mailer-dae...@kundenserver.de buonced pe...@gmx.de, so I guess it's forwarded to a dead address) However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M. Hallberg " might be an up-to-date e-mail address. CC'ed. >> +reist > > Probably "Boris Peterbarg " > (Sources: > http://sourceforge.net/p/r300/mailman/r300-commit/?style=threaded=200502=16 > and > http://www.bugzilla.icculus.org/projects/beagle-avahi/Filters/entagged-sharp/Tracker/Util/TrackerTagReader.cs) This one also bounced, but it seems he committed to Rails (https://github.com/rails/rails/pull/19351) as "Boris Peterbarg ". Again, CC'ed. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 80183] [llvmpipe] triangles with vertices that map to raster positions > viewport width/height are not displayed
https://bugs.freedesktop.org/show_bug.cgi?id=80183 --- Comment #19 from Jose Fonseca--- (In reply to cgerlach42 from comment #18) > I also tried to replay an apitrace generated with hardware rendering, but I > can't get the glretrace to use software rendering. It always uses the > systems opengl32.dll and therefore the clipping problem is not reproducible. If you put Mesa opengl32.dll into the same dir as glretrace.exe it should work. Alternatively, you can do set TRACE_LIBGL=C:\path\to\desired\opengl32.dll glretrace foo.trace -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/9] i965/gs/gen6: fix execsize for instructions with width of 4 in gen6_sol_program()
From: Samuel Iglesias GonsalvezSigned-off-by: Samuel Iglesias Gonsalvez --- src/mesa/drivers/dri/i965/brw_ff_gs_emit.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c b/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c index 8589dab..3ac1c48 100644 --- a/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c +++ b/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c @@ -406,9 +406,11 @@ gen6_sol_program(struct brw_ff_gs_compile *c, struct brw_ff_gs_prog_key *key, : 0x00020001)); /* (1, 0, 2) */ brw_inst_set_pred_control(p->devinfo, inst, BRW_PREDICATE_NORMAL); } + brw_push_insn_state(p); + brw_set_default_exec_size(p, BRW_EXECUTE_4); brw_ADD(p, c->reg.destination_indices, c->reg.destination_indices, get_element_ud(c->reg.SVBI, 0)); - + brw_pop_insn_state(p); /* For each vertex, generate code to output each varying using the * appropriate binding table entry. */ @@ -438,8 +440,13 @@ gen6_sol_program(struct brw_ff_gs_compile *c, struct brw_ff_gs_prog_key *key, vertex_slot.swizzle = varying == VARYING_SLOT_PSIZ ? BRW_SWIZZLE_ : key->transform_feedback_swizzles[binding]; brw_set_default_access_mode(p, BRW_ALIGN_16); +brw_push_insn_state(p); +brw_set_default_exec_size(p, BRW_EXECUTE_4); + brw_MOV(p, stride(c->reg.header, 4, 4, 1), retype(vertex_slot, BRW_REGISTER_TYPE_UD)); +brw_pop_insn_state(p); + brw_set_default_access_mode(p, BRW_ALIGN_1); brw_svb_write(p, final_write ? c->reg.temp : brw_null_reg(), /* dest */ -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/9] i965/eu: set execution size for SEND message in brw_send_indirect_message
From: Iago Toral Quiroga--- src/mesa/drivers/dri/i965/brw_eu_emit.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 9a3b0a2..3d24010 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2562,6 +2562,9 @@ brw_send_indirect_message(struct brw_codegen *p, brw_set_src1(p, send, addr); } + if (dst.width < BRW_EXECUTE_8) + brw_inst_set_exec_size(devinfo, send, dst.width); + brw_set_dest(p, send, dst); brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD)); brw_inst_set_sfid(devinfo, send, sfid); -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/9] i965/eu: set correct execution size in brw_NOP
From: Iago Toral Quirogav2: NOP should have an execsize of 1 (Matt) --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 5fb9662..9a3b0a2 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1230,8 +1230,9 @@ brw_F16TO32(struct brw_codegen *p, struct brw_reg dst, struct brw_reg src) void brw_NOP(struct brw_codegen *p) { brw_inst *insn = next_insn(p, BRW_OPCODE_NOP); - brw_set_dest(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); - brw_set_src0(p, insn, retype(brw_vec4_grf(0,0), BRW_REGISTER_TYPE_UD)); + brw_inst_set_exec_size(p->devinfo, insn, BRW_EXECUTE_1); + brw_set_dest(p, insn, retype(brw_vec1_grf(0,0), BRW_REGISTER_TYPE_UD)); + brw_set_src0(p, insn, retype(brw_vec1_grf(0,0), BRW_REGISTER_TYPE_UD)); brw_set_src1(p, insn, brw_imm_ud(0x0)); } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/9] i965/vec4/gen6: fix exec_size for instructions with destination width of 4
From: Samuel Iglesias GonsalvezSigned-off-by: Samuel Iglesias Gonsalvez --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index e3f9eea..e2a203a 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1092,6 +1092,7 @@ generate_code(struct brw_codegen *p, assert(inst->mlen <= BRW_MAX_MSG_LENGTH); unsigned pre_emit_nr_insn = p->nr_insn; + bool fix_exec_size = false; if (dst.width == BRW_WIDTH_4) { /* This happens in attribute fixups for "dual instanced" geometry @@ -1116,6 +1117,8 @@ generate_code(struct brw_codegen *p, if (src[i].file == BRW_GENERAL_REGISTER_FILE) src[i] = stride(src[i], 4, 4, 1); } + brw_set_default_exec_size(p, BRW_EXECUTE_4); + fix_exec_size = true; } switch (inst->opcode) { @@ -1545,6 +1548,9 @@ generate_code(struct brw_codegen *p, unreachable("Unsupported opcode"); } + if (fix_exec_size) + brw_set_default_exec_size(p, BRW_EXECUTE_8); + if (inst->opcode == VEC4_OPCODE_PACK_BYTES) { /* Handled dependency hints in the generator. */ -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/9] i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channel
From: Iago Toral Quiroga--- src/mesa/drivers/dri/i965/brw_eu_emit.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 3d24010..88e2f71 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -3332,11 +3332,14 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst) /* Overwrite the destination without and with execution masking to * find out which of the channels is active. */ + brw_push_insn_state(p); + brw_set_default_exec_size(p, BRW_EXECUTE_4); brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X), brw_imm_ud(1)); inst = brw_MOV(p, brw_writemask(vec4(dst), WRITEMASK_X), brw_imm_ud(0)); + brw_pop_insn_state(p); brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE); } } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 9/9] i965: Skip execution size adjustment for instructions of width 4
From: Iago Toral QuirogaThis code in brw_set_dest adjusts the execution size of any instruction with a dst.width < 8. However, we don't want to do this with instructions operating on doubles, since these will have a width of 4, but still need an execution size of 8 (for SIMD8). Unfortunately, we can't just check the size of the operands involved to detect if we are doing an operation on doubles, because we can have instructions that do operations on double operands interpreted as UD, operating on any of its 2 32-bit components. Previous commits have made it so we never emit instructions with a horizontal width of 4 that don't have the correct execution size set for gen6+, so we can skip it in this case, avoiding the conflicts with fp64 requirements. Expanding the same fix to other hardware generations requires many more changes but since we are not targetting fp64 support on them wer don't really care for now. --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 88e2f71..f03776d 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, struct brw_reg dest) /* Generators should set a default exec_size of either 8 (SIMD4x2 or SIMD8) * or 16 (SIMD16), as that's normally correct. However, when dealing with * small registers, we automatically reduce it to match the register size. +* +* In platforms that support fp64 we can emit instructions with a width of +* 4 that need two SIMD8 registers and an exec_size of 8 or 16. In these +* cases we need to make sure that these instructions have their exec sizes +* set properly when they are emitted and we can't rely on this code to fix +* it. */ - if (dest.width < BRW_EXECUTE_8) + bool fix_exec_size; + if (devinfo->gen >= 6) + fix_exec_size = dest.width < BRW_EXECUTE_4; + else + fix_exec_size = dest.width < BRW_EXECUTE_8; + + if (fix_exec_size) brw_inst_set_exec_size(devinfo, inst, dest.width); } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/9] Skip automatic execsize for instructions with a width of 4
Hello, This patch series is a updated version of the one Iago sent last week [0] that includes patches for gen6 too, as suggested by Jason. We checked the gen9 code paths that work with a horizontal width of 4 and we think there won't be any regression on gen9... but we don't have any gen9 machine to run piglit with these patches. Can someone check it? Please read the original cover letter [0] for more information. Sam [0] http://lists.freedesktop.org/archives/mesa-dev/2015-December/102746.html Iago Toral Quiroga (5): i965/eu: set correct execution size in brw_NOP i965/fs: set execution size for SEND messages in generate_uniform_pull_constant_load_gen7 i965/eu: set execution size for SEND message in brw_send_indirect_message i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channel i965: Skip execution size adjustment for instructions of width 4 Samuel Iglesias Gonsálvez (4): i965/gs/gen6: fix execsize for instructions with width of 4 in gen6_sol_program() i965/vec4/gen6: fix exec_size for instructions with width of 4 in generate_gs_svb_write() i965/vec4/gen6: fix exec_size for instructions with destination width of 4 i965/vec4/gen6: fix exec_size for MOV with a width of 4 in generate_gs_ff_sync() src/mesa/drivers/dri/i965/brw_eu_emit.c | 25 +--- src/mesa/drivers/dri/i965/brw_ff_gs_emit.c | 9 - src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 13 +++- 4 files changed, 44 insertions(+), 5 deletions(-) -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 8/9] i965/vec4/gen6: fix exec_size for MOV with a width of 4 in generate_gs_ff_sync()
From: Samuel Iglesias GonsalvezSigned-off-by: Samuel Iglesias Gonsalvez --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index e2a203a..3a14685 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -698,8 +698,10 @@ generate_gs_ff_sync(struct brw_codegen *p, brw_MOV(p, get_element_ud(header, 0), get_element_ud(dst, 0)); /* src1 is not an immediate when we use transform feedback */ - if (src1.file != BRW_IMMEDIATE_VALUE) + if (src1.file != BRW_IMMEDIATE_VALUE) { + brw_set_default_exec_size(p, BRW_EXECUTE_4); brw_MOV(p, brw_vec4_grf(src1.nr, 0), brw_vec4_grf(dst.nr, 1)); + } brw_pop_insn_state(p); } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/9] i965/vec4/gen6: fix exec_size for instructions with width of 4 in generate_gs_svb_write()
From: Samuel Iglesias GonsalvezSigned-off-by: Samuel Iglesias Gonsalvez --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index c3426dd..e3f9eea 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -478,10 +478,13 @@ generate_gs_svb_write(struct brw_codegen *p, bool final_write = inst->sol_final_write; brw_push_insn_state(p); + brw_set_default_exec_size(p, BRW_EXECUTE_4); /* Copy Vertex data into M0.x */ brw_MOV(p, stride(dst, 4, 4, 1), stride(retype(src0, BRW_REGISTER_TYPE_UD), 4, 4, 1)); + brw_pop_insn_state(p); + brw_push_insn_state(p); /* Send SVB Write */ brw_svb_write(p, final_write ? src1 : brw_null_reg(), /* dest == src1 */ -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/9] i965/fs: set execution size for SEND messages in generate_uniform_pull_constant_load_gen7
From: Iago Toral Quiroga--- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index c25da07..42b5e86 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1248,6 +1248,7 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst, brw_set_default_compression_control(p, BRW_COMPRESSION_NONE); brw_set_default_mask_control(p, BRW_MASK_DISABLE); brw_inst *send = brw_next_insn(p, BRW_OPCODE_SEND); + brw_inst_set_exec_size(devinfo, send, dst.width); brw_pop_insn_state(p); brw_set_dest(p, send, dst); @@ -1279,6 +1280,7 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst, /* dst = send(payload, a0.0 | ) */ brw_inst *insn = brw_send_indirect_message( p, BRW_SFID_SAMPLER, dst, src, addr); + brw_inst_set_exec_size(devinfo, insn, dst.width); brw_set_sampler_message(p, insn, 0, 0, /* LD message ignores sampler unit */ -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/2] mesa: Add a _mesa_active_fragment_shader_has_side_effects helper
Iago Toral Quirogawrites: > Some drivers can disable the FS unit if there is nothing in the shader code > that writes to an output (i.e. color, depth, etc). Right now, mesa has > a function to check for atomic buffers and the i965 driver also checks for > images. Refactor this logic into a generic function that we can use for > any source of side effects in a fragment shader. Sugested by Jason. > --- > src/mesa/drivers/dri/i965/gen7_wm_state.c | 6 +- > src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +-- > src/mesa/main/mtypes.h| 15 --- > 3 files changed, 14 insertions(+), 10 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c > b/src/mesa/drivers/dri/i965/gen7_wm_state.c > index 06d5e65..a6d1028 100644 > --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c > +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c > @@ -77,13 +77,9 @@ upload_wm_state(struct brw_context *brw) >dw1 |= GEN7_WM_KILL_ENABLE; > } > > - if (_mesa_active_fragment_shader_has_atomic_ops(>ctx)) { > - dw1 |= GEN7_WM_DISPATCH_ENABLE; > - } > - > /* _NEW_BUFFERS | _NEW_COLOR */ > if (brw_color_buffer_write_enabled(brw) || writes_depth || > - prog_data->base.nr_image_params || > + _mesa_active_fragment_shader_has_side_effects(>ctx) || > dw1 & GEN7_WM_KILL_ENABLE) { >dw1 |= GEN7_WM_DISPATCH_ENABLE; > } Hey, it looks like SSBOs are still missing a couple of things that could make their side effects rather non-deterministic on i965 hardware: On HSW you should probably set the UAV_ONLY WM state bit when there are no colour or depth buffer writes as is done for images below in this same function, and on all hardware you should set the early depth/stencil control field to PSEXEC unless early fragment tests are enabled to make sure that the fragment shader is executed regardless of whether per-fragment tests pass or not as the spec requires. > diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c > b/src/mesa/drivers/dri/i965/gen8_ps_state.c > index 945f710..3cc8c68 100644 > --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c > @@ -90,8 +90,7 @@ gen8_upload_ps_extra(struct brw_context *brw, > * > * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | > _NEW_COLOR > */ > - if ((_mesa_active_fragment_shader_has_atomic_ops(>ctx) || > -prog_data->base.nr_image_params) && > + if (_mesa_active_fragment_shader_has_side_effects(>ctx) && > !brw_color_buffer_write_enabled(brw)) >dw1 |= GEN8_PSX_SHADER_HAS_UAV; > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index 191a9ea..834ba59 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -4538,11 +4538,20 @@ enum _debug > DEBUG_INCOMPLETE_FBO = (1 << 3) > }; > > +/** > + * Checks if the active fragment shader program can have side effects due > + * to use of things like atomic buffers or images > + */ > static inline bool > -_mesa_active_fragment_shader_has_atomic_ops(const struct gl_context *ctx) > +_mesa_active_fragment_shader_has_side_effects(const struct gl_context *ctx) > { > - return ctx->Shader._CurrentFragmentProgram != NULL && > - > ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT]->NumAtomicBuffers > > 0; > + const struct gl_shader *sh; > + > + if (!ctx->Shader._CurrentFragmentProgram) > + return false; > + > + sh = > ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT]; > + return sh->NumAtomicBuffers > 0 || sh->NumImages > 0; > } > > #ifdef __cplusplus > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/9] i965/eu: set correct execution size in brw_NOP
Reviewed-by: Matt Turner___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quirogawrote: > Right now we rely on the code at the bottom of brw_set_dest to set the > correct execution size for anything that does not operate on a full SIMD > register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, > where operands are twice as big and we see instructions with a horizontal > width of 4 that still require an execution size of 8. We cannot fix this by > simply checking the type of the operands involved and skip the automatic > execsize adjustment when they are doubles because we can also operate on > doubles as integers (for pack and unpack operations for example). Can you give an example of when checking the type wouldn't be sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do they generate? Could we look at the destination's stride as well? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: fix pstipple and aaline stages wrt sampler_views/samplers
On 17/12/15 04:59, srol...@vmware.com wrote: From: Roland ScheideggerThose stages only really work for OGL-style texturing (so number of samplers and views mostly the same, certainly for the max values). These get often set up all at once, thus there might be max number of both even if all of them are just NULL. We must not set the max number of samplers and views to the same value since that will lead to terrible things if a driver supports more views than samplers (and the state tracker set up all the views). (This will not make these stages magically work if a shader uses dx10-style texturing, they might still replace an actually used sview in that case.) --- src/gallium/auxiliary/draw/draw_pipe_aaline.c | 9 + src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 7 --- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c b/src/gallium/auxiliary/draw/draw_pipe_aaline.c index 85d24b7..85ae84c 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c +++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c @@ -646,6 +646,7 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header) struct pipe_context *pipe = draw->pipe; const struct pipe_rasterizer_state *rast = draw->rasterizer; uint num_samplers; + uint num_sampler_views; void *r; assert(draw->rasterizer->line_smooth); @@ -667,9 +668,9 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header) draw_aaline_prepare_outputs(draw, draw->pipeline.aaline); /* how many samplers? */ - /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */ - num_samplers = MAX2(aaline->num_sampler_views, aaline->num_samplers); - num_samplers = MAX2(num_samplers, aaline->fs->sampler_unit + 1); + /* we'll use sampler/texture[aaline->sampler_unit] for the alpha texture */ + num_samplers = MAX2(aaline->num_samplers, aaline->fs->sampler_unit + 1); + num_sampler_views = MAX2(num_samplers, aaline->num_sampler_views); aaline->state.sampler[aaline->fs->sampler_unit] = aaline->sampler_cso; pipe_sampler_view_reference(>state.sampler_views[aaline->fs->sampler_unit], @@ -681,7 +682,7 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header) num_samplers, aaline->state.sampler); aaline->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0, -num_samplers, aaline->state.sampler_views); +num_sampler_views, aaline->state.sampler_views); /* Disable triangle culling, stippling, unfilled mode etc. */ r = draw_get_rasterizer_no_cull(draw, rast->scissor, rast->flatshade); diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c index a51e91f..3bfd414 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c +++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c @@ -477,6 +477,7 @@ pstip_first_tri(struct draw_stage *stage, struct prim_header *header) struct pipe_context *pipe = pstip->pipe; struct draw_context *draw = stage->draw; uint num_samplers; + uint num_sampler_views; assert(stage->draw->rasterizer->poly_stipple_enable); @@ -490,8 +491,8 @@ pstip_first_tri(struct draw_stage *stage, struct prim_header *header) /* how many samplers? */ /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */ - num_samplers = MAX2(pstip->num_sampler_views, pstip->num_samplers); - num_samplers = MAX2(num_samplers, pstip->fs->sampler_unit + 1); + num_samplers = MAX2(pstip->num_samplers, pstip->fs->sampler_unit + 1); + num_sampler_views = MAX2(pstip->num_sampler_views, num_samplers); /* plug in our sampler, texture */ pstip->state.samplers[pstip->fs->sampler_unit] = pstip->sampler_cso; @@ -506,7 +507,7 @@ pstip_first_tri(struct draw_stage *stage, struct prim_header *header) num_samplers, pstip->state.samplers); pstip->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0, - num_samplers, pstip->state.sampler_views); + num_sampler_views, pstip->state.sampler_views); draw->suspend_flushing = FALSE; Reviewed-by: Jose Fonseca ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 80183] [llvmpipe] triangles with vertices that map to raster positions > viewport width/height are not displayed
https://bugs.freedesktop.org/show_bug.cgi?id=80183 Roland Scheideggerchanged: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #20 from Roland Scheidegger --- Ok let's close it then. FWIW I suspect 9e3f2af3c3732bd618308ddeffb017966a4fc93e fixed it (this applies if you were writing gl_ClipVertex and have enabled at least one clip plane). -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
On Thu, Dec 17, 2015 at 1:09 AM, Giuseppe Bilottawrote: > This adds a first tentative .mailmap file, to canonicize contributor > name/emails in shortlogs and other statistical endeavours. > > Signed-off-by: Giuseppe Bilotta > --- If we want this kind of information, should we just go with Ken's gitdm branch? That'll provide information about per-company contribution stats as well. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: fix pstipple and aaline stages wrt sampler_views/samplers
On 12/16/2015 09:59 PM, srol...@vmware.com wrote: From: Roland ScheideggerThose stages only really work for OGL-style texturing (so number of samplers and views mostly the same, certainly for the max values). These get often set up all at once, thus there might be max number of both even if all of them are just NULL. We must not set the max number of samplers and views to the same value since that will lead to terrible things if a driver supports more views than samplers (and the state tracker set up all the views). (This will not make these stages magically work if a shader uses dx10-style texturing, they might still replace an actually used sview in that case.) --- src/gallium/auxiliary/draw/draw_pipe_aaline.c | 9 + src/gallium/auxiliary/draw/draw_pipe_pstipple.c | 7 --- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c b/src/gallium/auxiliary/draw/draw_pipe_aaline.c index 85d24b7..85ae84c 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c +++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c @@ -646,6 +646,7 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header) struct pipe_context *pipe = draw->pipe; const struct pipe_rasterizer_state *rast = draw->rasterizer; uint num_samplers; + uint num_sampler_views; void *r; assert(draw->rasterizer->line_smooth); @@ -667,9 +668,9 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header) draw_aaline_prepare_outputs(draw, draw->pipeline.aaline); /* how many samplers? */ - /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */ - num_samplers = MAX2(aaline->num_sampler_views, aaline->num_samplers); - num_samplers = MAX2(num_samplers, aaline->fs->sampler_unit + 1); + /* we'll use sampler/texture[aaline->sampler_unit] for the alpha texture */ + num_samplers = MAX2(aaline->num_samplers, aaline->fs->sampler_unit + 1); + num_sampler_views = MAX2(num_samplers, aaline->num_sampler_views); aaline->state.sampler[aaline->fs->sampler_unit] = aaline->sampler_cso; pipe_sampler_view_reference(>state.sampler_views[aaline->fs->sampler_unit], @@ -681,7 +682,7 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header) num_samplers, aaline->state.sampler); aaline->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0, -num_samplers, aaline->state.sampler_views); +num_sampler_views, aaline->state.sampler_views); /* Disable triangle culling, stippling, unfilled mode etc. */ r = draw_get_rasterizer_no_cull(draw, rast->scissor, rast->flatshade); diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c index a51e91f..3bfd414 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c +++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c @@ -477,6 +477,7 @@ pstip_first_tri(struct draw_stage *stage, struct prim_header *header) struct pipe_context *pipe = pstip->pipe; struct draw_context *draw = stage->draw; uint num_samplers; + uint num_sampler_views; assert(stage->draw->rasterizer->poly_stipple_enable); @@ -490,8 +491,8 @@ pstip_first_tri(struct draw_stage *stage, struct prim_header *header) /* how many samplers? */ /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */ - num_samplers = MAX2(pstip->num_sampler_views, pstip->num_samplers); - num_samplers = MAX2(num_samplers, pstip->fs->sampler_unit + 1); + num_samplers = MAX2(pstip->num_samplers, pstip->fs->sampler_unit + 1); + num_sampler_views = MAX2(pstip->num_sampler_views, num_samplers); /* plug in our sampler, texture */ pstip->state.samplers[pstip->fs->sampler_unit] = pstip->sampler_cso; @@ -506,7 +507,7 @@ pstip_first_tri(struct draw_stage *stage, struct prim_header *header) num_samplers, pstip->state.samplers); pstip->driver_set_sampler_views(pipe, PIPE_SHADER_FRAGMENT, 0, - num_samplers, pstip->state.sampler_views); + num_sampler_views, pstip->state.sampler_views); draw->suspend_flushing = FALSE; Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
On Thu, Dec 17, 2015 at 12:31 PM, Connor Abbottwrote: > On Thu, Dec 17, 2015 at 11:44 AM, Matt Turner wrote: >> On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga wrote: >>> Right now we rely on the code at the bottom of brw_set_dest to set the >>> correct execution size for anything that does not operate on a full SIMD >>> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, >>> where operands are twice as big and we see instructions with a horizontal >>> width of 4 that still require an execution size of 8. We cannot fix this by >>> simply checking the type of the operands involved and skip the automatic >>> execsize adjustment when they are doubles because we can also operate on >>> doubles as integers (for pack and unpack operations for example). >> >> Can you give an example of when checking the type wouldn't be >> sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do >> they generate? Could we look at the destination's stride as well? > > Yes, packDouble2x32 is what would cause the same issue, since it turns > into two 32-bit MOV's with a destination stride of 2 (and offset of 0 > and 1, ofc). We could check for the stride, but if we really don't > want to change this code, the better way to go would be to avoid the > width adjustment that causes this whole thing to happen for > destination registers, since everything other than this piece of code > ignores the destination width and vstride. But that being said, this > code is a hack since it breaks the assumption that fs_inst::exec_size > always equals the final ExecSize of the instruction, and Iago and I > agreed to fix it (or at least for now, do the portion of the fix > needed for fp64) rather than work around it. Not sure if you guys are thinking about it, but a lot of these things that apply to doubles also apply to ARB_gpu_shader_int64. While you obviously don't need to support it now, you may want to keep it in mind as you make design decisions. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/15] i965/fs: Get rid of reladdr
Jason Ekstrandwrites: > On Dec 14, 2015 6:24 AM, "Francisco Jerez" wrote: >> >> Jason Ekstrand writes: >> >> > On Dec 11, 2015 5:44 AM, "Francisco Jerez" > wrote: >> >> >> >> Jason Ekstrand writes: >> >> >> >> > On Dec 10, 2015 6:58 AM, "Francisco Jerez" >> > wrote: >> >> >> >> >> >> Jason Ekstrand writes: >> >> >> >> >> >> > We aren't using it anymore. >> >> >> >> >> >> It seems useful to me to be able to represent indirect access as > part >> > of >> >> >> any instruction source or destination register. >> >> >> >> >> >> The following: >> >> >> >> >> >> | mov_indirect g0, g1, a0 >> >> >> | foo g2, g0 >> >> >> >> >> >> and the converse case with indirect destination offset (which you > don't >> >> >> seem to represent currently) can be implemented by the hardware more >> >> >> efficiently using a single instruction in certain cases. The > current >> > IR >> >> >> is able to represent what the hardware can do, but supporting the >> >> >> MOV_INDIRECT instruction only would force us to keep the indirection >> >> >> separate from the instruction that uses it, so it seems like a less >> >> >> expressive representation to me than the current approach, unless >> > you're >> >> >> willing to add _INDIRECT variants of most hardware opcodes. >> >> > >> >> > Yes and, mostly, no. Yes, you can put an indirect on almost anything >> > but >> >> > it has substantial restrictions: >> >> > >> >> Yes, I'm aware of the restrictions of indirect addressing on Gen >> >> hardware. The fact that reladdr can represent addressing modes which >> >> aren't directly implemented in the hardware doesn't invalidate the >> >> abstraction, nor implies that optimization and translation passes must >> >> be made aware of such restrictions. It just means that our abstraction >> >> is a superset of the hardware indirect addressing scheme, which is fine >> >> given a lowering pass to convert indirect addressing into a legal form. >> >> >> >> Your proposal instead goes the opposite direction and replaces the >> >> preexisting abstraction with a new abstraction which can only represent >> >> a tiny subset of the functionality of hardware instructions, which I >> >> don't think is acceptable for a low-level IR. >> >> >> >> > 1) Destination indirects must be uniform (I'm only 95% sure this is > the >> >> > case) >> >> > >> >> > 2) We only have 8 address subregisters on HSW and prior and 16 on BDW >> > which >> >> > means: >> >> > >> >> >> >> That's the kind of thing that SIMD lowering would be able to take care >> >> of easily. >> > >> > Yes, and we use it for MOV_INDIRECT. >> > >> Right, and there's no reason why it couldn't be used for reladdr in the >> same way to handle the above restrictions. >> >> >> > a) All indirects must be uniform OR >> >> > >> >> > b) We must be on Broadwell, have only two indirects, and split > the >> >> > instruction OR >> >> > >> >> > c) We can only have one indirect (and still split the > instruction on >> >> > HSW) >> >> > >> >> > So, yes, "everthing can be uniform", but not in real life. >> >> >> >> > The reladdr abstraction, while maybe ok for a high-level IR, doesn't >> >> > accurately represent the hardware at all because it can't represent >> >> > any of the restrictions without piles of helper functions and checks. >> >> >> >> The reladdr abstraction can represent the full flexibility of the >> >> hardware, while MOV_INDIRECT cannot, that makes reladdr a more accurate >> >> low-level representation of the program. For that reason I'd argue the >> >> exact opposite: MOV_INDIRECT would be a great abstraction and a welcome >> >> simplification for a high-level IR (e.g. NIR), in which nothing is lost >> >> by sacrificing the expressiveness of individual instructions in favour >> >> of instruction composition, because the back-end can always combine >> >> multiple high-level instructions into a single low-level instruction >> >> where the hardware ISA allows, as long as the low-level IR is able to >> >> represent the full functionality of hardware instructions (IOW the >> >> mapping between low-level IR instructions and hardware instructions is >> >> surjective), which is further from being the case after this series. >> > >> > Flexibility and expressiveness isn't always a good thing. >> >> Expressiveness is a good thing as long as it's required to represent the >> capabilities of the hardware. > > As long as it's required to represent those capabilities that we care > about, yes. However, the hardware can do many things that either aren't > useful for us or take so much effort for the little benefit you gain, that > it's not worth it. I'm arguing that saving a single MOV in the few places > that we do in directs isn't worth the complexity. > >> > Yes, reladdr is more "expressive", but it's not getting used because >> > that
Re: [Mesa-dev] [RFC PATCH 0/5] Skip automatic execsize for instructions with a width of 4
On Thu, Dec 17, 2015 at 11:44 AM, Matt Turnerwrote: > On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga wrote: >> Right now we rely on the code at the bottom of brw_set_dest to set the >> correct execution size for anything that does not operate on a full SIMD >> register (dst.width < BRW_EXECUTE_8). However, this won't work with fp64, >> where operands are twice as big and we see instructions with a horizontal >> width of 4 that still require an execution size of 8. We cannot fix this by >> simply checking the type of the operands involved and skip the automatic >> execsize adjustment when they are doubles because we can also operate on >> doubles as integers (for pack and unpack operations for example). > > Can you give an example of when checking the type wouldn't be > sufficient? Presumably packDouble2x32/unpackDouble2x32? What code do > they generate? Could we look at the destination's stride as well? Yes, packDouble2x32 is what would cause the same issue, since it turns into two 32-bit MOV's with a destination stride of 2 (and offset of 0 and 1, ofc). We could check for the stride, but if we really don't want to change this code, the better way to go would be to avoid the width adjustment that causes this whole thing to happen for destination registers, since everything other than this piece of code ignores the destination width and vstride. But that being said, this code is a hack since it breaks the assumption that fs_inst::exec_size always equals the final ExecSize of the instruction, and Iago and I agreed to fix it (or at least for now, do the portion of the fix needed for fp64) rather than work around it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs
On 12/17/2015 09:58 AM, srol...@vmware.com wrote: From: Roland ScheideggerNaNs mean it should be clipped, otherwise the NaNs might get passed to the next stages (if clipping didn't happen for another reason already), which might cause all kind of problems. The llvm path got this right already (possibly by luck), but this isn't used when there's a gs active. Found by code inspection, verified with some hacked piglit test and some more hacked debug output. (Note the clipper can still itself incorrectly generate NaN and INF position values in its output prims (at least after w divide / viewport transform) even if the inputs weren't NaNs, if the position data of the vertices is "sufficiently bad".) --- src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 +- src/gallium/auxiliary/draw/draw_llvm.c | 4 2 files changed, 18 insertions(+), 14 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h index 779b237..d8866cd 100644 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, out->pre_clip_pos[i] = position[i]; } + /* Be careful with NaNs. Comparisons must be true for them. */ /* Do the hardwired planes first: */ if (flags & DO_CLIP_XY_GUARD_BAND) { -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0); -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1); -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2); -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3); +if (!(-0.50 * position[0] + position[3] >= 0)) mask |= (1<<0); +if (!( 0.50 * position[0] + position[3] >= 0)) mask |= (1<<1); +if (!(-0.50 * position[1] + position[3] >= 0)) mask |= (1<<2); +if (!( 0.50 * position[1] + position[3] >= 0)) mask |= (1<<3); } else if (flags & DO_CLIP_XY) { -if (-position[0] + position[3] < 0) mask |= (1<<0); -if ( position[0] + position[3] < 0) mask |= (1<<1); -if (-position[1] + position[3] < 0) mask |= (1<<2); -if ( position[1] + position[3] < 0) mask |= (1<<3); +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0); +if (!( position[0] + position[3] >= 0)) mask |= (1<<1); +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2); +if (!( position[1] + position[3] >= 0)) mask |= (1<<3); } /* Clip Z planes according to full cube, half cube or none. */ if (flags & DO_CLIP_FULL_Z) { -if ( position[2] + position[3] < 0) mask |= (1<<4); -if (-position[2] + position[3] < 0) mask |= (1<<5); +if (!( position[2] + position[3] >= 0)) mask |= (1<<4); +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); } else if (flags & DO_CLIP_HALF_Z) { -if ( position[2] < 0) mask |= (1<<4); -if (-position[2] + position[3] < 0) mask |= (1<<5); +if (!( position[2] >= 0)) mask |= (1<<4); +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); } if (flags & DO_CLIP_USER) { @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, if (clipdist < 0 || util_is_inf_or_nan(clipdist)) mask |= 1 << plane_idx; } else { - if (dot4(clipvertex, plane[plane_idx]) < 0) + if (!(dot4(clipvertex, plane[plane_idx]) >= 0)) mask |= 1 << plane_idx; } } @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, out = (struct vertex_header *)( (char *)out + info->stride ); } - return need_pipeline != 0; } diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index 8435991..dad523a 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm, cv_w = pos_w; } + /* +* Be careful with the comparisons and NaNs (using llvm's unordered +* comparisons here). +*/ /* Cliptest, for hardwired planes */ if (clip_xy) { /* plane 1 */ Looks OK to me, but I'm not sure I understand what'll happen when we find a vertex with a NaN coordinate. Does the whole triangle get culled? Otherwise, what would be the result of computing the intersection point on the clip plane? Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
[Mesa-dev] [PATCH] draw: fix clip test with NaNs
From: Roland ScheideggerNaNs mean it should be clipped, otherwise the NaNs might get passed to the next stages (if clipping didn't happen for another reason already), which might cause all kind of problems. The llvm path got this right already (possibly by luck), but this isn't used when there's a gs active. Found by code inspection, verified with some hacked piglit test and some more hacked debug output. (Note the clipper can still itself incorrectly generate NaN and INF position values in its output prims (at least after w divide / viewport transform) even if the inputs weren't NaNs, if the position data of the vertices is "sufficiently bad".) --- src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 +- src/gallium/auxiliary/draw/draw_llvm.c | 4 2 files changed, 18 insertions(+), 14 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h index 779b237..d8866cd 100644 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, out->pre_clip_pos[i] = position[i]; } + /* Be careful with NaNs. Comparisons must be true for them. */ /* Do the hardwired planes first: */ if (flags & DO_CLIP_XY_GUARD_BAND) { -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0); -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1); -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2); -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3); +if (!(-0.50 * position[0] + position[3] >= 0)) mask |= (1<<0); +if (!( 0.50 * position[0] + position[3] >= 0)) mask |= (1<<1); +if (!(-0.50 * position[1] + position[3] >= 0)) mask |= (1<<2); +if (!( 0.50 * position[1] + position[3] >= 0)) mask |= (1<<3); } else if (flags & DO_CLIP_XY) { -if (-position[0] + position[3] < 0) mask |= (1<<0); -if ( position[0] + position[3] < 0) mask |= (1<<1); -if (-position[1] + position[3] < 0) mask |= (1<<2); -if ( position[1] + position[3] < 0) mask |= (1<<3); +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0); +if (!( position[0] + position[3] >= 0)) mask |= (1<<1); +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2); +if (!( position[1] + position[3] >= 0)) mask |= (1<<3); } /* Clip Z planes according to full cube, half cube or none. */ if (flags & DO_CLIP_FULL_Z) { -if ( position[2] + position[3] < 0) mask |= (1<<4); -if (-position[2] + position[3] < 0) mask |= (1<<5); +if (!( position[2] + position[3] >= 0)) mask |= (1<<4); +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); } else if (flags & DO_CLIP_HALF_Z) { -if ( position[2] < 0) mask |= (1<<4); -if (-position[2] + position[3] < 0) mask |= (1<<5); +if (!( position[2] >= 0)) mask |= (1<<4); +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); } if (flags & DO_CLIP_USER) { @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, if (clipdist < 0 || util_is_inf_or_nan(clipdist)) mask |= 1 << plane_idx; } else { - if (dot4(clipvertex, plane[plane_idx]) < 0) + if (!(dot4(clipvertex, plane[plane_idx]) >= 0)) mask |= 1 << plane_idx; } } @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, out = (struct vertex_header *)( (char *)out + info->stride ); } - return need_pipeline != 0; } diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index 8435991..dad523a 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm, cv_w = pos_w; } + /* +* Be careful with the comparisons and NaNs (using llvm's unordered +* comparisons here). +*/ /* Cliptest, for hardwired planes */ if (clip_xy) { /* plane 1 */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/15] i965/fs: Get rid of reladdr
On Dec 17, 2015 9:10 AM, "Francisco Jerez"wrote: > > Jason Ekstrand writes: > > > On Dec 14, 2015 6:24 AM, "Francisco Jerez" wrote: > >> > >> Jason Ekstrand writes: > >> > >> > On Dec 11, 2015 5:44 AM, "Francisco Jerez" > > wrote: > >> >> > >> >> Jason Ekstrand writes: > >> >> > >> >> > On Dec 10, 2015 6:58 AM, "Francisco Jerez" > >> > wrote: > >> >> >> > >> >> >> Jason Ekstrand writes: > >> >> >> > >> >> >> > We aren't using it anymore. > >> >> >> > >> >> >> It seems useful to me to be able to represent indirect access as > > part > >> > of > >> >> >> any instruction source or destination register. > >> >> >> > >> >> >> The following: > >> >> >> > >> >> >> | mov_indirect g0, g1, a0 > >> >> >> | foo g2, g0 > >> >> >> > >> >> >> and the converse case with indirect destination offset (which you > > don't > >> >> >> seem to represent currently) can be implemented by the hardware more > >> >> >> efficiently using a single instruction in certain cases. The > > current > >> > IR > >> >> >> is able to represent what the hardware can do, but supporting the > >> >> >> MOV_INDIRECT instruction only would force us to keep the indirection > >> >> >> separate from the instruction that uses it, so it seems like a less > >> >> >> expressive representation to me than the current approach, unless > >> > you're > >> >> >> willing to add _INDIRECT variants of most hardware opcodes. > >> >> > > >> >> > Yes and, mostly, no. Yes, you can put an indirect on almost anything > >> > but > >> >> > it has substantial restrictions: > >> >> > > >> >> Yes, I'm aware of the restrictions of indirect addressing on Gen > >> >> hardware. The fact that reladdr can represent addressing modes which > >> >> aren't directly implemented in the hardware doesn't invalidate the > >> >> abstraction, nor implies that optimization and translation passes must > >> >> be made aware of such restrictions. It just means that our abstraction > >> >> is a superset of the hardware indirect addressing scheme, which is fine > >> >> given a lowering pass to convert indirect addressing into a legal form. > >> >> > >> >> Your proposal instead goes the opposite direction and replaces the > >> >> preexisting abstraction with a new abstraction which can only represent > >> >> a tiny subset of the functionality of hardware instructions, which I > >> >> don't think is acceptable for a low-level IR. > >> >> > >> >> > 1) Destination indirects must be uniform (I'm only 95% sure this is > > the > >> >> > case) > >> >> > > >> >> > 2) We only have 8 address subregisters on HSW and prior and 16 on BDW > >> > which > >> >> > means: > >> >> > > >> >> > >> >> That's the kind of thing that SIMD lowering would be able to take care > >> >> of easily. > >> > > >> > Yes, and we use it for MOV_INDIRECT. > >> > > >> Right, and there's no reason why it couldn't be used for reladdr in the > >> same way to handle the above restrictions. > >> > >> >> > a) All indirects must be uniform OR > >> >> > > >> >> > b) We must be on Broadwell, have only two indirects, and split > > the > >> >> > instruction OR > >> >> > > >> >> > c) We can only have one indirect (and still split the > > instruction on > >> >> > HSW) > >> >> > > >> >> > So, yes, "everthing can be uniform", but not in real life. > >> >> > >> >> > The reladdr abstraction, while maybe ok for a high-level IR, doesn't > >> >> > accurately represent the hardware at all because it can't represent > >> >> > any of the restrictions without piles of helper functions and checks. > >> >> > >> >> The reladdr abstraction can represent the full flexibility of the > >> >> hardware, while MOV_INDIRECT cannot, that makes reladdr a more accurate > >> >> low-level representation of the program. For that reason I'd argue the > >> >> exact opposite: MOV_INDIRECT would be a great abstraction and a welcome > >> >> simplification for a high-level IR (e.g. NIR), in which nothing is lost > >> >> by sacrificing the expressiveness of individual instructions in favour > >> >> of instruction composition, because the back-end can always combine > >> >> multiple high-level instructions into a single low-level instruction > >> >> where the hardware ISA allows, as long as the low-level IR is able to > >> >> represent the full functionality of hardware instructions (IOW the > >> >> mapping between low-level IR instructions and hardware instructions is > >> >> surjective), which is further from being the case after this series. > >> > > >> > Flexibility and expressiveness isn't always a good thing. > >> > >> Expressiveness is a good thing as long as it's required to represent the > >> capabilities of the hardware. > > > > As long as it's required to represent those capabilities that we care > > about, yes. However, the hardware can do many things that either aren't >
Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs
Am 17.12.2015 um 19:13 schrieb Brian Paul: > On 12/17/2015 09:58 AM, srol...@vmware.com wrote: >> From: Roland Scheidegger>> >> NaNs mean it should be clipped, otherwise the NaNs might get passed to >> the >> next stages (if clipping didn't happen for another reason already), which >> might cause all kind of problems. >> The llvm path got this right already (possibly by luck), but this >> isn't used >> when there's a gs active. >> Found by code inspection, verified with some hacked piglit test and >> some more >> hacked debug output. >> (Note the clipper can still itself incorrectly generate NaN and INF >> position >> values in its output prims (at least after w divide / viewport >> transform) even >> if the inputs weren't NaNs, if the position data of the vertices is >> "sufficiently bad".) >> --- >> src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 >> +- >> src/gallium/auxiliary/draw/draw_llvm.c | 4 >> 2 files changed, 18 insertions(+), 14 deletions(-) >> >> diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >> b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >> index 779b237..d8866cd 100644 >> --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >> +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >> @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs >> *pvs, >> out->pre_clip_pos[i] = position[i]; >>} >> >> + /* Be careful with NaNs. Comparisons must be true for them. */ >>/* Do the hardwired planes first: >> */ >>if (flags & DO_CLIP_XY_GUARD_BAND) { >> -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0); >> -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1); >> -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2); >> -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3); >> +if (!(-0.50 * position[0] + position[3] >= 0)) mask |= >> (1<<0); >> +if (!( 0.50 * position[0] + position[3] >= 0)) mask |= >> (1<<1); >> +if (!(-0.50 * position[1] + position[3] >= 0)) mask |= >> (1<<2); >> +if (!( 0.50 * position[1] + position[3] >= 0)) mask |= >> (1<<3); >>} >>else if (flags & DO_CLIP_XY) { >> -if (-position[0] + position[3] < 0) mask |= (1<<0); >> -if ( position[0] + position[3] < 0) mask |= (1<<1); >> -if (-position[1] + position[3] < 0) mask |= (1<<2); >> -if ( position[1] + position[3] < 0) mask |= (1<<3); >> +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0); >> +if (!( position[0] + position[3] >= 0)) mask |= (1<<1); >> +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2); >> +if (!( position[1] + position[3] >= 0)) mask |= (1<<3); >>} >> >>/* Clip Z planes according to full cube, half cube or none. >> */ >>if (flags & DO_CLIP_FULL_Z) { >> -if ( position[2] + position[3] < 0) mask |= (1<<4); >> -if (-position[2] + position[3] < 0) mask |= (1<<5); >> +if (!( position[2] + position[3] >= 0)) mask |= (1<<4); >> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); >>} >>else if (flags & DO_CLIP_HALF_Z) { >> -if ( position[2] < 0) mask |= (1<<4); >> -if (-position[2] + position[3] < 0) mask |= (1<<5); >> +if (!( position[2] >= 0)) mask |= (1<<4); >> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); >>} >> >>if (flags & DO_CLIP_USER) { >> @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs >> *pvs, >> if (clipdist < 0 || util_is_inf_or_nan(clipdist)) >>mask |= 1 << plane_idx; >> } else { >> - if (dot4(clipvertex, plane[plane_idx]) < 0) >> + if (!(dot4(clipvertex, plane[plane_idx]) >= 0)) >>mask |= 1 << plane_idx; >> } >> } >> @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs >> *pvs, >> >> out = (struct vertex_header *)( (char *)out + info->stride ); >> } >> - >> return need_pipeline != 0; >> } >> >> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c >> b/src/gallium/auxiliary/draw/draw_llvm.c >> index 8435991..dad523a 100644 >> --- a/src/gallium/auxiliary/draw/draw_llvm.c >> +++ b/src/gallium/auxiliary/draw/draw_llvm.c >> @@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm, >> cv_w = pos_w; >> } >> >> + /* >> +* Be careful with the comparisons and NaNs (using llvm's unordered >> +* comparisons here). >> +*/ >> /* Cliptest, for hardwired planes */ >> if (clip_xy) { >> /* plane 1 */ >> > > Looks OK to me, but
Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition
On Thu, Dec 17, 2015 at 11:04 AM, Nanley Cherywrote: > On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote: >> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery >> wrote: >> >> >From: Nanley Chery >> > >> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two >> >DIV_ROUND_UP() definitions to one, but chose the more >> >compute-intensive version in the process. Use the simpler version >> >instead. Reduces .text size by 1360 bytes. >> > >> >Output of `size lib/i965_dri.so`: >> > textdata bss dec hex filename >> > 7850440 219264 27240 8096944 7b8cb0 lib/i965_dri.so (before) >> > 7849080 219264 27240 8095584 7b8760 lib/i965_dri.so (after) >> > >> >Cc: Axel Davy >> >Signed-off-by: Nanley Chery >> >--- >> > src/util/macros.h | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> >diff --git a/src/util/macros.h b/src/util/macros.h >> >index 0c8958f..53a98a0 100644 >> >--- a/src/util/macros.h >> >+++ b/src/util/macros.h >> >@@ -211,6 +211,6 @@ do { \ >> > #endif >> >/** Compute ceiling of integer quotient of A divided by B. */ >> >-#define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 ) >> >+#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B)) >> >#endif /* UTIL_MACROS_H */ >> >> I'll point out that these are not equivalent, one can overflow and the other >> doesn't. You >> probably want to check if the call sites have sufficient checks for that >> before >> substituting one for the other. >> > > Good point. As I mentioned in another email, I'll leave the current > macro untouched. I think the chances of us relying on overflow behavior are exceedingly small, and nearly all uses of DIV_ROUND_UP are in the i965 driver. I think it's sufficiently safe to go ahead with the patch (but I am still interested to know about your compiler and CFLAGS). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs
On Thu, Dec 17, 2015 at 11:19 AM, Roland Scheideggerwrote: > Am 17.12.2015 um 19:13 schrieb Brian Paul: >> On 12/17/2015 09:58 AM, srol...@vmware.com wrote: >>> From: Roland Scheidegger >>> >>> NaNs mean it should be clipped, otherwise the NaNs might get passed to >>> the >>> next stages (if clipping didn't happen for another reason already), which >>> might cause all kind of problems. >>> The llvm path got this right already (possibly by luck), but this >>> isn't used >>> when there's a gs active. >>> Found by code inspection, verified with some hacked piglit test and >>> some more >>> hacked debug output. >>> (Note the clipper can still itself incorrectly generate NaN and INF >>> position >>> values in its output prims (at least after w divide / viewport >>> transform) even >>> if the inputs weren't NaNs, if the position data of the vertices is >>> "sufficiently bad".) >>> --- >>> src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 >>> +- >>> src/gallium/auxiliary/draw/draw_llvm.c | 4 >>> 2 files changed, 18 insertions(+), 14 deletions(-) >>> >>> diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >>> b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >>> index 779b237..d8866cd 100644 >>> --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >>> +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h >>> @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs >>> *pvs, >>> out->pre_clip_pos[i] = position[i]; >>>} >>> >>> + /* Be careful with NaNs. Comparisons must be true for them. */ >>>/* Do the hardwired planes first: >>> */ >>>if (flags & DO_CLIP_XY_GUARD_BAND) { >>> -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0); >>> -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1); >>> -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2); >>> -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3); >>> +if (!(-0.50 * position[0] + position[3] >= 0)) mask |= >>> (1<<0); >>> +if (!( 0.50 * position[0] + position[3] >= 0)) mask |= >>> (1<<1); >>> +if (!(-0.50 * position[1] + position[3] >= 0)) mask |= >>> (1<<2); >>> +if (!( 0.50 * position[1] + position[3] >= 0)) mask |= >>> (1<<3); >>>} >>>else if (flags & DO_CLIP_XY) { >>> -if (-position[0] + position[3] < 0) mask |= (1<<0); >>> -if ( position[0] + position[3] < 0) mask |= (1<<1); >>> -if (-position[1] + position[3] < 0) mask |= (1<<2); >>> -if ( position[1] + position[3] < 0) mask |= (1<<3); >>> +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0); >>> +if (!( position[0] + position[3] >= 0)) mask |= (1<<1); >>> +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2); >>> +if (!( position[1] + position[3] >= 0)) mask |= (1<<3); >>>} >>> >>>/* Clip Z planes according to full cube, half cube or none. >>> */ >>>if (flags & DO_CLIP_FULL_Z) { >>> -if ( position[2] + position[3] < 0) mask |= (1<<4); >>> -if (-position[2] + position[3] < 0) mask |= (1<<5); >>> +if (!( position[2] + position[3] >= 0)) mask |= (1<<4); >>> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); >>>} >>>else if (flags & DO_CLIP_HALF_Z) { >>> -if ( position[2] < 0) mask |= (1<<4); >>> -if (-position[2] + position[3] < 0) mask |= (1<<5); >>> +if (!( position[2] >= 0)) mask |= (1<<4); >>> +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); >>>} >>> >>>if (flags & DO_CLIP_USER) { >>> @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs >>> *pvs, >>> if (clipdist < 0 || util_is_inf_or_nan(clipdist)) >>>mask |= 1 << plane_idx; >>> } else { >>> - if (dot4(clipvertex, plane[plane_idx]) < 0) >>> + if (!(dot4(clipvertex, plane[plane_idx]) >= 0)) >>>mask |= 1 << plane_idx; >>> } >>> } >>> @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs >>> *pvs, >>> >>> out = (struct vertex_header *)( (char *)out + info->stride ); >>> } >>> - >>> return need_pipeline != 0; >>> } >>> >>> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c >>> b/src/gallium/auxiliary/draw/draw_llvm.c >>> index 8435991..dad523a 100644 >>> --- a/src/gallium/auxiliary/draw/draw_llvm.c >>> +++ b/src/gallium/auxiliary/draw/draw_llvm.c >>> @@ -1200,6 +1200,10 @@ generate_clipmask(struct draw_llvm *llvm, >>> cv_w = pos_w; >>> } >>> >>> + /* >>> +* Be careful with the comparisons
Re: [Mesa-dev] [PATCH] draw: fix clip test with NaNs
Am 17.12.2015 um 20:22 schrieb Matt Turner: > On Thu, Dec 17, 2015 at 11:19 AM, Roland Scheidegger> wrote: >> Am 17.12.2015 um 19:13 schrieb Brian Paul: >>> On 12/17/2015 09:58 AM, srol...@vmware.com wrote: From: Roland Scheidegger NaNs mean it should be clipped, otherwise the NaNs might get passed to the next stages (if clipping didn't happen for another reason already), which might cause all kind of problems. The llvm path got this right already (possibly by luck), but this isn't used when there's a gs active. Found by code inspection, verified with some hacked piglit test and some more hacked debug output. (Note the clipper can still itself incorrectly generate NaN and INF position values in its output prims (at least after w divide / viewport transform) even if the inputs weren't NaNs, if the position data of the vertices is "sufficiently bad".) --- src/gallium/auxiliary/draw/draw_cliptest_tmp.h | 28 +- src/gallium/auxiliary/draw/draw_llvm.c | 4 2 files changed, 18 insertions(+), 14 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h index 779b237..d8866cd 100644 --- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h +++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h @@ -95,30 +95,31 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, out->pre_clip_pos[i] = position[i]; } + /* Be careful with NaNs. Comparisons must be true for them. */ /* Do the hardwired planes first: */ if (flags & DO_CLIP_XY_GUARD_BAND) { -if (-0.50 * position[0] + position[3] < 0) mask |= (1<<0); -if ( 0.50 * position[0] + position[3] < 0) mask |= (1<<1); -if (-0.50 * position[1] + position[3] < 0) mask |= (1<<2); -if ( 0.50 * position[1] + position[3] < 0) mask |= (1<<3); +if (!(-0.50 * position[0] + position[3] >= 0)) mask |= (1<<0); +if (!( 0.50 * position[0] + position[3] >= 0)) mask |= (1<<1); +if (!(-0.50 * position[1] + position[3] >= 0)) mask |= (1<<2); +if (!( 0.50 * position[1] + position[3] >= 0)) mask |= (1<<3); } else if (flags & DO_CLIP_XY) { -if (-position[0] + position[3] < 0) mask |= (1<<0); -if ( position[0] + position[3] < 0) mask |= (1<<1); -if (-position[1] + position[3] < 0) mask |= (1<<2); -if ( position[1] + position[3] < 0) mask |= (1<<3); +if (!(-position[0] + position[3] >= 0)) mask |= (1<<0); +if (!( position[0] + position[3] >= 0)) mask |= (1<<1); +if (!(-position[1] + position[3] >= 0)) mask |= (1<<2); +if (!( position[1] + position[3] >= 0)) mask |= (1<<3); } /* Clip Z planes according to full cube, half cube or none. */ if (flags & DO_CLIP_FULL_Z) { -if ( position[2] + position[3] < 0) mask |= (1<<4); -if (-position[2] + position[3] < 0) mask |= (1<<5); +if (!( position[2] + position[3] >= 0)) mask |= (1<<4); +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); } else if (flags & DO_CLIP_HALF_Z) { -if ( position[2] < 0) mask |= (1<<4); -if (-position[2] + position[3] < 0) mask |= (1<<5); +if (!( position[2] >= 0)) mask |= (1<<4); +if (!(-position[2] + position[3] >= 0)) mask |= (1<<5); } if (flags & DO_CLIP_USER) { @@ -146,7 +147,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, if (clipdist < 0 || util_is_inf_or_nan(clipdist)) mask |= 1 << plane_idx; } else { - if (dot4(clipvertex, plane[plane_idx]) < 0) + if (!(dot4(clipvertex, plane[plane_idx]) >= 0)) mask |= 1 << plane_idx; } } @@ -192,7 +193,6 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs, out = (struct vertex_header *)( (char *)out + info->stride ); } - return need_pipeline != 0; } diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index 8435991..dad523a 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@
Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition
On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote: > On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery> wrote: > > >From: Nanley Chery > > > >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two > >DIV_ROUND_UP() definitions to one, but chose the more > >compute-intensive version in the process. Use the simpler version > >instead. Reduces .text size by 1360 bytes. > > > >Output of `size lib/i965_dri.so`: > > textdata bss dec hex filename > > 7850440 219264 27240 8096944 7b8cb0 lib/i965_dri.so (before) > > 7849080 219264 27240 8095584 7b8760 lib/i965_dri.so (after) > > > >Cc: Axel Davy > >Signed-off-by: Nanley Chery > >--- > > src/util/macros.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > >diff --git a/src/util/macros.h b/src/util/macros.h > >index 0c8958f..53a98a0 100644 > >--- a/src/util/macros.h > >+++ b/src/util/macros.h > >@@ -211,6 +211,6 @@ do { \ > > #endif > >/** Compute ceiling of integer quotient of A divided by B. */ > >-#define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 ) > >+#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B)) > >#endif /* UTIL_MACROS_H */ > > I'll point out that these are not equivalent, one can overflow and the other > doesn't. You > probably want to check if the call sites have sufficient checks for that > before > substituting one for the other. > Good point. As I mentioned in another email, I'll leave the current macro untouched. Thanks, Nanley > > /Glenn ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] virgl: enable building on Android
This is just a copy-n-paste and rename of vc4 Android makefiles. Signed-off-by: Rob Herring--- Android.mk | 4 ++-- src/gallium/Android.mk | 5 + src/gallium/drivers/virgl/Android.mk| 35 + src/gallium/targets/dri/Android.mk | 4 src/gallium/winsys/virgl/drm/Android.mk | 34 5 files changed, 80 insertions(+), 2 deletions(-) create mode 100644 src/gallium/drivers/virgl/Android.mk create mode 100644 src/gallium/winsys/virgl/drm/Android.mk diff --git a/Android.mk b/Android.mk index ed160fb..1d76559 100644 --- a/Android.mk +++ b/Android.mk @@ -24,7 +24,7 @@ # BOARD_GPU_DRIVERS should be defined. The valid values are # # classic drivers: i915 i965 -# gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi vc4 vmwgfx +# gallium drivers: swrast freedreno i915g ilo nouveau r300g r600g radeonsi vc4 virgl vmwgfx # # The main target is libGLES_mesa. For each classic driver enabled, a DRI # module will also be built. DRI modules will be loaded by libGLES_mesa. @@ -46,7 +46,7 @@ MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk MESA_PYTHON2 := python classic_drivers := i915 i965 -gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi vmwgfx vc4 +gallium_drivers := swrast freedreno i915g ilo nouveau r300g r600g radeonsi vmwgfx vc4 virgl MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS)) diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk index b406d4a..749be7d 100644 --- a/src/gallium/Android.mk +++ b/src/gallium/Android.mk @@ -83,6 +83,11 @@ ifneq ($(filter vc4, $(MESA_GPU_DRIVERS)),) SUBDIRS += winsys/vc4/drm drivers/vc4 endif +# virgl +ifneq ($(filter virgl, $(MESA_GPU_DRIVERS)),) +SUBDIRS += winsys/virgl/drm drivers/virgl +endif + # vmwgfx ifneq ($(filter vmwgfx, $(MESA_GPU_DRIVERS)),) SUBDIRS += winsys/svga/drm drivers/svga diff --git a/src/gallium/drivers/virgl/Android.mk b/src/gallium/drivers/virgl/Android.mk new file mode 100644 index 000..b8309e4 --- /dev/null +++ b/src/gallium/drivers/virgl/Android.mk @@ -0,0 +1,35 @@ +# Copyright (C) 2014 Emil Velikov +# +# Permission is hereby granted, free of charge, to any person obtaining a +# copy of this software and associated documentation files (the "Software"), +# to deal in the Software without restriction, including without limitation +# the rights to use, copy, modify, merge, publish, distribute, sublicense, +# and/or sell copies of the Software, and to permit persons to whom the +# Software is furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +# DEALINGS IN THE SOFTWARE. + +LOCAL_PATH := $(call my-dir) + +# get C_SOURCES +include $(LOCAL_PATH)/Makefile.sources + +include $(CLEAR_VARS) + +LOCAL_SRC_FILES := \ + $(C_SOURCES) + +LOCAL_SHARED_LIBRARIES := libdrm +LOCAL_MODULE := libmesa_pipe_virgl + +include $(GALLIUM_COMMON_MK) +include $(BUILD_STATIC_LIBRARY) diff --git a/src/gallium/targets/dri/Android.mk b/src/gallium/targets/dri/Android.mk index 2d9610e..371e68d 100644 --- a/src/gallium/targets/dri/Android.mk +++ b/src/gallium/targets/dri/Android.mk @@ -92,6 +92,10 @@ ifneq ($(filter vc4,$(MESA_GPU_DRIVERS)),) LOCAL_CFLAGS += -DGALLIUM_VC4 gallium_DRIVERS += libmesa_winsys_vc4 libmesa_pipe_vc4 endif +ifneq ($(filter virgl,$(MESA_GPU_DRIVERS)),) +LOCAL_CFLAGS += -DGALLIUM_VIRGL +gallium_DRIVERS += libmesa_winsys_virgl libmesa_pipe_virgl +endif ifneq ($(filter vmwgfx,$(MESA_GPU_DRIVERS)),) gallium_DRIVERS += libmesa_winsys_svga libmesa_pipe_svga LOCAL_CFLAGS += -DGALLIUM_VMWGFX diff --git a/src/gallium/winsys/virgl/drm/Android.mk b/src/gallium/winsys/virgl/drm/Android.mk new file mode 100644 index 000..8493503 --- /dev/null +++ b/src/gallium/winsys/virgl/drm/Android.mk @@ -0,0 +1,34 @@ +# Copyright (C) 2014 Emil Velikov +# +# Permission is hereby granted, free of charge, to any person obtaining a +# copy of this software and associated documentation files (the "Software"), +# to deal in the Software without restriction, including without limitation +# the rights to use, copy, modify, merge, publish, distribute, sublicense, +# and/or sell copies of the Software, and to permit persons to whom the +# Software is furnished to do so, subject to the following conditions: +#
Re: [Mesa-dev] [PATCH v2] Add .mailmap
Woah, this brought back memories of fighting with X and mesa! If you want to list me, I'd prefer you use re...@users.sourceforge.net On Thu, Dec 17, 2015 at 12:27 PM, Erik Faye-Lundwrote: > On Thu, Dec 17, 2015 at 10:39 AM, Erik Faye-Lund > wrote: > > On Thu, Dec 17, 2015 at 10:09 AM, Giuseppe Bilotta > > wrote: > >> +# missing svn authors: > > > > I'm not sure this is useful, but just for kicks, I decided to try to > > track down these contributors, and this is what I came up with > > (suspected contributors CC'ed, so they can confirm or deny if they > > want): > > > >> +pesco > > > > Probably "Sven M. Hallberg " > > (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and > > https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17) > > OK, this address bounced (or rather, mailer-dae...@kundenserver.de > buonced pe...@gmx.de, so I guess it's forwarded to a dead address) > > However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M. > Hallberg " might be an up-to-date e-mail address. > CC'ed. > > >> +reist > > > > Probably "Boris Peterbarg " > > (Sources: > http://sourceforge.net/p/r300/mailman/r300-commit/?style=threaded=200502=16 > > and > http://www.bugzilla.icculus.org/projects/beagle-avahi/Filters/entagged-sharp/Tracker/Util/TrackerTagReader.cs > ) > > This one also bounced, but it seems he committed to Rails > (https://github.com/rails/rails/pull/19351) as "Boris Peterbarg > ". Again, CC'ed. > -- Boris Peterbarg Seeking Alpha ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] virtio_gpu: Add PCI ID to driver map
Add the virtio-gpu PCI ID so the driver probing works. Signed-off-by: Rob Herring--- include/pci_ids/virtio_gpu_pci_ids.h | 1 + src/loader/pci_id_driver_map.h | 7 +++ 2 files changed, 8 insertions(+) create mode 100644 include/pci_ids/virtio_gpu_pci_ids.h diff --git a/include/pci_ids/virtio_gpu_pci_ids.h b/include/pci_ids/virtio_gpu_pci_ids.h new file mode 100644 index 000..2e6ecaf --- /dev/null +++ b/include/pci_ids/virtio_gpu_pci_ids.h @@ -0,0 +1 @@ +CHIPSET(0x0010, VIRTGL, VIRTGL) diff --git a/src/loader/pci_id_driver_map.h b/src/loader/pci_id_driver_map.h index 11e39d3..cab69fb 100644 --- a/src/loader/pci_id_driver_map.h +++ b/src/loader/pci_id_driver_map.h @@ -53,6 +53,12 @@ static const int radeonsi_chip_ids[] = { #undef CHIPSET }; +static const int virtio_gpu_chip_ids[] = { +#define CHIPSET(chip, name, family) chip, +#include "pci_ids/virtio_gpu_pci_ids.h" +#undef CHIPSET +}; + static const int vmwgfx_chip_ids[] = { #define CHIPSET(chip, name, family) chip, #include "pci_ids/vmwgfx_pci_ids.h" @@ -78,6 +84,7 @@ static const struct { { 0x1002, "radeonsi", radeonsi_chip_ids, ARRAY_SIZE(radeonsi_chip_ids), _LOADER_GALLIUM}, { 0x10de, "nouveau_vieux", NULL, -1, _LOADER_DRI, is_nouveau_vieux }, { 0x10de, "nouveau", NULL, -1, _LOADER_GALLIUM }, + { 0x1af4, "virtio_gpu", virtio_gpu_chip_ids, ARRAY_SIZE(virtio_gpu_chip_ids), _LOADER_GALLIUM }, { 0x15ad, "vmwgfx", vmwgfx_chip_ids, ARRAY_SIZE(vmwgfx_chip_ids), _LOADER_GALLIUM }, { 0x, NULL, NULL, 0 }, }; -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled
Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit builds. Signed-off-by: Rob Herring--- src/gallium/drivers/freedreno/ir3/ir3_print.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/ir3/ir3_print.c b/src/gallium/drivers/freedreno/ir3/ir3_print.c index 07e03d2..a84e798 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_print.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_print.c @@ -143,7 +143,7 @@ block_id(struct ir3_block *block) #ifdef DEBUG return block->serialno; #else - return (uint32_t)(uint64_t)block; + return (uint32_t)(unsigned long)block; #endif } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
On Thu, Dec 17, 2015 at 12:30 PM, Sven M. Hallbergwrote: > Erik Faye-Lund writes: >>> Probably "Sven M. Hallberg " >>> (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and >>> https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17) >> >> OK, this address bounced (or rather, mailer-dae...@kundenserver.de >> buonced pe...@gmx.de, so I guess it's forwarded to a dead address) >> >> However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M. >> Hallberg " might be an up-to-date e-mail address. >> CC'ed. > > You are in fact correct. :) Oh, nice, would it be OK for you if I add this email address to the mesa .mailmap? -- Giuseppe "Oblomov" Bilotta ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Only apply CS stall workaround pre-SKL
As per the docs. Cc: Kenneth GraunkeSigned-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index ae3d818..6c636d2 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -97,7 +97,8 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags) { if (brw->gen >= 8) { - gen8_add_cs_stall_workaround_bits(); + if (brw->gen == 8) + gen8_add_cs_stall_workaround_bits(); BEGIN_BATCH(6); OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2)); @@ -141,7 +142,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags, uint32_t imm_lower, uint32_t imm_upper) { if (brw->gen >= 8) { - gen8_add_cs_stall_workaround_bits(); + if (brw->gen == 8) + gen8_add_cs_stall_workaround_bits(); BEGIN_BATCH(6); OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2)); -- 2.6.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radeonsi: add RADEON_REPLACE_SHADERS debug option
From: Nicolai HähnleThis option allows replacing a single shader by a pre-compiled ELF object as generated by LLVM's llc, for example. This can be useful for debugging a deterministically occuring error in shaders (and has in fact helped find the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264). --- src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/gallium/drivers/radeonsi/si_debug.c | 94 +++ src/gallium/drivers/radeonsi/si_pipe.c| 3 + src/gallium/drivers/radeonsi/si_pipe.h| 1 + src/gallium/drivers/radeonsi/si_shader.c | 18 +++-- 5 files changed, 112 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index c3933b1d..556c7cc 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -87,6 +87,7 @@ #define DBG_NO_DCC (1llu << 43) #define DBG_NO_DCC_CLEAR (1llu << 44) #define DBG_NO_RB_PLUS (1llu << 45) +#define DBG_REPLACE_SHADERS(1llu << 46) #define R600_MAP_BUFFER_ALIGNMENT 64 diff --git a/src/gallium/drivers/radeonsi/si_debug.c b/src/gallium/drivers/radeonsi/si_debug.c index c45f8c0..f50d98c 100644 --- a/src/gallium/drivers/radeonsi/si_debug.c +++ b/src/gallium/drivers/radeonsi/si_debug.c @@ -28,7 +28,9 @@ #include "si_shader.h" #include "sid.h" #include "sid_tables.h" +#include "radeon/radeon_elf_util.h" #include "ddebug/dd_util.h" +#include "util/u_memory.h" static void si_dump_shader(struct si_shader_ctx_state *state, const char *name, @@ -42,6 +44,98 @@ static void si_dump_shader(struct si_shader_ctx_state *state, const char *name, fprintf(f, "%s\n\n", state->current->binary.disasm_string); } +/** + * Shader compiles can be overridden with arbitrary ELF objects by setting + * the environment variable RADEON_REPLACE_SHADERS=num1:filename1[;num2:filename2] + */ +bool si_replace_shader(unsigned num, struct radeon_shader_binary *binary) +{ + const char *p = debug_get_option("RADEON_REPLACE_SHADERS", NULL); + const char *semicolon; + char *copy = NULL; + FILE *f; + long filesize, nread; + char *buf = NULL; + bool replaced = false; + + if (!p) + return false; + + while (*p) { + unsigned long i; + char *endp; + i = strtoul(p, , 0); + + p = endp; + if (*p != ':') { + fprintf(stderr, "RADEON_REPLACE_SHADERS formatted badly.\n"); + exit(1); + } + ++p; + + if (i == num) + break; + + p = strchr(p, ';'); + if (!p) + return false; + ++p; + } + if (!*p) + return false; + + semicolon = strchr(p, ';'); + if (semicolon) { + p = copy = strndup(p, semicolon - p); + if (!copy) { + fprintf(stderr, "out of memory\n"); + return false; + } + } + + fprintf(stderr, "radeonsi: replace shader %u by %s\n", num, p); + + f = fopen(p, "r"); + if (!f) { + perror("radeonsi: failed to open file"); + goto out_free; + } + + if (fseek(f, 0, SEEK_END) != 0) + goto file_error; + + filesize = ftell(f); + if (filesize < 0) + goto file_error; + + if (fseek(f, 0, SEEK_SET) != 0) + goto file_error; + + buf = MALLOC(filesize); + if (!buf) { + fprintf(stderr, "out of memory\n"); + goto out_close; + } + + nread = fread(buf, 1, filesize, f); + if (nread != filesize) + goto file_error; + + radeon_elf_read(buf, filesize, binary); + replaced = true; + +out_close: + fclose(f); +out_free: + FREE(buf); + free(copy); + return replaced; + +file_error: + perror("radeonsi: reading shader"); + goto out_close; +} + /* Parsed IBs are difficult to read without colors. Use "less -R file" to * read them, or use "aha -b -f file" to convert them to html. */ diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index ac13407..6a1911f 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -639,6 +639,9 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws) if (debug_get_bool_option("RADEON_DUMP_SHADERS", FALSE)) sscreen->b.debug_flags |= DBG_FS | DBG_VS | DBG_GS | DBG_PS | DBG_CS; + if (debug_get_option("RADEON_REPLACE_SHADERS", NULL)) + sscreen->b.debug_flags |= DBG_REPLACE_SHADERS; + /* Create the auxiliary context. This must be done
[Mesa-dev] [PATCH 1/2] radeonsi: count compilations in si_compile_llvm
From: Nicolai HähnleThis changes the count slightly (because of si_generate_gs_copy_shader), but this is only relevant for the driver-specific num-compilations query. It sets the stage for the next commit. --- src/gallium/drivers/radeonsi/si_shader.c| 2 ++ src/gallium/drivers/radeonsi/si_state_shaders.c | 1 - 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 4a67276..511ed88 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -3885,6 +3885,8 @@ int si_compile_llvm(struct si_screen *sscreen, struct si_shader *shader, shader->selector ? shader->selector->tokens : NULL); bool dump_ir = dump_asm && !(sscreen->b.debug_flags & DBG_NO_IR); + p_atomic_inc(>b.num_compilations); + r = radeon_llvm_compile(mod, >binary, r600_get_llvm_processor_name(sscreen->b.family), dump_ir, dump_asm, tm); if (r) diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index f0147ce..8700590 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -634,7 +634,6 @@ static int si_shader_select(struct pipe_context *ctx, sel->last_variant = shader; } state->current = shader; - p_atomic_inc(>screen->b.num_compilations); pipe_mutex_unlock(sel->mutex); return 0; } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium/radeon: only dispose locally created target machine in radeon_llvm_compile
From: Nicolai HähnleUnify the cleanup paths of the function rather than duplicating code. --- src/gallium/drivers/radeon/radeon_llvm_emit.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c index 6b2ebde..61ed940 100644 --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c @@ -188,8 +188,8 @@ unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_shader_binary *binar if (mem_err) { fprintf(stderr, "%s: %s", __FUNCTION__, err); FREE(err); - LLVMDisposeTargetMachine(tm); - return 1; + rval = 1; + goto out; } if (0 != rval) { @@ -205,6 +205,7 @@ unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_shader_binary *binar /* Clean up */ LLVMDisposeMemoryBuffer(out_buffer); +out: if (dispose_tm) { LLVMDisposeTargetMachine(tm); } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
Hello Boris, On Thu, Dec 17, 2015 at 10:09 PM, Boris Peterbargwrote: > Woah, this brought back memories of fighting with X and mesa! > > If you want to list me, I'd prefer you use re...@users.sourceforge.net Thanks, I've added the mapping. Unless of course you'd rather want to remain anonymous 8-) -- Giuseppe "Oblomov" Bilotta ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
On Thu, Dec 17, 2015 at 10:09 PM, Boris Peterbargwrote: > Woah, this brought back memories of fighting with X and mesa! > > If you want to list me, I'd prefer you use re...@users.sourceforge.net Actually, your comment just gave me a _brilliant_ idea. If mesa was hosted on SF's svn, I'm ready to bet that all the missing ones can be recovered from the SF user list, so for example the still unidentified sio and sio2 are probably http://sourceforge.net/u/sio/profile/ http://sourceforge.net/u/sio2/profile/ Was Mesa ever hosted on a different svn repo? -- Giuseppe "Oblomov" Bilotta ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/2] mesa: Add a _mesa_active_fragment_shader_has_side_effects helper
On Thu, 2015-12-17 at 16:29 +0200, Francisco Jerez wrote: > Iago Toral Quirogawrites: > > > Some drivers can disable the FS unit if there is nothing in the shader code > > that writes to an output (i.e. color, depth, etc). Right now, mesa has > > a function to check for atomic buffers and the i965 driver also checks for > > images. Refactor this logic into a generic function that we can use for > > any source of side effects in a fragment shader. Sugested by Jason. > > --- > > src/mesa/drivers/dri/i965/gen7_wm_state.c | 6 +- > > src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +-- > > src/mesa/main/mtypes.h| 15 --- > > 3 files changed, 14 insertions(+), 10 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c > > b/src/mesa/drivers/dri/i965/gen7_wm_state.c > > index 06d5e65..a6d1028 100644 > > --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c > > +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c > > @@ -77,13 +77,9 @@ upload_wm_state(struct brw_context *brw) > >dw1 |= GEN7_WM_KILL_ENABLE; > > } > > > > - if (_mesa_active_fragment_shader_has_atomic_ops(>ctx)) { > > - dw1 |= GEN7_WM_DISPATCH_ENABLE; > > - } > > - > > /* _NEW_BUFFERS | _NEW_COLOR */ > > if (brw_color_buffer_write_enabled(brw) || writes_depth || > > - prog_data->base.nr_image_params || > > + _mesa_active_fragment_shader_has_side_effects(>ctx) || > > dw1 & GEN7_WM_KILL_ENABLE) { > >dw1 |= GEN7_WM_DISPATCH_ENABLE; > > } > > Hey, it looks like SSBOs are still missing a couple of things that could > make their side effects rather non-deterministic on i965 hardware: On > HSW you should probably set the UAV_ONLY WM state bit when there are no > colour or depth buffer writes as is done for images below in this same > function, and on all hardware you should set the early depth/stencil > control field to PSEXEC unless early fragment tests are enabled to make > sure that the fragment shader is executed regardless of whether > per-fragment tests pass or not as the spec requires. Sure, I'll add this and send a new version. Thanks Curro! BTW, I see that we are doing these two things only for images at the moment, I guess we should we do it for atomic buffers as well, right? > > diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c > > b/src/mesa/drivers/dri/i965/gen8_ps_state.c > > index 945f710..3cc8c68 100644 > > --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c > > +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c > > @@ -90,8 +90,7 @@ gen8_upload_ps_extra(struct brw_context *brw, > > * > > * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | > > _NEW_COLOR > > */ > > - if ((_mesa_active_fragment_shader_has_atomic_ops(>ctx) || > > -prog_data->base.nr_image_params) && > > + if (_mesa_active_fragment_shader_has_side_effects(>ctx) && > > !brw_color_buffer_write_enabled(brw)) > >dw1 |= GEN8_PSX_SHADER_HAS_UAV; > > > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > > index 191a9ea..834ba59 100644 > > --- a/src/mesa/main/mtypes.h > > +++ b/src/mesa/main/mtypes.h > > @@ -4538,11 +4538,20 @@ enum _debug > > DEBUG_INCOMPLETE_FBO = (1 << 3) > > }; > > > > +/** > > + * Checks if the active fragment shader program can have side effects due > > + * to use of things like atomic buffers or images > > + */ > > static inline bool > > -_mesa_active_fragment_shader_has_atomic_ops(const struct gl_context *ctx) > > +_mesa_active_fragment_shader_has_side_effects(const struct gl_context *ctx) > > { > > - return ctx->Shader._CurrentFragmentProgram != NULL && > > - > > ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT]->NumAtomicBuffers > > > 0; > > + const struct gl_shader *sh; > > + > > + if (!ctx->Shader._CurrentFragmentProgram) > > + return false; > > + > > + sh = > > ctx->Shader._CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT]; > > + return sh->NumAtomicBuffers > 0 || sh->NumImages > 0; > > } > > > > #ifdef __cplusplus > > -- > > 1.9.1 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
Erik Faye-Lundwrites: >> Probably "Sven M. Hallberg " >> (Sources: http://sourceforge.net/p/mesa3d/mailman/message/5416179/ and >> https://github.com/jwiegley/lambdabot-1/blob/master/AUTHORS#L17) > > OK, this address bounced (or rather, mailer-dae...@kundenserver.de > buonced pe...@gmx.de, so I guess it's forwarded to a dead address) > > However, his homepage (http://khjk.org/~pesco/) suggests that "Sven M. > Hallberg " might be an up-to-date e-mail address. > CC'ed. You are in fact correct. :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/11] [RFC] mesa: allow binding framebuffer without depth
On Wed, Dec 16, 2015 at 11:30 PM, Miklós Mátéwrote: > On 12/16/2015 05:27 PM, Marek Olšák wrote: >> >> What is this good for? >> >> Marek > > KotOR uses a series of scratch framebuffers for drawing the framebuffer > effects. These have no depth and no stencil, so check_compatible() rejects > them, subsequent GL calls are no-op, and the screen becomes garbage. I also > experimented successfully with disabling the visuals that have no depth or > no stencil in gallium/state_trackers/dri/dri_screen.c, but I concluded that > this one was a smaller hack. What kind of scratch buffer are we talking about? How is it created? Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/radeon: only dispose locally created target machine in radeon_llvm_compile
On 18.12.2015 07:00, Nicolai Hähnle wrote: > From: Nicolai Hähnle> > Unify the cleanup paths of the function rather than duplicating code. This should probably be backported to the stable branches? If so, add Cc: "11.0 11.1" to the commit log, no need to actually send the patch to the mesa-stable mailing list. Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] gallium: Remove unnecessary semicolons
On Thu, Dec 17, 2015 at 5:13 AM, Matt Turnerwrote: > On Wed, Dec 16, 2015 at 7:41 PM, Edward O'Callaghan > wrote: >> Fix silly issue with MSVC case fall-though support to need >> a extra 'break;' >> >> Found-by: Coccinelle >> Signed-off-by: Edward O'Callaghan >> Reviewed-by: Brian Paul >> --- >> src/gallium/auxiliary/draw/draw_pipe_aaline.c | 2 +- >> src/gallium/auxiliary/gallivm/lp_bld_swizzle.c | 2 +- >> src/gallium/auxiliary/nir/tgsi_to_nir.c| 2 +- >> src/gallium/auxiliary/util/u_surface.c | 3 ++- >> src/gallium/auxiliary/vl/vl_mpeg12_decoder.c | 2 +- >> src/gallium/state_trackers/nine/swapchain9.c | 2 +- >> src/gallium/state_trackers/omx/entrypoint.c| 2 +- >> src/gallium/state_trackers/vdpau/mixer.c | 2 +- >> 8 files changed, 9 insertions(+), 8 deletions(-) >> >> diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c >> b/src/gallium/auxiliary/draw/draw_pipe_aaline.c >> index 877db59..4a676b7 100644 >> --- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c >> +++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c >> @@ -937,7 +937,7 @@ draw_aaline_prepare_outputs(struct draw_context *draw, >> const struct pipe_rasterizer_state *rast = draw->rasterizer; >> >> /* update vertex attrib info */ >> - aaline->pos_slot = draw_current_shader_position_output(draw);; >> + aaline->pos_slot = draw_current_shader_position_output(draw); >> >> if (!rast->line_smooth) >>return; >> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c >> b/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c >> index b1aef71..f571838 100644 >> --- a/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c >> +++ b/src/gallium/auxiliary/gallivm/lp_bld_swizzle.c >> @@ -720,7 +720,7 @@ lp_build_transpose_aos_n(struct gallivm_state *gallivm, >> >>default: >> assert(0); >> - }; >> + } >> } >> >> >> diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c >> b/src/gallium/auxiliary/nir/tgsi_to_nir.c >> index 5def6d3..5cbe8e9 100644 >> --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c >> +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c >> @@ -1951,7 +1951,7 @@ tgsi_processor_to_shader_stage(unsigned processor) >> case TGSI_PROCESSOR_COMPUTE: return MESA_SHADER_COMPUTE; >> default: >>unreachable("invalid TGSI processor"); >> - }; >> + } >> } >> >> struct nir_shader * >> diff --git a/src/gallium/auxiliary/util/u_surface.c >> b/src/gallium/auxiliary/util/u_surface.c >> index 6aa44f9..c150d92 100644 >> --- a/src/gallium/auxiliary/util/u_surface.c >> +++ b/src/gallium/auxiliary/util/u_surface.c >> @@ -600,7 +600,8 @@ is_box_inside_resource(const struct pipe_resource *res, >>depth = res->array_size; >>assert(res->array_size % 6 == 0); >>break; >> - case PIPE_MAX_TEXTURE_TYPES:; > > Yuck! I have never seen this before. > > Grepping for ':;' turns up a bunch more of these. They all seem to > blame to Marek. Can we please not do this? AFAIK, default:; is the simplest way to add an empty default statement to hide warnings showing unhandled cases. break; isn't very useful if it's the last statement. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition
On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Cherywrote: From: Nanley Chery Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two DIV_ROUND_UP() definitions to one, but chose the more compute-intensive version in the process. Use the simpler version instead. Reduces .text size by 1360 bytes. Output of `size lib/i965_dri.so`: textdata bss dec hex filename 7850440 219264 27240 8096944 7b8cb0 lib/i965_dri.so (before) 7849080 219264 27240 8095584 7b8760 lib/i965_dri.so (after) Cc: Axel Davy Signed-off-by: Nanley Chery --- src/util/macros.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/macros.h b/src/util/macros.h index 0c8958f..53a98a0 100644 --- a/src/util/macros.h +++ b/src/util/macros.h @@ -211,6 +211,6 @@ do { \ #endif /** Compute ceiling of integer quotient of A divided by B. */ -#define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 ) +#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B)) #endif /* UTIL_MACROS_H */ I'll point out that these are not equivalent, one can overflow and the other doesn't. You probably want to check if the call sites have sufficient checks for that before substituting one for the other. /Glenn ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Add .mailmap
On Thu, Dec 17, 2015 at 4:46 PM, Matt Turnerwrote: > On Thu, Dec 17, 2015 at 1:09 AM, Giuseppe Bilotta > wrote: >> This adds a first tentative .mailmap file, to canonicize contributor >> name/emails in shortlogs and other statistical endeavours. > > If we want this kind of information, should we just go with Ken's > gitdm branch? That'll provide information about per-company > contribution stats as well. git builtins can't use the gitdm stuff, AFAIK (so e.g. git shortlog wouldn't coalesce authors). Having duplicate information _is_ annoying, but if git dm can use the .mailmap (at least partially), then of course redundancy can be curbed. -- Giuseppe "Oblomov" Bilotta ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition
On Thu, Dec 17, 2015 at 11:25 AM, Matt Turnerwrote: > On Thu, Dec 17, 2015 at 11:04 AM, Nanley Chery > wrote: > > On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote: > >> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery > wrote: > >> > >> >From: Nanley Chery > >> > > >> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two > >> >DIV_ROUND_UP() definitions to one, but chose the more > >> >compute-intensive version in the process. Use the simpler version > >> >instead. Reduces .text size by 1360 bytes. > >> > > >> >Output of `size lib/i965_dri.so`: > >> > textdata bss dec hex filename > >> > 7850440 219264 27240 8096944 7b8cb0 lib/i965_dri.so (before) > >> > 7849080 219264 27240 8095584 7b8760 lib/i965_dri.so (after) > >> > > >> >Cc: Axel Davy > >> >Signed-off-by: Nanley Chery > >> >--- > >> > src/util/macros.h | 2 +- > >> > 1 file changed, 1 insertion(+), 1 deletion(-) > >> > > >> >diff --git a/src/util/macros.h b/src/util/macros.h > >> >index 0c8958f..53a98a0 100644 > >> >--- a/src/util/macros.h > >> >+++ b/src/util/macros.h > >> >@@ -211,6 +211,6 @@ do { \ > >> > #endif > >> >/** Compute ceiling of integer quotient of A divided by B. */ > >> >-#define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 ) > >> >+#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B)) > >> >#endif /* UTIL_MACROS_H */ > >> > >> I'll point out that these are not equivalent, one can overflow and the > other doesn't. You > >> probably want to check if the call sites have sufficient checks for > that before > >> substituting one for the other. > >> > > > > Good point. As I mentioned in another email, I'll leave the current > > macro untouched. > > The other email I referenced unfortunately didn't make it to the list due email client issues. > I think the chances of us relying on overflow behavior are exceedingly > small, and nearly all uses of DIV_ROUND_UP are in the i965 driver. I > think it's sufficiently safe to go ahead with the patch (but I am > still interested to know about your compiler and CFLAGS). > My error was that I forgot to remove --enable-debug and add -02. After making those changes, my binary size also increases but by ~1.1k. Another implementation which avoids overflow increases the binary size by 2 bytes. Correct me if I'm wrong, but aligning w/ the kernel implementation (or changing the current definition for that matter) seems like a regression more than an improvement. Thanks, Nanley ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] nir: Teach nir_opt_algebraic about adding and subtracting the same thing
On Dec 17, 2015 1:21 AM, "Eero Tamminen"wrote: > > Hi, > > > On 12/17/2015 01:52 AM, Matt Turner wrote: >> >> On Tue, Dec 15, 2015 at 1:16 AM, Eduardo Lima Mitev wrote: >>> >>> On 12/15/2015 09:28 AM, Kristian Høgsberg Kristensen wrote: This optimizes a + b - b to just a. Modest shader-db results (BDW): total instructions in shared programs: 7842452 -> 7841862 (-0.01%) instructions in affected programs: 61938 -> 61348 (-0.95%) total loops in shared programs:2131 -> 2131 (0.00%) helped:263 HURT: 0 GAINED:0 LOST: 0 >>> >>> >>> In HSW, I get these shader-db results: >>> >>> total instructions in shared programs: 6257265 -> 6256788 (-0.01%) >>> instructions in affected programs: 46601 -> 46124 (-1.02%) >>> helped: 218 >>> HURT: 0 >>> >>> total cycles in shared programs: 56010026 -> 56007760 (-0.00%) >>> cycles in affected programs: 1048392 -> 1046126 (-0.22%) >>> helped: 199 >>> HURT: 154 >>> >>> total loops in shared programs: 1979 -> 1979 (0.00%) >>> loops in affected programs: 0 -> 0 >>> helped: 0 >>> HURT: 0 >>> >>> LOST: 0 >>> GAINED: 0 >>> >>> I wonder where those cycle HURTs come from. In any case, the net result >>> is positive. >> >> >> I haven't confirmed, but I've seen cases that seem like the cycle >> counts are wrong. > > > I have doubts about the correctness of latency values set in brw_schedule_instructions.cpp. > > They were added mostly by Eric on 2012 & 2013. You added mad & lrp data in 2013 and Curro untyped atomics & surface reads in 2013. Both of them have is_haswell check, but don't say anything about newer generations. > > It seems that some of the values are from spec and some from tests. However, for the test data, the code doesn't say on what exact HW and stepping the tests were run on. Or where the sources for those tests are so that one could try to reproduce the results, verify (with perf counters) that they actually are bound by what the test says, and update data gotten from them for newer generations (i.e. GEN8+). It would be pretty fantastic to get updated or more reliable data. However, I think what we have is probably good enough. The important part is that 3src are slightly more expensive and sends are hugely expensive and I think the current numbers capture that well enough to be moderately useful. As far add register banks go, we've had trouble getting real data or even documentation on that. > In addition to this, Mesa is lacking at least stall cycles for 3src register bank conflicts. > > > - Eero > > PS. cycle values are anyway going to be off, code doesn't know memory latencies as that depends on locality & cache utilization, and it doesn't take threading into account. But it only tries to schedule things so that HW is able to better compensate latency, so it doesn't need to know how much cycles take, just have good enough estimate. :-) For that matter, it can depend on how many vertices you have in the pipe or how big your polygon is. It'll never be perfect; there's only so much you can know from only the shader code. :-). However, a program with a smaller theoretical cycle count will probably execute faster than one with a higher cycle count, so it is useful. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition
On Thu, Dec 17, 2015 at 11:53:03AM -0800, Nanley Chery wrote: > On Thu, Dec 17, 2015 at 11:25 AM, Matt Turnerwrote: > > > On Thu, Dec 17, 2015 at 11:04 AM, Nanley Chery > > wrote: > > > On Thu, Dec 17, 2015 at 12:05:46PM +0100, Glenn Kennard wrote: > > >> On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery > > wrote: > > >> > > >> >From: Nanley Chery > > >> > > > >> >Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two > > >> >DIV_ROUND_UP() definitions to one, but chose the more > > >> >compute-intensive version in the process. Use the simpler version > > >> >instead. Reduces .text size by 1360 bytes. > > >> > > > >> >Output of `size lib/i965_dri.so`: > > >> > textdata bss dec hex filename > > >> > 7850440 219264 27240 8096944 7b8cb0 lib/i965_dri.so (before) > > >> > 7849080 219264 27240 8095584 7b8760 lib/i965_dri.so (after) > > >> > > > >> >Cc: Axel Davy > > >> >Signed-off-by: Nanley Chery > > >> >--- > > >> > src/util/macros.h | 2 +- > > >> > 1 file changed, 1 insertion(+), 1 deletion(-) > > >> > > > >> >diff --git a/src/util/macros.h b/src/util/macros.h > > >> >index 0c8958f..53a98a0 100644 > > >> >--- a/src/util/macros.h > > >> >+++ b/src/util/macros.h > > >> >@@ -211,6 +211,6 @@ do { \ > > >> > #endif > > >> >/** Compute ceiling of integer quotient of A divided by B. */ > > >> >-#define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 ) > > >> >+#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B)) > > >> >#endif /* UTIL_MACROS_H */ > > >> > > >> I'll point out that these are not equivalent, one can overflow and the > > other doesn't. You > > >> probably want to check if the call sites have sufficient checks for > > that before > > >> substituting one for the other. > > >> > > > > > > Good point. As I mentioned in another email, I'll leave the current > > > macro untouched. > > > > > The other email I referenced unfortunately didn't make it to the list due > email client > issues. > > > > I think the chances of us relying on overflow behavior are exceedingly > > small, and nearly all uses of DIV_ROUND_UP are in the i965 driver. I > > think it's sufficiently safe to go ahead with the patch (but I am > > still interested to know about your compiler and CFLAGS). > > > > My error was that I forgot to remove --enable-debug and add -02. After > making those > changes, my binary size also increases but by ~1.1k. Another implementation > which > avoids overflow increases the binary size by 2 bytes. > > Correct me if I'm wrong, but aligning w/ the kernel implementation (or > changing the > current definition for that matter) seems like a regression more than an > improvement. Sorry for the previously oddly-wrapped message. I looked into this with Kristian, and the kernel's version does seem to produce better assembly. It's not yet clear why the .text size increases. My current hypothesis is that the number of no-ops increase, but that's yet to be proven. Thanks, Nanley ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Only apply CS stall workaround pre-SKL
On Thursday, December 17, 2015 01:21:26 PM Ben Widawsky wrote: > As per the docs. > > Cc: Kenneth Graunke> Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c > b/src/mesa/drivers/dri/i965/brw_pipe_control.c > index ae3d818..6c636d2 100644 > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c > @@ -97,7 +97,8 @@ void > brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags) > { > if (brw->gen >= 8) { > - gen8_add_cs_stall_workaround_bits(); > + if (brw->gen == 8) > + gen8_add_cs_stall_workaround_bits(); > >BEGIN_BATCH(6); >OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2)); > @@ -141,7 +142,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, > uint32_t flags, > uint32_t imm_lower, uint32_t imm_upper) > { > if (brw->gen >= 8) { > - gen8_add_cs_stall_workaround_bits(); > + if (brw->gen == 8) > + gen8_add_cs_stall_workaround_bits(); > >BEGIN_BATCH(6); >OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2)); > "No restrictions." Hard to believe, but that's what it says :) Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev