Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization
On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeirowrote: > vec4 port of fs_cmod_propagation. > > Shader-db results: > total instructions in shared programs: 6241226 -> 6224469 (-0.27%) > instructions in affected programs: 498213 -> 481456 (-3.36%) > helped:3082 > HURT: 0 Would you mind cherry-picking this back onto 4e0a8e0a50c9ac91cb7a70b92b8d9c6fcc02b7aa (the commit right before we made NIR non-optional) and get some GLSL IR vs. NIR vec4-only numbers with this patch? I'd like to know what it does to that delta as well. Thanks! --Jason > --- > > The final outcome is really similar to fs_brw_cmod_propagation. In > fact the only difference is that on fs we have this: > if (scan_inst->overwrites_reg(inst->src[0])) { > if (scan_inst->is_partial_write() || > scan_inst->dst.reg_offset != inst->src[0].reg_offset) >break; > > And on vec4 (this commit) we have this: > if (inst->src[0].in_range(scan_inst->dst, >scan_inst->regs_written)) { > > if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) > || > scan_inst->dst.reg_offset != inst->src[0].reg_offset || > (scan_inst->dst.writemask != WRITEMASK_X && > scan_inst->dst.writemask != WRITEMASK_XYZW)) >break; > > if (scan_inst->dst.writemask == WRITEMASK_XYZW && > inst->src[0].swizzle != BRW_SWIZZLE_XYZW) { >break; > } > > So at some point I thought about refactoring it and having one common, > like with opt_predicated_break, but that one was possible with just > backend_instructions, while here we would need to deal with > vec4_instructions and fs_inst, that could be somewhat messy, so > I'm leaving this as it is. 
> > src/mesa/drivers/dri/i965/Makefile.sources | 1 + > src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + > src/mesa/drivers/dri/i965/brw_vec4.h | 1 + > .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 > + > 4 files changed, 166 insertions(+) > create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp > > diff --git a/src/mesa/drivers/dri/i965/Makefile.sources > b/src/mesa/drivers/dri/i965/Makefile.sources > index 81ef628..c1836d6 100644 > --- a/src/mesa/drivers/dri/i965/Makefile.sources > +++ b/src/mesa/drivers/dri/i965/Makefile.sources > @@ -56,6 +56,7 @@ i965_compiler_FILES = \ > brw_util.c \ > brw_util.h \ > brw_vec4_builder.h \ > + brw_vec4_cmod_propagation.cpp \ > brw_vec4_copy_propagation.cpp \ > brw_vec4.cpp \ > brw_vec4_cse.cpp \ > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index e966b96..55e381b 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -1867,6 +1867,7 @@ vec4_visitor::run() >OPT(dead_code_eliminate); >OPT(dead_control_flow_eliminate, this); >OPT(opt_copy_propagation); > + OPT(opt_cmod_propagation); >OPT(opt_cse); >OPT(opt_algebraic); >OPT(opt_register_coalesce); > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h > b/src/mesa/drivers/dri/i965/brw_vec4.h > index 5e3500c..3c1711d 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.h > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h > @@ -149,6 +149,7 @@ public: > int var_range_start(unsigned v, unsigned n) const; > int var_range_end(unsigned v, unsigned n) const; > bool virtual_grf_interferes(int a, int b); > + bool opt_cmod_propagation(); > bool opt_copy_propagation(bool do_constant_prop = true); > bool opt_cse_local(bblock_t *block); > bool opt_cse(); > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp > new file mode 100644 > index 000..7e39d2b > --- /dev/null > +++ 
b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp > @@ -0,0 +1,163 @@ > +/* > + * Copyright © 2015 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE
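For readers who have not looked at fs_cmod_propagation, the general idea of the pass can be sketched on a toy instruction list. This is only an illustration under stated assumptions, not the code from the patch: `ToyInst`, `toy_cmod_propagation`, and the register numbering are hypothetical, and the real pass additionally checks writemasks, swizzles, predication, and register offsets as described in the message above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model; this is NOT Mesa's vec4_instruction.
enum class Op { ADD, MUL, CMP };
enum class Cmod { NONE, NZ };

struct ToyInst {
    Op op;
    int dst;    // destination register, -1 stands for the null register
    int src0;   // first source register
    Cmod cmod;  // conditional modifier: does this instruction write the flag?
};

// For every "CMP.nz null, rX", scan backwards for the instruction that
// produced rX. If that instruction does not already write the flag, let it
// write the flag itself and drop the CMP. Returns true if anything changed.
bool toy_cmod_propagation(std::vector<ToyInst> &insts)
{
    bool progress = false;
    for (std::size_t i = 0; i < insts.size(); ++i) {
        const ToyInst cmp = insts[i];  // copy: the vector may be modified below
        if (cmp.op != Op::CMP || cmp.dst != -1 || cmp.cmod != Cmod::NZ)
            continue;

        for (std::size_t j = i; j-- > 0;) {
            ToyInst &scan = insts[j];
            if (scan.dst == cmp.src0) {
                if (scan.cmod == Cmod::NONE) {
                    scan.cmod = Cmod::NZ;
                    insts.erase(insts.begin() + i);
                    --i;  // the next instruction now sits at the old index
                    progress = true;
                }
                break;  // stop at the producer whether or not we could merge
            }
            if (scan.cmod != Cmod::NONE)
                break;  // something else writes the flag in between
        }
    }
    return progress;
}
```

With a list such as `ADD r1; MUL r2; CMP.nz null, r1`, the sketch turns the ADD into `ADD.nz` and removes the CMP, which is the kind of saving the shader-db numbers above are counting.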
[Mesa-dev] [PATCH v2 18/17 (was 10/17)] i965/vs: Move use_legacy_snorm_formula into the shader key
This is really an input into the shader compiler so it kind of makes sense in the key. Also, given where it's placed into the key, it doesn't actually make it any bigger. v2 (Jason Ekstrand): - Rebase on top of the compiler clean-ups so the effects of this patch can better be studied without being in the middle of a series. --- src/mesa/drivers/dri/i965/brw_compiler.h | 3 ++- src/mesa/drivers/dri/i965/brw_vec4.cpp| 4 +--- src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 9 - src/mesa/drivers/dri/i965/brw_vs.c| 3 ++- src/mesa/drivers/dri/i965/brw_vs.h| 5 + 5 files changed, 10 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/dri/i965/brw_compiler.h index 4bc1caa..153e381 100644 --- a/src/mesa/drivers/dri/i965/brw_compiler.h +++ b/src/mesa/drivers/dri/i965/brw_compiler.h @@ -161,6 +161,8 @@ struct brw_vs_prog_key { bool clamp_vertex_color:1; + bool use_legacy_snorm_formula:1; + /** * How many user clipping planes are being uploaded to the vertex shader as * push constants. 
@@ -585,7 +587,6 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, struct brw_vs_prog_data *prog_data, const struct nir_shader *shader, gl_clip_plane *clip_planes, - bool use_legacy_snorm_formula, int shader_time_index, unsigned *final_assembly_size, char **error_str); diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 8636323..5336590 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1943,7 +1943,6 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, struct brw_vs_prog_data *prog_data, const nir_shader *shader, gl_clip_plane *clip_planes, - bool use_legacy_snorm_formula, int shader_time_index, unsigned *final_assembly_size, char **error_str) @@ -1982,8 +1981,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT; vec4_vs_visitor v(compiler, log_data, key, prog_data, -shader, clip_planes, mem_ctx, -shader_time_index, use_legacy_snorm_formula); +shader, clip_planes, mem_ctx, shader_time_index); if (!v.run()) { if (error_str) *error_str = ralloc_strdup(mem_ctx, v.fail_msg); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp index 485a80e..9cf04cd 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp @@ -77,7 +77,8 @@ vec4_vs_visitor::emit_prolog() /* ES 3.0 has different rules for converting signed normalized * fixed-point numbers than desktop GL. 
*/ -if ((wa_flags & BRW_ATTRIB_WA_SIGN) && !use_legacy_snorm_formula) { +if ((wa_flags & BRW_ATTRIB_WA_SIGN) && +!key->use_legacy_snorm_formula) { /* According to equation 2.2 of the ES 3.0 specification, * signed normalization conversion is done by: * @@ -304,14 +305,12 @@ vec4_vs_visitor::vec4_vs_visitor(const struct brw_compiler *compiler, const nir_shader *shader, gl_clip_plane *clip_planes, void *mem_ctx, - int shader_time_index, - bool use_legacy_snorm_formula) + int shader_time_index) : vec4_visitor(compiler, log_data, >tex, _prog_data->base, shader, mem_ctx, false /* no_spills */, shader_time_index), key(key), vs_prog_data(vs_prog_data), - clip_planes(clip_planes), - use_legacy_snorm_formula(use_legacy_snorm_formula) + clip_planes(clip_planes) { } diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c index 9c9b83b..3b3eb8b 100644 --- a/src/mesa/drivers/dri/i965/brw_vs.c +++ b/src/mesa/drivers/dri/i965/brw_vs.c @@ -184,7 +184,6 @@ brw_codegen_vs_prog(struct brw_context *brw, program = brw_compile_vs(brw->intelScreen->compiler, brw, mem_ctx, key, _data, vp->program.Base.nir, brw_select_clip_planes(>ctx), -!_mesa_is_gles3(>ctx), st_index, _size, _str); if (program == NULL) { if (prog) { @@ -341,6 +340,8 @@ brw_vs_populate_key(struct brw_context *brw, key->clamp_vertex_color = ctx->Light._ClampVertexColor; } +
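The claim that the new flag "doesn't actually make it any bigger" can be illustrated with a small sketch. The two structs below are hypothetical stand-ins, not the real `brw_vs_prog_key`: only `clamp_vertex_color` and `use_legacy_snorm_formula` come from the patch, the other field names are assumptions, and bit-field layout is implementation-defined, so the size comparison is expected to hold on common ABIs rather than guaranteed by the language.

```cpp
#include <cstddef>

// Hypothetical key layout before the patch: two 1-bit flags followed by a
// wider member. Field names other than clamp_vertex_color are assumptions.
struct ToyKeyBefore {
    bool copy_edgeflag : 1;
    bool clamp_vertex_color : 1;
    unsigned nr_userclip_plane_consts;
};

// Same layout with the new flag added next to the existing 1-bit flags.
struct ToyKeyAfter {
    bool copy_edgeflag : 1;
    bool clamp_vertex_color : 1;
    bool use_legacy_snorm_formula : 1;  // lands in bits that were padding
    unsigned nr_userclip_plane_consts;
};

// True when the extra 1-bit flag did not grow the struct.
constexpr bool key_size_unchanged()
{
    return sizeof(ToyKeyBefore) == sizeof(ToyKeyAfter);
}
```

The point of the sketch is only that a 1-bit field placed beside other 1-bit fields reuses bits that were already allocated, so the key that the driver stores and compares stays the same size.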
Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.
On 10/10/2015 17:49, Marek Olšák wrote: On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen wrote: Hi Marek, The revised series is mostly done. I wanted to do more testing and to try to make sure that the added cache flushes I am doing now (a CACHE_FLUSH event before a fast clear and on switching framebuffers) are the minimal needed. Also, it looks like we don't need DCC decompression at all, right? It might be better to get rid of it and only use the 3D engine to access DCC-encoded surfaces. I still use it for flush_resource. I could make this redundant by sharing the DCC buffer by appending the DCC buffer to the texture resource similarly to how the CMASK is appended to the resource of a MSAA buffer. This has the secondary benefit of not needing to reference as many resources for command submission. IIRC, flush_resource is only used for shared (scanout) surfaces where DCC is always disabled. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev I think it's not a very good idea to rely on that. It may be true for now, but may change in the future: For example, perhaps some day wayland will tell egl the app is not fullscreen and that a non-scanoutable buffer can be used. Axel Davy
Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.
On Sat, Oct 10, 2015 at 6:12 PM, Axel Davy wrote: > On 10/10/2015 17:49, Marek Olšák wrote: >> >> On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen >> wrote: >>> >>> Hi Marek, >>> >>> The revised series is mostly done. I wanted to do more testing and to >>> try to make sure that the added cache flushes I am doing now (a >>> CACHE_FLUSH event before a fast clear and on switching framebuffers) >>> are the minimal needed. >>> Also, it looks like we don't need DCC decompression at all, right? It might be better to get rid of it and only use the 3D engine to access DCC-encoded surfaces. >>> >>> I still use it for flush_resource. I could make this redundant by >>> sharing the DCC buffer by appending the DCC buffer to the texture >>> resource similarly to how the CMASK is appended to the resource of a >>> MSAA buffer. This has the secondary benefit of not needing to >>> reference as many resources for command submission. >> >> IIRC, flush_resource is only used for shared (scanout) surfaces where >> DCC is always disabled. >> >> Marek > > I think it's not a very good idea to rely on that. > > It may be true for now, but may change in the future: > For example, perhaps some day wayland will tell egl > the app is not fullscreen and that a non-scanoutable buffer > can be used. If that happens, we'll implement DCC sharing between processes. Marek
Re: [Mesa-dev] [PATCH 3/6] glsl: move half<->float convertion to util
On Sat, Oct 10, 2015 at 3:09 PM, Matt Turnerwrote: > On Sat, Oct 10, 2015 at 11:47 AM, Rob Clark wrote: >> From: Rob Clark >> >> Needed in NIR too, so move out of mesa/main/imports.c >> >> Signed-off-by: Rob Clark >> --- >> src/glsl/Makefile.am | 1 + >> src/mesa/main/imports.c | 148 -- >> src/mesa/main/imports.h | 38 -- >> src/util/Makefile.sources | 2 + >> src/util/convert.c| 179 >> ++ >> src/util/convert.h| 43 +++ >> 6 files changed, 259 insertions(+), 152 deletions(-) >> create mode 100644 src/util/convert.c >> create mode 100644 src/util/convert.h >> >> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am >> index 3265391..347919b 100644 >> --- a/src/glsl/Makefile.am >> +++ b/src/glsl/Makefile.am >> @@ -160,6 +160,7 @@ glsl_compiler_SOURCES = \ >> glsl_compiler_LDADD = \ >> libglsl.la \ >> $(top_builddir)/src/libglsl_util.la \ >> + $(top_builddir)/src/util/libmesautil.la \ >> $(PTHREAD_LIBS) >> >> glsl_test_SOURCES = \ >> diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c >> index 350e675..230ebbc 100644 >> --- a/src/mesa/main/imports.c >> +++ b/src/mesa/main/imports.c >> @@ -307,154 +307,6 @@ _mesa_bitcount_64(uint64_t n) >> } >> #endif >> >> - >> -/** >> - * Convert a 4-byte float to a 2-byte half float. >> - * >> - * Not all float32 values can be represented exactly as a float16 value. We >> - * round such intermediate float32 values to the nearest float16. When the >> - * float32 lies exactly between to float16 values, we round to the one with >> - * an even mantissa. >> - * >> - * This rounding behavior has several benefits: >> - * - It has no sign bias. >> - * >> - * - It reproduces the behavior of real hardware: opcode F32TO16 in >> Intel's >> - * GPU ISA. >> - * >> - * - By reproducing the behavior of the GPU (at least on Intel hardware), >> - * compile-time evaluation of constant packHalf2x16 GLSL expressions >> will >> - * result in the same value as if the expression were executed on the >> GPU. 
>> - */ >> -GLhalfARB >> -_mesa_float_to_half(float val) >> -{ >> - const fi_type fi = {val}; >> - const int flt_m = fi.i & 0x7f; >> - const int flt_e = (fi.i >> 23) & 0xff; >> - const int flt_s = (fi.i >> 31) & 0x1; >> - int s, e, m = 0; >> - GLhalfARB result; >> - >> - /* sign bit */ >> - s = flt_s; >> - >> - /* handle special cases */ >> - if ((flt_e == 0) && (flt_m == 0)) { >> - /* zero */ >> - /* m = 0; - already set */ >> - e = 0; >> - } >> - else if ((flt_e == 0) && (flt_m != 0)) { >> - /* denorm -- denorm float maps to 0 half */ >> - /* m = 0; - already set */ >> - e = 0; >> - } >> - else if ((flt_e == 0xff) && (flt_m == 0)) { >> - /* infinity */ >> - /* m = 0; - already set */ >> - e = 31; >> - } >> - else if ((flt_e == 0xff) && (flt_m != 0)) { >> - /* NaN */ >> - m = 1; >> - e = 31; >> - } >> - else { >> - /* regular number */ >> - const int new_exp = flt_e - 127; >> - if (new_exp < -14) { >> - /* The float32 lies in the range (0.0, min_normal16) and is rounded >> - * to a nearby float16 value. The result will be either zero, >> subnormal, >> - * or normal. >> - */ >> - e = 0; >> - m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f)); >> - } >> - else if (new_exp > 15) { >> - /* map this value to infinity */ >> - /* m = 0; - already set */ >> - e = 31; >> - } >> - else { >> - /* The float32 lies in the range >> - * [min_normal16, max_normal16 + max_step16) >> - * and is rounded to a nearby float16 value. The result will be >> - * either normal or infinite. >> - */ >> - e = new_exp + 15; >> - m = _mesa_lroundevenf(flt_m / (float) (1 << 13)); >> - } >> - } >> - >> - assert(0 <= m && m <= 1024); >> - if (m == 1024) { >> - /* The float32 was rounded upwards into the range of the next >> exponent, >> - * so bump the exponent. This correctly handles the case where f32 >> - * should be rounded up to float16 infinity. 
>> - */ >> - ++e; >> - m = 0; >> - } >> - >> - result = (s << 15) | (e << 10) | m; >> - return result; >> -} >> - >> - >> -/** >> - * Convert a 2-byte half float to a 4-byte float. >> - * Based on code from: >> - * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html >> - */ >> -float >> -_mesa_half_to_float(GLhalfARB val) >> -{ >> - /* XXX could also use a 64K-entry lookup table */ >> - const int m = val & 0x3ff; >> - const int e = (val >> 10) & 0x1f; >> -
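The rounding behaviour described in the quoted comment (round to nearest, ties to the even mantissa, with overflow into the next exponent) can be tried out with a standalone sketch. This is an illustration that follows the structure of the quoted function, not the code being moved: the name `float_to_half_sketch` is hypothetical, `std::lrint` under the default rounding mode is assumed as a stand-in for `_mesa_lroundevenf`, and `0x7fffff` is the usual 23-bit float32 mantissa mask.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert a float32 to float16 bits, rounding to nearest with ties to even.
// Hypothetical sketch; assumes the default floating-point rounding mode.
uint16_t float_to_half_sketch(float val)
{
    uint32_t bits;
    std::memcpy(&bits, &val, sizeof bits);

    const int flt_m = bits & 0x7fffff;       // 23-bit mantissa
    const int flt_e = (bits >> 23) & 0xff;   // biased exponent
    const int flt_s = (bits >> 31) & 0x1;    // sign
    int e = 0, m = 0;

    if (flt_e == 0) {
        // zero, or a float32 denormal, which is far too small for float16
    } else if (flt_e == 0xff) {
        e = 31;                               // infinity or NaN
        m = flt_m != 0 ? 1 : 0;
    } else {
        const int new_exp = flt_e - 127;
        if (new_exp < -14) {
            // Below the smallest normal float16: result is zero, subnormal,
            // or (after rounding up) the smallest normal.
            e = 0;
            m = static_cast<int>(std::lrint((1 << 24) * std::fabs(val)));
        } else if (new_exp > 15) {
            e = 31;                           // too large: infinity
        } else {
            e = new_exp + 15;
            m = static_cast<int>(std::lrint(flt_m / static_cast<float>(1 << 13)));
        }
    }

    assert(0 <= m && m <= 1024);
    if (m == 1024) {
        // Rounded up into the next exponent; this also yields infinity
        // when e was already 30.
        ++e;
        m = 0;
    }
    return static_cast<uint16_t>((flt_s << 15) | (e << 10) | m);
}
```

For example, `float_to_half_sketch(1.0f)` gives `0x3c00`, and `65520.0f`, which lies exactly halfway between the largest finite float16 and the next step, rounds up to infinity (`0x7c00`).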
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoisetwrote: > This patch looks fine except that it should be a bit more normalized. I > mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for > PUSH_SPACE calls, sometimes you add it sometimes not. Meh. We need to get our error checking situation straight, but this isn't the patch to do it in. > > Did you run a full piglit test this time ? :) Nope, but I ran a full piglit before this patch. Almost took down my box. Probably won't be running it again for this patch. > > See my comment below. > > > On 10/10/2015 11:09 AM, Ilia Mirkin wrote: >> >> We still have to push everything out, might as well kick earlier and >> flip pushbufs when we know we'll need it. This resolves some issues with >> the new policy of making sure that we always leave a bit of room at the >> end for fences. >> >> Signed-off-by: Ilia Mirkin >> Cc: mesa-sta...@lists.freedesktop.org >> --- >> src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 9 ++--- >> src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 >> +++- >> src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 >> +--- >> 3 files changed, 10 insertions(+), 35 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c >> b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c >> index fdde11f..941555f 100644 >> --- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c >> +++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c >> @@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50) >> PUSH_DATA (push, (b << 12) | (i << 8) | p | 1); >> } >> while (words) { >> - unsigned nr; >> - >> - if (!PUSH_SPACE(push, 16)) >> - break; >> - nr = PUSH_AVAIL(push); >> - assert(nr >= 16); >> - nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN); >> + unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN); >> + PUSH_SPACE(push, nr + 3); > > > This PUSH_SPACE call doesn't seem to be needed for me because > NV50_PUSH_EXPLICIT_SPACE_CHECKING is 
not set and the following BEGIN_XXX > calls will allocate space. I want to ensure that both of the below commands are in the same batch. Not sure if it's necessary, but... don't want to find out. They were in the same batch before. And this batch stuff is what was causing the M2MF errors I was seeing earlier. > > >> BEGIN_NV04(push, NV50_3D(CB_ADDR), 1); >> PUSH_DATA (push, (start << 8) | b); >> BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr); >> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c >> b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c >> index be51407..9a3fd1e 100644 >> --- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c >> +++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c >> @@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv, >> PUSH_DATA (push, 0); >>while (count) { >> - unsigned nr; >> - >> - if (!PUSH_SPACE(push, 16)) >> - break; >> - nr = PUSH_AVAIL(push); >> - assert(nr >= 16); >> - nr = MIN2(count, nr - 1); >> - nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN); >> + unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN); >> BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr); >> PUSH_DATAp(push, src, nr); >> @@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv, >> nouveau_pushbuf_validate(push); >>while (words) { >> - unsigned nr; >> - >> - nr = PUSH_AVAIL(push); >> - nr = MIN2(nr - 7, words); >> - nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1); >> + unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN); >> + PUSH_SPACE(push, nr + 7); >> BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3); >> PUSH_DATAh(push, bo->offset + base); >> PUSH_DATA (push, bo->offset + base); >> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c >> b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c >> index aaec60a..d459dd6 100644 >> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c >> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c >> @@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv, >> 
nouveau_pushbuf_validate(push); >>while (count) { >> - unsigned nr; >> + unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN); >> - if (!PUSH_SPACE(push, 16)) >> + if (!PUSH_SPACE(push, nr + 9)) >>break; >> - nr = PUSH_AVAIL(push); >> - assert(nr >= 16); >> - nr = MIN2(count, nr - 9); >> - nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN); >> BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2); >> PUSH_DATAh(push, dst->offset + offset); >> @@ -234,14 +230,10 @@
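The pattern the patch moves to (size each packet from the remaining data and the maximum packet length, then ask for exactly that much space and let the buffer be kicked if it does not fit) can be modelled with a toy push buffer. Everything here is a hypothetical sketch: `ToyPushbuf`, `push_words`, and the parameters are not the nouveau or libdrm API, and `max_packet_len` merely stands in for `NV04_PFIFO_MAX_PACKET_LEN`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy push buffer; NOT the libdrm/nouveau interface.
struct ToyPushbuf {
    std::size_t capacity;              // words available per buffer
    std::size_t used = 0;              // words written to the current buffer
    std::size_t kicks = 0;             // times the buffer was submitted/flipped
    std::vector<std::size_t> packets;  // payload size of each packet emitted

    explicit ToyPushbuf(std::size_t cap) : capacity(cap) {}

    // Reserve room for n words, kicking the current buffer first if needed.
    bool space(std::size_t n)
    {
        if (n > capacity)
            return false;              // can never fit, report failure
        if (used + n > capacity) {
            ++kicks;                   // submit what we have, start fresh
            used = 0;
        }
        return true;
    }
};

// Emit `words` payload words in packets of at most `max_packet_len`, each
// preceded by `header_words` of setup that must land in the same buffer.
bool push_words(ToyPushbuf &push, std::size_t words,
                std::size_t header_words, std::size_t max_packet_len)
{
    while (words) {
        // Packet size depends only on the data left, not on free space.
        const std::size_t nr = std::min(words, max_packet_len);
        if (!push.space(nr + header_words))
            return false;
        push.used += nr + header_words;
        push.packets.push_back(nr);
        words -= nr;
    }
    return true;
}
```

Reserving `nr + header_words` in one call is what keeps the setup words and the payload in the same batch, which is the property Ilia says he wants to preserve for the `CB_ADDR`/`CB_DATA` pair.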
Re: [Mesa-dev] [PATCH 1/5] i965/vec4: nir_emit_if doesn't need to predicate based on all the channels
Looking at the docs a bit, it looks like we should never have been using predicate_normal for if's in the first place. Reviewed-by: Jason Ekstrand On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro wrote: > --- > > I already talked about this with Jason Ekstrand and Matt Turner > privately, but just in case somebody else jump to the review: > > When using BRW_PREDICATE_NORMAL, the if will use all the channels of > the register flag. But nir_if only reads from one channel, so that > is not needed. Another hint showing that this is safe: the MOV that > put the condition on f0 is calling get_nir_src with just one > component. That will return always a source with swizzle > BRW_SWIZZLE_, so that component is the only to be used. > > This commit is not needed/solving anything per-se, but it is needed in > order to be able to implement vec4_cmod_propagation with a good > overall outcome. > > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > index 41bd80d..e05745f 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > @@ -193,7 +193,9 @@ vec4_visitor::nir_emit_if(nir_if *if_stmt) > vec4_instruction *inst = emit(MOV(dst_null_d(), condition)); > inst->conditional_mod = BRW_CONDITIONAL_NZ; > > - emit(IF(BRW_PREDICATE_NORMAL)); > + /* We can just predicate based on the X channel, as the condition only > +* reads from one channel */ > + emit(IF(BRW_PREDICATE_ALIGN16_REPLICATE_X)); > > nir_emit_cf_list(_stmt->then_list); > > -- > 2.1.4
Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.
On Sat, Oct 10, 2015 at 5:49 PM, Marek Olšák wrote: > On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen > wrote: >> Hi Marek, >> >> The revised series is mostly done. I wanted to do more testing and to >> try to make sure that the added cache flushes I am doing now (a >> CACHE_FLUSH event before a fast clear and on switching framebuffers) >> are the minimal needed. >> >>> Also, it looks like we don't need DCC decompression at all, right? It >>> might be better to get rid of it and only use the 3D engine to access >>> DCC-encoded surfaces. >> >> I still use it for flush_resource. I could make this redundant by >> sharing the DCC buffer by appending the DCC buffer to the texture >> resource similarly to how the CMASK is appended to the resource of a >> MSAA buffer. This has the secondary benefit of not needing to >> reference as many resources for command submission. > > IIRC, flush_resource is only used for shared (scanout) surfaces where > DCC is always disabled. That said, we might need to keep the DCC decompression for image store instructions, which don't support compression. Marek
[Mesa-dev] Mesa 11.0.3
Mesa 11.0.3 is now available. In the current release we have a bunch of EGL patches, mangledGL build fixes and a healthy amount of driver bugfixes - radeonsi, nouveau, i915 and i965. Last but not least, the KDE/Weston regression introduced with 11.0.2 has also been resolved. Brian Paul (1): st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats Daniel Scharrer (1): mesa: Add abs input modifier to base for POW in ffvertex_prog Emil Velikov (4): docs: add sha256 checksums for 11.0.2 Revert "nouveau: make sure there's always room to emit a fence" Update version to 11.0.3 docs: add release notes for 11.0.3 Francisco Jerez (1): i965/fs: Fix hang on IVB and VLV with image format mismatch. Ian Romanick (1): meta: Handle array textures in scaled MSAA blits Ilia Mirkin (6): nouveau: be more careful about freeing temporary transfer buffers nouveau: delay deleting buffer with unflushed fence nouveau: wait to unref the transfer's bo until it's no longer used nv30: pretend to have packed texture/surface formats nv30: always go through translate module on big-endian nouveau: make sure there's always room to emit a fence Jason Ekstrand (1): mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks Kyle Brenneman (3): glx: Fix build errors with --enable-mangling (v2) mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix. 
glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3) Leo Liu (1): radeon/vce: fix vui time_scale zero error Marek Olšák (21): st/mesa: fix front buffer regression after dropping st_validate_state in Blit radeonsi: handle index buffer alloc failures radeonsi: handle constant buffer alloc failures gallium/radeon: handle buffer_map staging buffer failures better gallium/radeon: handle buffer alloc failures in r600_draw_rectangle gallium/radeon: add a fail path for depth MSAA texture readback radeonsi: report alloc failure from si_shader_binary_read radeonsi: add malloc fail paths to si_create_shader_state radeonsi: skip drawing if the tess factor ring allocation fails radeonsi: skip drawing if GS ring allocations fail radeonsi: handle shader precompile failures radeonsi: handle fixed-func TCS shader create failure radeonsi: skip drawing if VS, TCS, TES, GS fail to compile or upload radeonsi: skip drawing if PS fails to compile or upload radeonsi: skip drawing if updating the scratch buffer fails radeonsi: don't forget to update scratch relocations for LS, HS, ES shaders radeonsi: handle dummy constant buffer allocation failure gallium/u_blitter: handle allocation failures radeonsi: add scratch buffer to the buffer list when it's re-allocated st/dri: don't use _ctx in client_wait_sync egl/dri2: don't require a context for ClientWaitSync (v2) Matthew Waters (1): egl: rework handling EGL_CONTEXT_FLAGS Michel Dänzer (1): st/dri: Use packed RGB formats Roland Scheidegger (1): mesa: fix mipmap generation for immutable, compressed textures Tom Stellard (3): gallium/radeon: Use call_once() when initailizing LLVM targets gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets v2 radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target v2 Varad Gautam (1): egl: restore surface type before linking config to its display Ville Syrjälä (3): i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) i915: Fix 
texcoord vs. varying collision in fragment programs i915: Remember to call intel_prepare_render() before blitting git tag: mesa-11.0.3 ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.gz MD5: 67be040a22025034351ca26c204db81c mesa-11.0.3.tar.gz SHA1: 85f5386a9914cfbf53dae58b39e26b2e41f66178 mesa-11.0.3.tar.gz SHA256: c2210e3daecc10ed9fdcea500327652ed6effc2f47c4b9cee63fb08f560d7117 mesa-11.0.3.tar.gz PGP: ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.gz.sig ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.xz MD5: bf9118bf0fbf360715cfe60baf7a1db5 mesa-11.0.3.tar.xz SHA1: e66dbd0f372947eaaee12a50df41befb20164b05 mesa-11.0.3.tar.xz SHA256: ab2992eece21adc23c398720ef8c6933cb69ea42e1b2611dc09d031e17e033d6 mesa-11.0.3.tar.xz PGP: ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.xz.sig -- -Emil
[Mesa-dev] [Bug 92361] [BSW SKL] Regression: glx@glx-copy-sub-buffer failed
https://bugs.freedesktop.org/show_bug.cgi?id=92361 cprigent changed: What|Removed |Added Summary|[BSW] Regression: |[BSW SKL] Regression: |glx@glx-copy-sub-buffer |glx@glx-copy-sub-buffer |failed |failed --- Comment #1 from cprigent --- Reproduced on SKL. Following tests were Pass with Mesa 10.6.7: glx@glx-copy-sub-buffer glx@glx-copy-sub-buffer samples=2 glx@glx-copy-sub-buffer samples=4 glx@glx-copy-sub-buffer samples=6 glx@glx-copy-sub-buffer samples=8 Hardware: Platform: SKY LAKE Y A0 CPU : Intel(R) Core(TM) m5-6Y57 CPU @ 1.10GHz (family: 6, model: 78 stepping: 3) MCP : SKL-Y D1 2+2 (or ULX-D1) QDF : QJK9 CPU : SKL D0 Chipset PCH: Sunrise Point LP C1 CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2 Reworks : All Mandatories + FBS02,FBS03, F23, O-02 & O-06 Software Linux : Ubuntu 14.04 LTS 64 bits BIOS : SKLSE2R1.R00.X097.B02.1509020030 ME FW : 11.0.0.1173 Ksc (EC FW): 1.19 kernel 4.3.0-rc3-drm-intel-nightly+ (eb69e51) from git://anongit.freedesktop.org/drm-intel Mesa - 11.0.2 from http://cgit.freedesktop.org/mesa/mesa/ xf86-video-intel - 2.99.917 from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/ Libva - 1.6.1 from http://cgit.freedesktop.org/libva/ vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.
[Mesa-dev] [Bug 92368] [BSW] Regression: glx@glx_arb_sync_control@timing -fullscreen test Fail
https://bugs.freedesktop.org/show_bug.cgi?id=92368 --- Comment #1 from cprigent --- Reproduced on SKL-Y. Following tests were Pass with Mesa 10.6.4: glx@glx_arb_sync_control@timing -divisor 1 glx@glx_arb_sync_control@timing -fullscreen -msc-delta 1 glx@glx_arb_sync_control@timing -msc-delta 1 glx@glx_arb_sync_control@timing -msc-delta 2 glx@glx_arb_sync_control@timing -waitformsc -divisor 1 glx@glx_arb_sync_control@timing -waitformsc -msc-delta 2 Hardware: Platform: SKY LAKE Y A0 CPU : Intel(R) Core(TM) m5-6Y57 CPU @ 1.10GHz (family: 6, model: 78 stepping: 3) MCP : SKL-Y D1 2+2 (or ULX-D1) QDF : QJK9 CPU : SKL D0 Chipset PCH: Sunrise Point LP C1 CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2 Reworks : All Mandatories + FBS02,FBS03, F23, O-02 & O-06 Software Linux : Ubuntu 14.04 LTS 64 bits BIOS : SKLSE2R1.R00.X097.B02.1509020030 ME FW : 11.0.0.1173 Ksc (EC FW): 1.19 kernel 4.3.0-rc3-drm-intel-nightly+ (eb69e51) from git://anongit.freedesktop.org/drm-intel Mesa - 11.0.2 from http://cgit.freedesktop.org/mesa/mesa/ xf86-video-intel - 2.99.917 from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/ Libva - 1.6.1 from http://cgit.freedesktop.org/libva/ vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.
[Mesa-dev] [PATCH] i965/vec4: Implement b2f and b2i using negation.
Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was added) but it was missed in the new NIR backend. Add it there as well. instructions in affected programs: 1857 -> 1810 (-2.53%) helped: 15 --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 41bd80d..fdf767d 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1237,14 +1237,8 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) break; case nir_op_b2i: - emit(AND(dst, op[0], src_reg(1))); - break; - case nir_op_b2f: - op[0].type = BRW_REGISTER_TYPE_D; - dst.type = BRW_REGISTER_TYPE_D; - emit(AND(dst, op[0], src_reg(0x3f800000u))); - dst.type = BRW_REGISTER_TYPE_F; + emit(MOV(dst, negate(op[0]))); break; case nir_op_f2b: -- 2.4.9
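Why a negating MOV can replace both AND forms is easy to check on the CPU. The sketch below assumes booleans are 32-bit integers holding either 0 or ~0 (all bits set, i.e. -1), and that the MOV converts the integer source to the destination type; the function names are hypothetical and only model the arithmetic, not the backend.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: a boolean is a 32-bit integer, 0 for false and ~0 for true.
constexpr int32_t kFalse = 0;
constexpr int32_t kTrue = ~0;  // all bits set, i.e. -1

// Old b2i: keep only the low bit.
int32_t b2i_and(int32_t b) { return b & 1; }

// Old b2f: AND with 0x3f800000, the bit pattern of 1.0f, then reinterpret.
float b2f_and(int32_t b)
{
    const uint32_t bits = static_cast<uint32_t>(b) & 0x3f800000u;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// New form: negate the source. -(-1) is 1 and -0 is 0.
int32_t b2i_neg(int32_t b) { return -b; }

// With a float destination the MOV also converts int to float, so the
// negated integer 1 becomes 1.0f and 0 becomes 0.0f.
float b2f_neg(int32_t b) { return static_cast<float>(-b); }
```

Both forms agree for the two legal boolean values, which is why `nir_op_b2i` and `nir_op_b2f` can share the single `MOV(dst, negate(op[0]))` case in the patch.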
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset wrote:
>
>
> On 10/10/2015 09:42 PM, Ilia Mirkin wrote:
>>
>> On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
>> wrote:
>>>
>>> This patch looks fine except that it should be a bit more normalized. I
>>> mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
>>> PUSH_SPACE calls, sometimes you add it sometimes not.
>>
>> Meh. We need to get our error checking situation straight, but this
>> isn't the patch to do it in.
>
>
> Yeah, but this needs to be clarified.

What does?
Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.
Hi Marek,

The revised series is mostly done. I wanted to do more testing and to
try to make sure that the added cache flushes I am doing now (a
CACHE_FLUSH event before a fast clear and on switching framebuffers)
are the minimal needed.

> Also, it looks like we don't need DCC decompression at all, right? It
> might be better to get rid of it and only use the 3D engine to access
> DCC-encoded surfaces.

I still use it for flush_resource. I could make this redundant by
sharing the DCC buffer by appending the DCC buffer to the texture
resource similarly to how the CMASK is appended to the resource of a
MSAA buffer. This has the secondary benefit of not needing to
reference as many resources for command submission.

Yours sincerely,
Bas Nieuwenhuizen
[Mesa-dev] [PATCH] nir/glsl: Use shader_prog->Name for naming the NIR shader
This has the better name to use. Apparently, sh->Name is usually 0.
---
 src/glsl/nir/glsl_to_nir.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index 6e1dd84..3284bdc 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -150,7 +150,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
       if (sh->Program->SamplersUsed & (1 << i))
          num_textures = i;
 
-   shader->info.name = ralloc_asprintf(shader, "GLSL%d", sh->Name);
+   shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);
    if (shader_prog->Label)
       shader->info.label = ralloc_strdup(shader, shader_prog->Label);
    shader->info.num_textures = num_textures;
-- 
2.5.0.400.gff86faf
Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler
I reworked this patch to patch use_legacy_snorm_formula through as a function argument rather than trying to go through the key. This should make landing this series independent of finding strange meta-related gpu hangs on HSW. I reworked the patch to move use_legacy_snorm_formula into the key so that it applies on top of the series. On Sat, Oct 10, 2015 at 8:09 AM, Jason Ekstrandwrote: > This commit removes all dependence on GL state by getting rid of the > brw_context parameter and the GL data structures. > > v2 (Jason Ekstrand): >- Patch use_legacy_snorm_formula through as a function argument rather > than trying to go through the shader key. > --- > src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 > +- > src/mesa/drivers/dri/i965/brw_vs.c | 16 +++- > src/mesa/drivers/dri/i965/brw_vs.h | 12 -- > 3 files changed, 49 insertions(+), 49 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index 4b8390f..8e38729 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -1937,51 +1937,42 @@ extern "C" { > * Returns the final assembly and the program's size. 
> */ > const unsigned * > -brw_vs_emit(struct brw_context *brw, > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data, > void *mem_ctx, > const struct brw_vs_prog_key *key, > struct brw_vs_prog_data *prog_data, > -struct gl_vertex_program *vp, > -struct gl_shader_program *prog, > +const nir_shader *shader, > +gl_clip_plane *clip_planes, > +bool use_legacy_snorm_formula, > int shader_time_index, > -unsigned *final_assembly_size) > +unsigned *final_assembly_size, > +char **error_str) > { > const unsigned *assembly = NULL; > > - if (brw->intelScreen->compiler->scalar_vs) { > + if (compiler->scalar_vs) { >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8; > > - fs_visitor v(brw->intelScreen->compiler, brw, > - mem_ctx, key, _data->base.base, > + fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base, > NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 > */ > - vp->Base.nir, 8, shader_time_index); > - if (!v.run_vs(brw_select_clip_planes(>ctx))) { > - if (prog) { > -prog->LinkStatus = false; > -ralloc_strcat(>InfoLog, v.fail_msg); > - } > - > - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n", > - v.fail_msg); > + shader, 8, shader_time_index); > + if (!v.run_vs(clip_planes)) { > + if (error_str) > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg); > > return NULL; >} > > - fs_generator g(brw->intelScreen->compiler, brw, > - mem_ctx, (void *) key, _data->base.base, > - v.promoted_constants, > + fs_generator g(compiler, log_data, mem_ctx, (void *) key, > + _data->base.base, v.promoted_constants, > v.runtime_check_aads_emit, "VS"); >if (INTEL_DEBUG & DEBUG_VS) { > - char *name; > - if (prog) { > -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d", > - prog->Label ? prog->Label : "unnamed", > - prog->Name); > - } else { > -name = ralloc_asprintf(mem_ctx, "vertex program %d", > - vp->Base.Id); > - } > - g.enable_debug(name); > + const char *debug_name = > +ralloc_asprintf(mem_ctx, "%s vertex shader %s", > +shader->info.label ? 
shader->info.label : > "unnamed", > +shader->info.name); > + > + g.enable_debug(debug_name); >} >g.generate_code(v.cfg, 8); >assembly = g.get_assembly(final_assembly_size); > @@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw, > if (!assembly) { >prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT; > > - vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data, > -vp->Base.nir, brw_select_clip_planes(>ctx), > -mem_ctx, shader_time_index, > -!_mesa_is_gles3(>ctx)); > + vec4_vs_visitor v(compiler, log_data, key, prog_data, > +shader, clip_planes, mem_ctx, > +shader_time_index, use_legacy_snorm_formula); >if (!v.run()) { > - if (prog) { > -prog->LinkStatus = false; > -ralloc_strcat(>InfoLog, v.fail_msg); > - } > - > - _mesa_problem(NULL, "Failed to
Re: [Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily
Does this fix those texelFetch piglit tests ? Or is it the second patch ?

Anyway, this patch is :

Reviewed-by: Samuel Pitoiset

On 10/10/2015 08:12 AM, Ilia Mirkin wrote:
> Right now we emit on every kick, but this is only necessary if something
> will ever be able to observe that the fence completed. If there are no
> refs, leave the fence alone and emit it another day.
>
> This also happens to work around an issue for the kick handler -- a kick
> can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
> due to lack of space in the pushbuf. We want the emit to happen in the
> current batch, so we want there to always be enough space. However an
> explicit kick could take the reserved space for the implicitly-triggered
> kick's fence emission if it happened right after. With the new mechanism,
> hopefully there's no way to cause two fences to be emitted into the same
> reserved space.
>
> Signed-off-by: Ilia Mirkin
> Cc: mesa-sta...@lists.freedesktop.org
> Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
> ---
>  src/gallium/drivers/nouveau/nouveau_fence.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c b/src/gallium/drivers/nouveau/nouveau_fence.c
> index ee4e08d..18b1592 100644
> --- a/src/gallium/drivers/nouveau/nouveau_fence.c
> +++ b/src/gallium/drivers/nouveau/nouveau_fence.c
> @@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
>     /* wtf, someone is waiting on a fence in flush_notify handler? */
>     assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
>
> -   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)
> +   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
> +      PUSH_SPACE(screen->pushbuf, 8);
>        nouveau_fence_emit(fence);
> +   }
>
>     if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)
>        if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
> @@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
>  void
>  nouveau_fence_next(struct nouveau_screen *screen)
>  {
> -   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
> -      nouveau_fence_emit(screen->fence.current);
> +   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
> +      if (screen->fence.current->ref > 1)
> +         nouveau_fence_emit(screen->fence.current);
> +      else
> +         return;
> +   }
>
>     nouveau_fence_ref(NULL, &screen->fence.current);
>
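The decision the patch adds to nouveau_fence_next can be modelled in isolation. The following is only a sketch: the enum names are hypothetical stand-ins for the NOUVEAU_FENCE_STATE_* values (only their relative order is taken from the comparisons in the diff), and the reading that "ref > 1" means "someone besides the screen holds the fence" is an assumption.

```c
/* Hypothetical stand-ins for the NOUVEAU_FENCE_STATE_* values; only the
 * ordering matters for this sketch. */
enum sketch_fence_state {
   SKETCH_FENCE_AVAILABLE,
   SKETCH_FENCE_EMITTING,
   SKETCH_FENCE_EMITTED,
   SKETCH_FENCE_FLUSHED,
   SKETCH_FENCE_SIGNALLED
};

enum sketch_next_action {
   SKETCH_KEEP_CURRENT,      /* not emitted, nobody watching: reuse it later */
   SKETCH_EMIT_THEN_REPLACE, /* not emitted, but referenced: emit, then move on */
   SKETCH_REPLACE_ONLY       /* already being emitted or later: just move on */
};

/* Mirrors the control flow of the patched nouveau_fence_next(). */
static enum sketch_next_action
sketch_fence_next(int ref, enum sketch_fence_state state)
{
   if (state < SKETCH_FENCE_EMITTING) {
      if (ref > 1)
         return SKETCH_EMIT_THEN_REPLACE;
      return SKETCH_KEEP_CURRENT;
   }
   return SKETCH_REPLACE_ONLY;
}
```

With this model, sketch_fence_next(1, SKETCH_FENCE_AVAILABLE) keeps the current fence, which corresponds to the early return in the patch: no fence is emitted on that kick.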
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
This patch looks fine except that it should be a bit more normalized. I
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
PUSH_SPACE calls, sometimes you add it sometimes not.

Did you run a full piglit test this time ? :)

See my comment below.

On 10/10/2015 11:09 AM, Ilia Mirkin wrote:
> We still have to push everything out, might as well kick earlier and
> flip pushbufs when we know we'll need it. This resolves some issues with
> the new policy of making sure that we always leave a bit of room at the
> end for fences.
>
> Signed-off-by: Ilia Mirkin
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++-------
>  src/gallium/drivers/nouveau/nv50/nv50_transfer.c     | 16 +++-------------
>  src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c     | 20 +++++---------------
>  3 files changed, 10 insertions(+), 35 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
> index fdde11f..941555f 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
> @@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
>                 PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
>              }
>              while (words) {
> -               unsigned nr;
> -
> -               if (!PUSH_SPACE(push, 16))
> -                  break;
> -               nr = PUSH_AVAIL(push);
> -               assert(nr >= 16);
> -               nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
> +               unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
>
> +               PUSH_SPACE(push, nr + 3);

This PUSH_SPACE call doesn't seem to be needed for me because
NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set and the following
BEGIN_XXX calls will allocate space.

>                 BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
>                 PUSH_DATA (push, (start << 8) | b);
>                 BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
> index be51407..9a3fd1e 100644
> --- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
> +++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
> @@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
>     PUSH_DATA (push, 0);
>
>     while (count) {
> -      unsigned nr;
> -
> -      if (!PUSH_SPACE(push, 16))
> -         break;
> -      nr = PUSH_AVAIL(push);
> -      assert(nr >= 16);
> -      nr = MIN2(count, nr - 1);
> -      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
> +      unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
>
>        BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);
>        PUSH_DATAp(push, src, nr);
> @@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
>     nouveau_pushbuf_validate(push);
>
>     while (words) {
> -      unsigned nr;
> -
> -      nr = PUSH_AVAIL(push);
> -      nr = MIN2(nr - 7, words);
> -      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
> +      unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
>
> +      PUSH_SPACE(push, nr + 7);
>        BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
>        PUSH_DATAh(push, bo->offset + base);
>        PUSH_DATA (push, bo->offset + base);
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
> index aaec60a..d459dd6 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
> @@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
>     nouveau_pushbuf_validate(push);
>
>     while (count) {
> -      unsigned nr;
> +      unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
>
> -      if (!PUSH_SPACE(push, 16))
> +      if (!PUSH_SPACE(push, nr + 9))
>           break;
> -      nr = PUSH_AVAIL(push);
> -      assert(nr >= 16);
> -      nr = MIN2(count, nr - 9);
> -      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
>
>        BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
>        PUSH_DATAh(push, dst->offset + offset);
> @@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
>     nouveau_pushbuf_validate(push);
>
>     while (count) {
> -      unsigned nr;
> +      unsigned nr = MIN2(count, (NV04_PFIFO_MAX_PACKET_LEN - 1));
>
> -      if (!PUSH_SPACE(push, 16))
> +      if (!PUSH_SPACE(push, nr + 10))
>           break;
> -      nr = PUSH_AVAIL(push);
> -      assert(nr >= 16);
> -      nr = MIN2(count, nr - 8);
> -      nr = MIN2(nr, (NV04_PFIFO_MAX_PACKET_LEN - 1));
>
>        BEGIN_NVC0(push, NVE4_P2MF(UPLOAD_DST_ADDRESS_HIGH), 2);
>        PUSH_DATAh(push, dst->offset + offset);
> @@ -571,9 +563,7 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
>     PUSH_DATA (push, bo->offset + base);
>
>     while (words) {
> -      unsigned nr = PUSH_AVAIL(push);
> -      nr = MIN2(nr, words);
> -      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
> +      unsigned nr = MIN2(words,
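All of the rewritten loops in this patch follow one pattern: size each chunk from the remaining data and the packet-length limit alone, then reserve room for the chunk plus its header words, instead of sizing the chunk from whatever space happens to be left. A minimal sketch of that pattern follows, assuming a toy pushbuf model; sketch_pushbuf, sketch_push_space and both constants are hypothetical and are not the nouveau API, whose PUSH_SPACE may behave differently.

```c
#define SKETCH_MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical limits: the first stands in for NV04_PFIFO_MAX_PACKET_LEN,
 * the second for the per-chunk overhead (the "+ 3", "+ 7", ... in the patch). */
#define SKETCH_MAX_PACKET_LEN 8u
#define SKETCH_HEADER_WORDS   3u

struct sketch_pushbuf {
   unsigned capacity; /* words that fit in one buffer */
   unsigned used;     /* words already written to the current buffer */
   unsigned kicks;    /* how often the buffer was flushed and restarted */
};

struct sketch_result {
   unsigned packets;
   unsigned kicks;
};

/* Reserve space up front: if the request does not fit in what is left,
 * kick the current buffer and start a fresh one. Returns 0 if the request
 * can never fit. */
static int
sketch_push_space(struct sketch_pushbuf *push, unsigned words)
{
   if (words > push->capacity)
      return 0;
   if (push->capacity - push->used < words) {
      push->kicks++;
      push->used = 0;
   }
   return 1;
}

/* Upload `words` words in chunks; the chunk size never depends on the
 * space currently available in the buffer. */
static struct sketch_result
sketch_upload(unsigned capacity, unsigned words)
{
   struct sketch_pushbuf push = { capacity, 0, 0 };
   struct sketch_result res = { 0, 0 };

   while (words) {
      unsigned nr = SKETCH_MIN2(words, SKETCH_MAX_PACKET_LEN);

      if (!sketch_push_space(&push, nr + SKETCH_HEADER_WORDS))
         break;
      push.used += nr + SKETCH_HEADER_WORDS; /* header and data stay together */
      words -= nr;
      res.packets++;
   }
   res.kicks = push.kicks;
   return res;
}
```

In this model a chunk and its header always land in the same buffer, because the reservation covers both before anything is written.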
Re: [Mesa-dev] [PATCH 11/17] i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler
Ignore this. It's just an accidental re-send. On Sat, Oct 10, 2015 at 8:04 AM, Jason Ekstrandwrote: > This commit removes all dependence on GL state by getting rid of the > brw_context parameter and the GL data structures. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 59 > > src/mesa/drivers/dri/i965/brw_wm.c | 14 +++-- > src/mesa/drivers/dri/i965/brw_wm.h | 13 +--- > 3 files changed, 47 insertions(+), 39 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 3c83f2a..8bdc676 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -5115,40 +5115,39 @@ fs_visitor::run_cs() > } > > const unsigned * > -brw_wm_fs_emit(struct brw_context *brw, > +brw_wm_fs_emit(const struct brw_compiler *compiler, void *log_data, > void *mem_ctx, > const struct brw_wm_prog_key *key, > struct brw_wm_prog_data *prog_data, > - struct gl_fragment_program *fp, > - struct gl_shader_program *prog, > + const nir_shader *shader, > + struct gl_program *prog, > int shader_time_index8, int shader_time_index16, > - unsigned *final_assembly_size) > + bool use_rep_send, > + unsigned *final_assembly_size, > + char **error_str) > { > - /* Now the main event: Visit the shader IR and generate our FS IR for it. 
> -*/ > - fs_visitor v(brw->intelScreen->compiler, brw, mem_ctx, key, > -_data->base, >Base, fp->Base.nir, 8, > shader_time_index8); > + fs_visitor v(compiler, log_data, mem_ctx, key, > +_data->base, prog, shader, 8, > +shader_time_index8); > if (!v.run_fs(false /* do_rep_send */)) { > - if (prog) { > - prog->LinkStatus = false; > - ralloc_strcat(>InfoLog, v.fail_msg); > - } > - > - _mesa_problem(NULL, "Failed to compile fragment shader: %s\n", > -v.fail_msg); > + if (error_str) > + *error_str = ralloc_strdup(mem_ctx, v.fail_msg); > >return NULL; > } > > cfg_t *simd16_cfg = NULL; > - fs_visitor v2(brw->intelScreen->compiler, brw, mem_ctx, key, > - _data->base, >Base, fp->Base.nir, 16, > shader_time_index16); > - if (likely(!(INTEL_DEBUG & DEBUG_NO16) || brw->use_rep_send)) { > + fs_visitor v2(compiler, log_data, mem_ctx, key, > + _data->base, prog, shader, 16, > + shader_time_index16); > + if (likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) { >if (!v.simd16_unsupported) { > /* Try a SIMD16 compile */ > v2.import_uniforms(); > - if (!v2.run_fs(brw->use_rep_send)) { > -perf_debug("SIMD16 shader failed to compile: %s", v2.fail_msg); > + if (!v2.run_fs(use_rep_send)) { > +compiler->shader_perf_log(log_data, > + "SIMD16 shader failed to compile: %s", > + v2.fail_msg); > } else { > simd16_cfg = v2.cfg; > } > @@ -5156,8 +5155,8 @@ brw_wm_fs_emit(struct brw_context *brw, > } > > cfg_t *simd8_cfg; > - int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || brw->no_simd8; > - if ((no_simd8 || brw->gen < 5) && simd16_cfg) { > + int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || use_rep_send; > + if ((no_simd8 || compiler->devinfo->gen < 5) && simd16_cfg) { >simd8_cfg = NULL; >prog_data->no_8 = true; > } else { > @@ -5165,20 +5164,14 @@ brw_wm_fs_emit(struct brw_context *brw, >prog_data->no_8 = false; > } > > - fs_generator g(brw->intelScreen->compiler, brw, > - mem_ctx, (void *) key, _data->base, > + fs_generator g(compiler, log_data, mem_ctx, (void *) key, > _data->base, 
>v.promoted_constants, v.runtime_check_aads_emit, "FS"); > > if (unlikely(INTEL_DEBUG & DEBUG_WM)) { > - char *name; > - if (prog) > - name = ralloc_asprintf(mem_ctx, "%s fragment shader %d", > -prog->Label ? prog->Label : "unnamed", > -prog->Name); > - else > - name = ralloc_asprintf(mem_ctx, "fragment program %d", fp->Base.Id); > - > - g.enable_debug(name); > + g.enable_debug(ralloc_asprintf(mem_ctx, "%s fragment shader %s", > + shader->info.label ? shader->info.label > : > + "unnamed", > + shader->info.name)); > } > > if (simd8_cfg) > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c > b/src/mesa/drivers/dri/i965/brw_wm.c > index 4d5e7f6..ab9461a 100644 > --- a/src/mesa/drivers/dri/i965/brw_wm.c > +++
[Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. v2 (Jason Ekstrand): - Patch use_legacy_snorm_formula through as a function argument rather than trying to go through the shader key. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 +- src/mesa/drivers/dri/i965/brw_vs.c | 16 +++- src/mesa/drivers/dri/i965/brw_vs.h | 12 -- 3 files changed, 49 insertions(+), 49 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 4b8390f..8e38729 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1937,51 +1937,42 @@ extern "C" { * Returns the final assembly and the program's size. */ const unsigned * -brw_vs_emit(struct brw_context *brw, +brw_vs_emit(const struct brw_compiler *compiler, void *log_data, void *mem_ctx, const struct brw_vs_prog_key *key, struct brw_vs_prog_data *prog_data, -struct gl_vertex_program *vp, -struct gl_shader_program *prog, +const nir_shader *shader, +gl_clip_plane *clip_planes, +bool use_legacy_snorm_formula, int shader_time_index, -unsigned *final_assembly_size) +unsigned *final_assembly_size, +char **error_str) { const unsigned *assembly = NULL; - if (brw->intelScreen->compiler->scalar_vs) { + if (compiler->scalar_vs) { prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8; - fs_visitor v(brw->intelScreen->compiler, brw, - mem_ctx, key, _data->base.base, + fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base, NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 */ - vp->Base.nir, 8, shader_time_index); - if (!v.run_vs(brw_select_clip_planes(>ctx))) { - if (prog) { -prog->LinkStatus = false; -ralloc_strcat(>InfoLog, v.fail_msg); - } - - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n", - v.fail_msg); + shader, 8, shader_time_index); + if (!v.run_vs(clip_planes)) { + if (error_str) +*error_str = ralloc_strdup(mem_ctx, v.fail_msg); return NULL; } - 
fs_generator g(brw->intelScreen->compiler, brw, - mem_ctx, (void *) key, _data->base.base, - v.promoted_constants, + fs_generator g(compiler, log_data, mem_ctx, (void *) key, + _data->base.base, v.promoted_constants, v.runtime_check_aads_emit, "VS"); if (INTEL_DEBUG & DEBUG_VS) { - char *name; - if (prog) { -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d", - prog->Label ? prog->Label : "unnamed", - prog->Name); - } else { -name = ralloc_asprintf(mem_ctx, "vertex program %d", - vp->Base.Id); - } - g.enable_debug(name); + const char *debug_name = +ralloc_asprintf(mem_ctx, "%s vertex shader %s", +shader->info.label ? shader->info.label : "unnamed", +shader->info.name); + + g.enable_debug(debug_name); } g.generate_code(v.cfg, 8); assembly = g.get_assembly(final_assembly_size); @@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw, if (!assembly) { prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT; - vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data, -vp->Base.nir, brw_select_clip_planes(>ctx), -mem_ctx, shader_time_index, -!_mesa_is_gles3(>ctx)); + vec4_vs_visitor v(compiler, log_data, key, prog_data, +shader, clip_planes, mem_ctx, +shader_time_index, use_legacy_snorm_formula); if (!v.run()) { - if (prog) { -prog->LinkStatus = false; -ralloc_strcat(>InfoLog, v.fail_msg); - } - - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n", - v.fail_msg); + if (error_str) +*error_str = ralloc_strdup(mem_ctx, v.fail_msg); return NULL; } - vec4_generator g(brw->intelScreen->compiler, brw, - _data->base, + vec4_generator g(compiler, log_data, _data->base, mem_ctx, INTEL_DEBUG & DEBUG_VS, "vertex", "VS"); - assembly = g.generate_assembly(v.cfg, final_assembly_size, vp->Base.nir); + assembly = g.generate_assembly(v.cfg, final_assembly_size, shader); } return assembly; diff --git
Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.
On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen wrote:
> Hi Marek,
>
> The revised series is mostly done. I wanted to do more testing and to
> try to make sure that the added cache flushes I am doing now (a
> CACHE_FLUSH event before a fast clear and on switching framebuffers)
> are the minimal needed.
>
>> Also, it looks like we don't need DCC decompression at all, right? It
>> might be better to get rid of it and only use the 3D engine to access
>> DCC-encoded surfaces.
>
> I still use it for flush_resource. I could make this redundant by
> sharing the DCC buffer by appending the DCC buffer to the texture
> resource similarly to how the CMASK is appended to the resource of a
> MSAA buffer. This has the secondary benefit of not needing to
> reference as many resources for command submission.

IIRC, flush_resource is only used for shared (scanout) surfaces where
DCC is always disabled.

Marek
Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier
Hi Timothy,

One of these 3 commits breaks compilation for Talos shaders with
gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a
minimal test case. I can't say which commit, because Mesa fails to
build between them. It has something to do with uniforms, structures,
and samplers.

commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
Author: Timothy Arceri
Date:   Sun Aug 30 12:50:34 2015 +1000

    glsl: store uniform slot id in var location field
    ...

commit 9788700caf61ff8beee5fd836f5efd98a931a976
Author: Timothy Arceri
Date:   Wed Sep 2 11:29:11 2015 +1000

    glsl: assign hidden uniforms their slot id earlier
    ...

commit 874a0217fd8bba83b0bc2448f5156fdb82f77d7c
Author: Timothy Arceri
Date:   Sun Aug 30 12:49:46 2015 +1000

    glsl: order indices for samplers inside a struct array
    ...

Any idea?

Thanks,
Marek

On Tue, Sep 15, 2015 at 9:51 AM, Timothy Arceri wrote:
> This is required so that the next patch can safely assign the slot id
> to the var.
>
> The ids are now assigned in the order we want before allocating storage
> so there is no need to sort the storage array and move things around.
>
> V2: rename variable to make code easier to follow as suggested by Jason
>
> Reviewed-by: Jason Ekstrand
> ---
>  src/glsl/link_uniforms.cpp | 90 +-
>  1 file changed, 41 insertions(+), 49 deletions(-)
>
Re: [Mesa-dev] [PATCH 1/5] i965/vec4: nir_emit_if doesn't need to predicate based on all the channels
On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro wrote:
> ---
>
> I already talked about this with Jason Ekstrand and Matt Turner
> privately, but just in case somebody else jump to the review:
>
> When using BRW_PREDICATE_NORMAL, the if will use all the channels of
> the register flag. But nir_if only reads from one channel, so that
> is not needed. Another hint showing that this is safe: the MOV that
> put the condition on f0 is calling get_nir_src with just one
> component. That will return always a source with swizzle
> BRW_SWIZZLE_XXXX, so that component is the only to be used.
>
> This commit is not needed/solving anything per-se, but it is needed in
> order to be able to implement vec4_cmod_propagation with a good
> overall outcome.
>
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..e05745f 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -193,7 +193,9 @@ vec4_visitor::nir_emit_if(nir_if *if_stmt)
>     vec4_instruction *inst = emit(MOV(dst_null_d(), condition));
>     inst->conditional_mod = BRW_CONDITIONAL_NZ;
>
> -   emit(IF(BRW_PREDICATE_NORMAL));
> +   /* We can just predicate based on the X channel, as the condition only
> +    * reads from one channel */

*/ goes on its own line.

> +   emit(IF(BRW_PREDICATE_ALIGN16_REPLICATE_X));

I agree with what Jason says -- seems like we should have been doing
this all along.

Reviewed-by: Matt Turner
Re: [Mesa-dev] [PATCH 3/6] glsl: move half<->float convertion to util
On Sat, Oct 10, 2015 at 11:47 AM, Rob Clarkwrote: > From: Rob Clark > > Needed in NIR too, so move out of mesa/main/imports.c > > Signed-off-by: Rob Clark > --- > src/glsl/Makefile.am | 1 + > src/mesa/main/imports.c | 148 -- > src/mesa/main/imports.h | 38 -- > src/util/Makefile.sources | 2 + > src/util/convert.c| 179 > ++ > src/util/convert.h| 43 +++ > 6 files changed, 259 insertions(+), 152 deletions(-) > create mode 100644 src/util/convert.c > create mode 100644 src/util/convert.h > > diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am > index 3265391..347919b 100644 > --- a/src/glsl/Makefile.am > +++ b/src/glsl/Makefile.am > @@ -160,6 +160,7 @@ glsl_compiler_SOURCES = \ > glsl_compiler_LDADD = \ > libglsl.la \ > $(top_builddir)/src/libglsl_util.la \ > + $(top_builddir)/src/util/libmesautil.la \ > $(PTHREAD_LIBS) > > glsl_test_SOURCES = \ > diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c > index 350e675..230ebbc 100644 > --- a/src/mesa/main/imports.c > +++ b/src/mesa/main/imports.c > @@ -307,154 +307,6 @@ _mesa_bitcount_64(uint64_t n) > } > #endif > > - > -/** > - * Convert a 4-byte float to a 2-byte half float. > - * > - * Not all float32 values can be represented exactly as a float16 value. We > - * round such intermediate float32 values to the nearest float16. When the > - * float32 lies exactly between to float16 values, we round to the one with > - * an even mantissa. > - * > - * This rounding behavior has several benefits: > - * - It has no sign bias. > - * > - * - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's > - * GPU ISA. > - * > - * - By reproducing the behavior of the GPU (at least on Intel hardware), > - * compile-time evaluation of constant packHalf2x16 GLSL expressions will > - * result in the same value as if the expression were executed on the > GPU. 
> - */ > -GLhalfARB > -_mesa_float_to_half(float val) > -{ > - const fi_type fi = {val}; > - const int flt_m = fi.i & 0x7f; > - const int flt_e = (fi.i >> 23) & 0xff; > - const int flt_s = (fi.i >> 31) & 0x1; > - int s, e, m = 0; > - GLhalfARB result; > - > - /* sign bit */ > - s = flt_s; > - > - /* handle special cases */ > - if ((flt_e == 0) && (flt_m == 0)) { > - /* zero */ > - /* m = 0; - already set */ > - e = 0; > - } > - else if ((flt_e == 0) && (flt_m != 0)) { > - /* denorm -- denorm float maps to 0 half */ > - /* m = 0; - already set */ > - e = 0; > - } > - else if ((flt_e == 0xff) && (flt_m == 0)) { > - /* infinity */ > - /* m = 0; - already set */ > - e = 31; > - } > - else if ((flt_e == 0xff) && (flt_m != 0)) { > - /* NaN */ > - m = 1; > - e = 31; > - } > - else { > - /* regular number */ > - const int new_exp = flt_e - 127; > - if (new_exp < -14) { > - /* The float32 lies in the range (0.0, min_normal16) and is rounded > - * to a nearby float16 value. The result will be either zero, > subnormal, > - * or normal. > - */ > - e = 0; > - m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f)); > - } > - else if (new_exp > 15) { > - /* map this value to infinity */ > - /* m = 0; - already set */ > - e = 31; > - } > - else { > - /* The float32 lies in the range > - * [min_normal16, max_normal16 + max_step16) > - * and is rounded to a nearby float16 value. The result will be > - * either normal or infinite. > - */ > - e = new_exp + 15; > - m = _mesa_lroundevenf(flt_m / (float) (1 << 13)); > - } > - } > - > - assert(0 <= m && m <= 1024); > - if (m == 1024) { > - /* The float32 was rounded upwards into the range of the next exponent, > - * so bump the exponent. This correctly handles the case where f32 > - * should be rounded up to float16 infinity. > - */ > - ++e; > - m = 0; > - } > - > - result = (s << 15) | (e << 10) | m; > - return result; > -} > - > - > -/** > - * Convert a 2-byte half float to a 4-byte float. 
> - * Based on code from: > - * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html > - */ > -float > -_mesa_half_to_float(GLhalfARB val) > -{ > - /* XXX could also use a 64K-entry lookup table */ > - const int m = val & 0x3ff; > - const int e = (val >> 10) & 0x1f; > - const int s = (val >> 15) & 0x1; > - int flt_m, flt_e, flt_s; > - fi_type fi; > - float result; > - > - /* sign bit */ > - flt_s = s; > - > - /* handle special cases */ > - if ((e == 0) && (m == 0)) { > - /* zero */ >
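The rounding rules described in the _mesa_float_to_half comment above can be checked with a small standalone version. This is a sketch, not the Mesa function: it swaps the _mesa_lroundevenf calls for a hypothetical integer helper (sketch_shift_round_even) so that it needs no math library, and the name float_to_half_sketch is made up. The case analysis otherwise follows the quoted code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compute v / 2^shift rounded to nearest, with ties
 * going to the even result. Requires 1 <= shift <= 31. */
static uint32_t
sketch_shift_round_even(uint32_t v, unsigned shift)
{
   const uint32_t q = v >> shift;
   const uint32_t rem = v & ((1u << shift) - 1u);
   const uint32_t half = 1u << (shift - 1);

   if (rem > half || (rem == half && (q & 1u)))
      return q + 1u;
   return q;
}

static uint16_t
float_to_half_sketch(float val)
{
   uint32_t bits;
   memcpy(&bits, &val, sizeof bits);

   const uint32_t flt_m = bits & 0x7fffffu;      /* 23-bit mantissa */
   const int flt_e = (int)((bits >> 23) & 0xffu);
   const uint32_t s = (bits >> 31) & 0x1u;
   uint32_t e = 0, m = 0;

   if (flt_e == 0) {
      /* zero, or a float32 denormal, which maps to half zero */
   } else if (flt_e == 0xff) {
      /* infinity (m = 0) or NaN (m = 1) */
      e = 31;
      m = flt_m ? 1u : 0u;
   } else {
      const int new_exp = flt_e - 127;
      if (new_exp < -14) {
         /* Result is zero, subnormal, or the smallest normal half.
          * The subnormal mantissa is val * 2^24, i.e. the 24-bit
          * significand shifted right by (-new_exp - 1). */
         const unsigned shift = (unsigned)(-new_exp - 1);
         m = shift > 24 ? 0u
                        : sketch_shift_round_even(0x800000u | flt_m, shift);
      } else if (new_exp > 15) {
         e = 31; /* too large: infinity */
      } else {
         e = (uint32_t)(new_exp + 15);
         m = sketch_shift_round_even(flt_m, 13);
      }
   }

   if (m == 1024) {
      /* Rounded up into the next exponent; this also produces
       * infinity when e was already 30. */
      ++e;
      m = 0;
   }

   return (uint16_t)((s << 15) | (e << 10) | m);
}
```

For instance, 65504.0f is the largest finite half and converts exactly, while 65520.0f sits halfway between it and the next step, so the tie rounds to the even mantissa, overflows the mantissa field, and ends up as infinity.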
Re: [Mesa-dev] [PATCH] i965/vec4: Implement b2f and b2i using negation.
Matt Turner writes:

> Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
> added) but it was missed in the new NIR backend. Add it there as well.
>
> instructions in affected programs: 1857 -> 1810 (-2.53%)
> helped: 15
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..fdf767d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -1237,14 +1237,8 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>        break;
>
>     case nir_op_b2i:
> -      emit(AND(dst, op[0], src_reg(1)));
> -      break;
> -
>     case nir_op_b2f:
> -      op[0].type = BRW_REGISTER_TYPE_D;
> -      dst.type = BRW_REGISTER_TYPE_D;
> -      emit(AND(dst, op[0], src_reg(0x3f800000u)));
> -      dst.type = BRW_REGISTER_TYPE_F;
> +      emit(MOV(dst, negate(op[0])));
>        break;

Looks good to me,
Reviewed-by: Francisco Jerez

>
>     case nir_op_f2b:
> --
> 2.4.9
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
On 10/10/2015 09:42 PM, Ilia Mirkin wrote: On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoisetwrote: This patch looks fine except that it should be a bit more normalized. I mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for PUSH_SPACE calls, sometimes you add it sometimes not. Meh. We need to get our error checking situation straight, but this isn't the patch to do it in. Yeah, but this needs to be clarified. Did you run a full piglit test this time ? :) Nope, but I ran a full piglit before this patch. Almost took down my box. Probably won't be running it again for this patch. Ok, I'll run a full piglit this night then. See my comment below. On 10/10/2015 11:09 AM, Ilia Mirkin wrote: We still have to push everything out, might as well kick earlier and flip pushbufs when we know we'll need it. This resolves some issues with the new policy of making sure that we always leave a bit of room at the end for fences. Signed-off-by: Ilia Mirkin Cc: mesa-sta...@lists.freedesktop.org --- src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 9 ++--- src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 +++- src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 +--- 3 files changed, 10 insertions(+), 35 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c index fdde11f..941555f 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c @@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50) PUSH_DATA (push, (b << 12) | (i << 8) | p | 1); } while (words) { - unsigned nr; - - if (!PUSH_SPACE(push, 16)) - break; - nr = PUSH_AVAIL(push); - assert(nr >= 16); - nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN); + unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN); + PUSH_SPACE(push, nr + 3); This PUSH_SPACE call doesn't seem to be needed for me because NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set 
and the following BEGIN_XXX calls will allocate space. I want to ensure that both of the below commands are in the same batch. Not sure if it's necessary, but... don't want to find out. They were in the same batch before. And this batch stuff is what was causing the M2MF errors I was seeing earlier. BEGIN_NV04(push, NV50_3D(CB_ADDR), 1); PUSH_DATA (push, (start << 8) | b); BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr); diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c index be51407..9a3fd1e 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c @@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv, PUSH_DATA (push, 0); while (count) { - unsigned nr; - - if (!PUSH_SPACE(push, 16)) - break; - nr = PUSH_AVAIL(push); - assert(nr >= 16); - nr = MIN2(count, nr - 1); - nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN); + unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN); BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr); PUSH_DATAp(push, src, nr); @@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv, nouveau_pushbuf_validate(push); while (words) { - unsigned nr; - - nr = PUSH_AVAIL(push); - nr = MIN2(nr - 7, words); - nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1); + unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN); + PUSH_SPACE(push, nr + 7); BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3); PUSH_DATAh(push, bo->offset + base); PUSH_DATA (push, bo->offset + base); diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c index aaec60a..d459dd6 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c @@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv, nouveau_pushbuf_validate(push); while (count) { - unsigned nr; + unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN); - if (!PUSH_SPACE(push, 16)) + if 
(!PUSH_SPACE(push, nr + 9)) break; - nr = PUSH_AVAIL(push); - assert(nr >= 16); - nr = MIN2(count, nr - 9); - nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN); BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2); PUSH_DATAh(push, dst->offset + offset); @@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv, nouveau_pushbuf_validate(push); while (count) { - unsigned nr; + unsigned
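For readers following the PUSH_SPACE discussion, here is a minimal sketch of the chunking pattern the hunks above move to: pick the packet size first, then reserve room for the data plus its header words in one call, so a kick happens before the packet rather than in the middle of it. Everything here is hypothetical (the `sketch_` names, the buffer model, and the value used for the packet limit are assumptions for illustration, not the nouveau API).

```c
#include <stddef.h>

/* Assumption: stands in for NV04_PFIFO_MAX_PACKET_LEN; the real value
 * lives in the driver headers. */
#define SKETCH_MAX_PACKET_LEN 2047u

/* Hypothetical pushbuf model: a fixed-size buffer that is flushed
 * ("kicked") whenever a reservation would not fit. */
struct sketch_pushbuf {
   unsigned capacity;  /* total words per buffer */
   unsigned used;      /* words already queued */
   unsigned kicks;     /* number of flushes so far */
};

/* Make room for n words, kicking first if they would not fit.
 * Returns 0 if n can never fit in one buffer. */
static int sketch_push_space(struct sketch_pushbuf *push, unsigned n)
{
   if (n > push->capacity)
      return 0;
   if (push->capacity - push->used < n) {
      push->kicks++;
      push->used = 0;
   }
   return 1;
}

/* Upload `words` data words, each packet preceded by `header` command
 * words. Returns the number of packets emitted. */
static unsigned sketch_upload(struct sketch_pushbuf *push,
                              unsigned words, unsigned header)
{
   unsigned packets = 0;

   while (words) {
      /* Size the packet from the remaining data, not from whatever
       * space happens to be left in the buffer. */
      unsigned nr = words < SKETCH_MAX_PACKET_LEN ? words
                                                  : SKETCH_MAX_PACKET_LEN;

      /* Reserve header and data together so both land in one batch. */
      if (!sketch_push_space(push, nr + header))
         break;

      push->used += nr + header;
      words -= nr;
      packets++;
   }
   return packets;
}
```

With a 4096-word buffer and a 3-word header, 5000 words go out as three packets with a single kick between the first and second.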
[Mesa-dev] [PATCH 11/17] i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 59 src/mesa/drivers/dri/i965/brw_wm.c | 14 +++-- src/mesa/drivers/dri/i965/brw_wm.h | 13 +--- 3 files changed, 47 insertions(+), 39 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 3c83f2a..8bdc676 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -5115,40 +5115,39 @@ fs_visitor::run_cs() } const unsigned * -brw_wm_fs_emit(struct brw_context *brw, +brw_wm_fs_emit(const struct brw_compiler *compiler, void *log_data, void *mem_ctx, const struct brw_wm_prog_key *key, struct brw_wm_prog_data *prog_data, - struct gl_fragment_program *fp, - struct gl_shader_program *prog, + const nir_shader *shader, + struct gl_program *prog, int shader_time_index8, int shader_time_index16, - unsigned *final_assembly_size) + bool use_rep_send, + unsigned *final_assembly_size, + char **error_str) { - /* Now the main event: Visit the shader IR and generate our FS IR for it. 
-*/ - fs_visitor v(brw->intelScreen->compiler, brw, mem_ctx, key, -_data->base, >Base, fp->Base.nir, 8, shader_time_index8); + fs_visitor v(compiler, log_data, mem_ctx, key, +_data->base, prog, shader, 8, +shader_time_index8); if (!v.run_fs(false /* do_rep_send */)) { - if (prog) { - prog->LinkStatus = false; - ralloc_strcat(>InfoLog, v.fail_msg); - } - - _mesa_problem(NULL, "Failed to compile fragment shader: %s\n", -v.fail_msg); + if (error_str) + *error_str = ralloc_strdup(mem_ctx, v.fail_msg); return NULL; } cfg_t *simd16_cfg = NULL; - fs_visitor v2(brw->intelScreen->compiler, brw, mem_ctx, key, - _data->base, >Base, fp->Base.nir, 16, shader_time_index16); - if (likely(!(INTEL_DEBUG & DEBUG_NO16) || brw->use_rep_send)) { + fs_visitor v2(compiler, log_data, mem_ctx, key, + _data->base, prog, shader, 16, + shader_time_index16); + if (likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) { if (!v.simd16_unsupported) { /* Try a SIMD16 compile */ v2.import_uniforms(); - if (!v2.run_fs(brw->use_rep_send)) { -perf_debug("SIMD16 shader failed to compile: %s", v2.fail_msg); + if (!v2.run_fs(use_rep_send)) { +compiler->shader_perf_log(log_data, + "SIMD16 shader failed to compile: %s", + v2.fail_msg); } else { simd16_cfg = v2.cfg; } @@ -5156,8 +5155,8 @@ brw_wm_fs_emit(struct brw_context *brw, } cfg_t *simd8_cfg; - int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || brw->no_simd8; - if ((no_simd8 || brw->gen < 5) && simd16_cfg) { + int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || use_rep_send; + if ((no_simd8 || compiler->devinfo->gen < 5) && simd16_cfg) { simd8_cfg = NULL; prog_data->no_8 = true; } else { @@ -5165,20 +5164,14 @@ brw_wm_fs_emit(struct brw_context *brw, prog_data->no_8 = false; } - fs_generator g(brw->intelScreen->compiler, brw, - mem_ctx, (void *) key, _data->base, + fs_generator g(compiler, log_data, mem_ctx, (void *) key, _data->base, v.promoted_constants, v.runtime_check_aads_emit, "FS"); if (unlikely(INTEL_DEBUG & DEBUG_WM)) { - char *name; - if (prog) - name = 
ralloc_asprintf(mem_ctx, "%s fragment shader %d", -prog->Label ? prog->Label : "unnamed", -prog->Name); - else - name = ralloc_asprintf(mem_ctx, "fragment program %d", fp->Base.Id); - - g.enable_debug(name); + g.enable_debug(ralloc_asprintf(mem_ctx, "%s fragment shader %s", + shader->info.label ? shader->info.label : + "unnamed", + shader->info.name)); } if (simd8_cfg) diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c index 4d5e7f6..ab9461a 100644 --- a/src/mesa/drivers/dri/i965/brw_wm.c +++ b/src/mesa/drivers/dri/i965/brw_wm.c @@ -230,9 +230,19 @@ brw_codegen_wm_prog(struct brw_context *brw, st_index16 = brw_get_shader_time_index(brw, prog, >program.Base, ST_FS16); } - program = brw_wm_fs_emit(brw, mem_ctx, key, _data, ->program, prog, st_index8, st_index16, _size); + char *error_str = NULL; + program =
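The error handling change in this patch can be sketched roughly as follows: instead of touching GL link state and calling _mesa_problem() itself, the compile entry point returns NULL and hands the failure message back through an optional out-parameter, leaving the reporting to the caller. This is only an illustration; `sketch_compile` and its arguments are made up, and it uses malloc() where the patch uses ralloc_strdup() on mem_ctx.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compile entry point. On failure it returns
 * NULL and, if the caller asked for it, stores a heap-allocated message
 * in *error_str. The caller owns (and frees) that message. */
static const unsigned *sketch_compile(int should_fail,
                                      unsigned *final_assembly_size,
                                      char **error_str)
{
   static const unsigned fake_assembly[] = { 0x1, 0x2, 0x3 };

   if (should_fail) {
      if (error_str) {
         const char *msg = "sketch: unsupported shader";
         char *copy = malloc(strlen(msg) + 1);
         if (copy)
            strcpy(copy, msg);
         *error_str = copy;
      }
      return NULL;
   }

   *final_assembly_size = sizeof(fake_assembly);
   return fake_assembly;
}
```

A caller that still has a GL context can then do its own LinkStatus/InfoLog bookkeeping from the returned string, while a caller without one can log or ignore it.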
Re: [Mesa-dev] [PATCH 0/6] Remove NIR dependency on GLSL
On Sat, Oct 10, 2015 at 2:47 PM, Rob Clark wrote:
> From: Rob Clark
>
> This patchset removes the NIR dependency on GLSL (and includes resend
> of shader_enums cleanups w/ addition of STATIC_ASSERT()'s)
>
> Split up glsl_types so the builtin-types go w/ glsl_types but the parts
> that add them to glsl_symbol_table stay with glsl. This way we can move
> glsl_types into NIR without dragging along glsl_symbol_table and all of
> its dependencies.
>
> Also move the half/float conversion into util so it can be used from NIR
> without bringing an external dependency.
>
> With this we can move glsl_types into NIR and drop the dependency on
> GLSL, and mostly remove the libglsl_util hack. (The standalone
> glsl-compiler util still needs libglsl_util, so we can't remove it
> completely yet, but we can remove the dependency on libglsl_util from
> non-mesa state trackers. And a hypothetical vulkan implementation using
> NIR should also not need to suck in libglsl_util.)
>
> Probably there is some room to rename things to complete the cleanup,
> but I figured it was good to split things up into moving things first,
> and do flag-day renames second (if desired).
> > Rob Clark (6): > glsl: couple shader_enums cleanups > glsl: move builtin types to glsl_types.cpp > glsl: move half<->float convertion to util > nir: use util/convert.h > nir: remove dependency on glsl btw, this one seems to have bounced due to size (since moving files), but you can find it here: https://github.com/freedreno/mesa/commits/wip-nir-no-glsl BR, -R > glsl: (mostly) remove libglsl_util > > src/gallium/drivers/freedreno/Makefile.am |3 +- > src/gallium/targets/d3dadapter9/Makefile.am|1 - > src/gallium/targets/pipe-loader/Makefile.am|1 - > src/gallium/targets/xa/Makefile.am |1 - > src/glsl/Makefile.am | 10 +- > src/glsl/Makefile.sources |4 +- > src/glsl/builtin_type_macros.h | 172 -- > src/glsl/builtin_types.cpp |4 +- > src/glsl/glsl_types.cpp| 1715 --- > src/glsl/glsl_types.h | 867 -- > src/glsl/nir/builtin_type_macros.h | 172 ++ > src/glsl/nir/glsl_types.cpp| 1729 > > src/glsl/nir/glsl_types.h | 867 ++ > src/glsl/nir/nir_constant_expressions.py |5 +- > src/glsl/nir/nir_types.h |2 +- > src/glsl/nir/shader_enums.c|8 + > src/glsl/nir/shader_enums.h|7 + > .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +- > src/mesa/drivers/dri/i965/brw_fs.cpp |2 +- > src/mesa/drivers/dri/i965/brw_fs.h |2 +- > .../dri/i965/brw_fs_channel_expressions.cpp|2 +- > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |2 +- > .../drivers/dri/i965/brw_fs_vector_splitting.cpp |2 +- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |2 +- > .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +- > .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +- > src/mesa/main/ff_fragment_shader.cpp |2 +- > src/mesa/main/imports.c| 148 -- > src/mesa/main/imports.h| 38 +- > src/mesa/main/mtypes.h |5 - > src/mesa/main/uniforms.h |2 +- > src/mesa/program/ir_to_mesa.cpp|2 +- > src/mesa/program/sampler.cpp |2 +- > src/util/Makefile.sources |2 + > src/util/convert.c | 179 ++ > src/util/convert.h | 43 + > 36 files changed, 3063 insertions(+), 2946 deletions(-) > delete mode 100644 
src/glsl/builtin_type_macros.h > delete mode 100644 src/glsl/glsl_types.cpp > delete mode 100644 src/glsl/glsl_types.h > create mode 100644 src/glsl/nir/builtin_type_macros.h > create mode 100644 src/glsl/nir/glsl_types.cpp > create mode 100644 src/glsl/nir/glsl_types.h > create mode 100644 src/util/convert.c > create mode 100644 src/util/convert.h > > -- > 2.4.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily
On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset wrote:
> Does this fix those texelFetch piglit tests ? Or is it the second patch ?

This patch "fixes" the initial texelFetch piglit failures. However it
creates some fresh texelFetch piglit failures -- that test is
interesting because it does a lot of draws with minimal state changes
between them. Those ones are fixed by the second patch. But really
these are all different problems, which interact with each other in
frustrating ways.

>
> Anyway, this patch is :
>
> Reviewed-by: Samuel Pitoiset
>
>
> On 10/10/2015 08:12 AM, Ilia Mirkin wrote:
>>
>> Right now we emit on every kick, but this is only necessary if something
>> will ever be able to observe that the fence completed. If there are no
>> refs, leave the fence alone and emit it another day.
>>
>> This also happens to work around an issue for the kick handler -- a kick
>> can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
>> due to lack of space in the pushbuf. We want the emit to happen in the
>> current batch, so we want there to always be enough space. However an
>> explicit kick could take the reserved space for the implicitly-triggered
>> kick's fence emission if it happened right after. With the new mechanism,
>> hopefully there's no way to cause two fences to be emitted into the same
>> reserved space.
>>
>> Signed-off-by: Ilia Mirkin
>> Cc: mesa-sta...@lists.freedesktop.org
>> Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
>> ---
>>  src/gallium/drivers/nouveau/nouveau_fence.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c
>> b/src/gallium/drivers/nouveau/nouveau_fence.c
>> index ee4e08d..18b1592 100644
>> --- a/src/gallium/drivers/nouveau/nouveau_fence.c
>> +++ b/src/gallium/drivers/nouveau/nouveau_fence.c
>> @@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
>>     /* wtf, someone is waiting on a fence in flush_notify handler? */
>>     assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
>> -   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)
>> +   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
>> +      PUSH_SPACE(screen->pushbuf, 8);
>>        nouveau_fence_emit(fence);
>> +   }
>>     if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)
>>        if (nouveau_pushbuf_kick(screen->pushbuf,
>> screen->pushbuf->channel))
>> @@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
>>  void
>>  nouveau_fence_next(struct nouveau_screen *screen)
>>  {
>> -   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
>> -      nouveau_fence_emit(screen->fence.current);
>> +   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
>> +      if (screen->fence.current->ref > 1)
>> +         nouveau_fence_emit(screen->fence.current);
>> +      else
>> +         return;
>> +   }
>>     nouveau_fence_ref(NULL, &screen->fence.current);
>>
>
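The rule described in the commit message (only emit the current fence if somebody besides the screen holds a reference to it, otherwise keep it around for a later kick) can be sketched on its own like this. The types and names below are hypothetical simplifications, not the nouveau structures; in particular the fixed-size fence array is just a stand-in for the real allocation and refcounting.

```c
#include <stddef.h>

#define SKETCH_MAX_FENCES 8

enum sketch_fence_state {
   SKETCH_FENCE_AVAILABLE,
   SKETCH_FENCE_EMITTING,
   SKETCH_FENCE_EMITTED,
};

struct sketch_fence {
   int ref;                        /* 1 == only the screen holds it */
   enum sketch_fence_state state;
};

struct sketch_screen {
   struct sketch_fence fences[SKETCH_MAX_FENCES];
   unsigned cur;                   /* index of the current fence */
   unsigned emitted;               /* how many fences were actually emitted */
};

/* Returns 1 if the screen moved on to a fresh fence, 0 if the current
 * fence was left alone. */
static int sketch_fence_next(struct sketch_screen *screen)
{
   struct sketch_fence *cur = &screen->fences[screen->cur];

   if (cur->state < SKETCH_FENCE_EMITTING) {
      if (cur->ref <= 1)
         return 0;                 /* nobody can observe it: emit another day */
      cur->state = SKETCH_FENCE_EMITTED;
      screen->emitted++;
   }

   if (screen->cur + 1 >= SKETCH_MAX_FENCES)
      return 0;                    /* sketch-only limit, no wraparound */

   screen->cur++;
   screen->fences[screen->cur].ref = 1;
   screen->fences[screen->cur].state = SKETCH_FENCE_AVAILABLE;
   return 1;
}
```

The early return is what keeps an unobserved fence from consuming the space reserved at the end of the pushbuf on every kick.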
[Mesa-dev] [PATCH 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler
This commit removes all dependence on GL state by getting rid of the brw_context parameter and the GL data structures. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 67 +- src/mesa/drivers/dri/i965/brw_vs.c | 14 ++- src/mesa/drivers/dri/i965/brw_vs.h | 11 -- 3 files changed, 44 insertions(+), 48 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 4b03967..d6549de 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1937,51 +1937,41 @@ extern "C" { * Returns the final assembly and the program's size. */ const unsigned * -brw_vs_emit(struct brw_context *brw, +brw_vs_emit(const struct brw_compiler *compiler, void *log_data, void *mem_ctx, const struct brw_vs_prog_key *key, struct brw_vs_prog_data *prog_data, -struct gl_vertex_program *vp, -struct gl_shader_program *prog, +const nir_shader *shader, +gl_clip_plane *clip_planes, int shader_time_index, -unsigned *final_assembly_size) +unsigned *final_assembly_size, +char **error_str) { const unsigned *assembly = NULL; - if (brw->intelScreen->compiler->scalar_vs) { + if (compiler->scalar_vs) { prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8; - fs_visitor v(brw->intelScreen->compiler, brw, - mem_ctx, key, _data->base.base, + fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base, NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 */ - vp->Base.nir, 8, shader_time_index); - if (!v.run_vs(brw_select_clip_planes(>ctx))) { - if (prog) { -prog->LinkStatus = false; -ralloc_strcat(>InfoLog, v.fail_msg); - } - - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n", - v.fail_msg); + shader, 8, shader_time_index); + if (!v.run_vs(clip_planes)) { + if (error_str) +*error_str = ralloc_strdup(mem_ctx, v.fail_msg); return NULL; } - fs_generator g(brw->intelScreen->compiler, brw, - mem_ctx, (void *) key, _data->base.base, - v.promoted_constants, + fs_generator g(compiler, log_data, mem_ctx, (void *) key, + 
_data->base.base, v.promoted_constants, v.runtime_check_aads_emit, "VS"); if (INTEL_DEBUG & DEBUG_VS) { - char *name; - if (prog) { -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d", - prog->Label ? prog->Label : "unnamed", - prog->Name); - } else { -name = ralloc_asprintf(mem_ctx, "vertex program %d", - vp->Base.Id); - } - g.enable_debug(name); + const char *debug_name = +ralloc_asprintf(mem_ctx, "%s vertex shader %s", +shader->info.label ? shader->info.label : "unnamed", +shader->info.name); + + g.enable_debug(debug_name); } g.generate_code(v.cfg, 8); assembly = g.get_assembly(final_assembly_size); @@ -1990,25 +1980,18 @@ brw_vs_emit(struct brw_context *brw, if (!assembly) { prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT; - vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data, -vp->Base.nir, brw_select_clip_planes(>ctx), -mem_ctx, shader_time_index); + vec4_vs_visitor v(compiler, log_data, key, prog_data, +shader, clip_planes, mem_ctx, shader_time_index); if (!v.run()) { - if (prog) { -prog->LinkStatus = false; -ralloc_strcat(>InfoLog, v.fail_msg); - } - - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n", - v.fail_msg); + if (error_str) +*error_str = ralloc_strdup(mem_ctx, v.fail_msg); return NULL; } - vec4_generator g(brw->intelScreen->compiler, brw, - _data->base, + vec4_generator g(compiler, log_data, _data->base, mem_ctx, INTEL_DEBUG & DEBUG_VS, "vertex", "VS"); - assembly = g.generate_assembly(v.cfg, final_assembly_size, vp->Base.nir); + assembly = g.generate_assembly(v.cfg, final_assembly_size, shader); } return assembly; diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c index ecaeefa..f54c9a3 100644 --- a/src/mesa/drivers/dri/i965/brw_vs.c +++ b/src/mesa/drivers/dri/i965/brw_vs.c @@ -180,9 +180,19 @@ brw_codegen_vs_prog(struct brw_context *brw, /* Emit GEN4 code. */ - program =
[Mesa-dev] [PATCH 1/6] glsl: couple shader_enums cleanups
From: Rob Clark

Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.

v2: add STATIC_ASSERT()'s

Reported-by: Emil Velikov
Signed-off-by: Rob Clark
---
 src/glsl/nir/shader_enums.c | 8 ++++++++
 src/glsl/nir/shader_enums.h | 7 +++++++
 src/mesa/main/mtypes.h      | 5 -----
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c
index 3722475..66a25e7 100644
--- a/src/glsl/nir/shader_enums.c
+++ b/src/glsl/nir/shader_enums.c
@@ -28,6 +28,7 @@
 
 #include "shader_enums.h"
 #include "util/macros.h"
+#include "mesa/main/config.h"
 
 #define ENUM(x) [x] = #x
 #define NAME(val) ((((val) < ARRAY_SIZE(names)) && names[(val)]) ? names[(val)] : "UNKNOWN")
@@ -42,6 +43,7 @@ const char * gl_shader_stage_name(gl_shader_stage stage)
       ENUM(MESA_SHADER_FRAGMENT),
       ENUM(MESA_SHADER_COMPUTE),
    };
+   STATIC_ASSERT(ARRAY_SIZE(names) == MESA_SHADER_STAGES);
    return NAME(stage);
 }
 
@@ -82,6 +84,7 @@ const char * gl_vert_attrib_name(gl_vert_attrib attrib)
       ENUM(VERT_ATTRIB_GENERIC14),
       ENUM(VERT_ATTRIB_GENERIC15),
    };
+   STATIC_ASSERT(ARRAY_SIZE(names) == VERT_ATTRIB_MAX);
    return NAME(attrib);
 }
 
@@ -147,6 +150,7 @@ const char * gl_varying_slot_name(gl_varying_slot slot)
       ENUM(VARYING_SLOT_VAR30),
       ENUM(VARYING_SLOT_VAR31),
    };
+   STATIC_ASSERT(ARRAY_SIZE(names) == VARYING_SLOT_MAX);
    return NAME(slot);
 }
 
@@ -169,8 +173,10 @@ const char * gl_system_value_name(gl_system_value sysval)
       ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER),
       ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID),
       ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
+      ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
       ENUM(SYSTEM_VALUE_VERTEX_CNT),
    };
+   STATIC_ASSERT(ARRAY_SIZE(names) == SYSTEM_VALUE_MAX);
    return NAME(sysval);
 }
 
@@ -182,6 +188,7 @@ const char * glsl_interp_qualifier_name(enum glsl_interp_qualifier qual)
       ENUM(INTERP_QUALIFIER_FLAT),
       ENUM(INTERP_QUALIFIER_NOPERSPECTIVE),
    };
+   STATIC_ASSERT(ARRAY_SIZE(names) == INTERP_QUALIFIER_COUNT);
    return NAME(qual);
 }
 
@@ -201,5 +208,6 @@ const char * gl_frag_result_name(gl_frag_result result)
       ENUM(FRAG_RESULT_DATA6),
       ENUM(FRAG_RESULT_DATA7),
    };
+   STATIC_ASSERT(ARRAY_SIZE(names) == FRAG_RESULT_MAX);
    return NAME(result);
 }
diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h
index 2a5d2c5..77638ba 100644
--- a/src/glsl/nir/shader_enums.h
+++ b/src/glsl/nir/shader_enums.h
@@ -233,6 +233,11 @@ typedef enum
    VARYING_SLOT_VAR31,
 } gl_varying_slot;
 
+
+#define VARYING_SLOT_MAX (VARYING_SLOT_VAR0 + MAX_VARYING)
+#define VARYING_SLOT_PATCH0 (VARYING_SLOT_MAX)
+#define VARYING_SLOT_TESS_MAX (VARYING_SLOT_PATCH0 + MAX_VARYING)
+
 const char * gl_varying_slot_name(gl_varying_slot slot);
 
 /**
@@ -473,4 +478,6 @@ typedef enum
 
 const char * gl_frag_result_name(gl_frag_result result);
 
+#define FRAG_RESULT_MAX (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
+
 #endif /* SHADER_ENUMS_H */
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 0a54b20..ba94402 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -94,11 +94,6 @@ struct vbo_context;
 #define PRIM_OUTSIDE_BEGIN_END (PRIM_MAX + 1)
 #define PRIM_UNKNOWN (PRIM_MAX + 2)
 
-#define VARYING_SLOT_MAX (VARYING_SLOT_VAR0 + MAX_VARYING)
-#define VARYING_SLOT_PATCH0 (VARYING_SLOT_MAX)
-#define VARYING_SLOT_TESS_MAX (VARYING_SLOT_PATCH0 + MAX_VARYING)
-#define FRAG_RESULT_MAX (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
-
 /**
  * Determine if the given gl_varying_slot appears in the fragment shader.
  */
-- 
2.4.3
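As a standalone illustration of what the STATIC_ASSERT()s in this patch buy, here is a small sketch of the same name-table idiom. The enum and function names are invented for the example, and it uses C11 _Static_assert directly where the patch uses Mesa's STATIC_ASSERT macro.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Designated initializer: slot [x] holds the string "x". */
#define ENUM(x) [x] = #x

/* Bounds- and NULL-checked lookup into the local `names` table. */
#define NAME(val) ((((val) < ARRAY_SIZE(names)) && names[(val)]) ? names[(val)] : "UNKNOWN")

/* Hypothetical enum, standing in for gl_shader_stage and friends. */
typedef enum {
   SKETCH_STAGE_VERTEX,
   SKETCH_STAGE_FRAGMENT,
   SKETCH_STAGE_COMPUTE,
   SKETCH_STAGE_COUNT
} sketch_stage;

const char *sketch_stage_name(sketch_stage stage)
{
   static const char *names[] = {
      ENUM(SKETCH_STAGE_VERTEX),
      ENUM(SKETCH_STAGE_FRAGMENT),
      ENUM(SKETCH_STAGE_COMPUTE),
   };
   /* Fails to compile if an enumerator is appended without a matching
    * table entry, since the array then ends too early. */
   _Static_assert(ARRAY_SIZE(names) == SKETCH_STAGE_COUNT,
                  "name table out of sync with enum");
   return NAME(stage);
}
```

Note that with designated initializers the array length is the highest index plus one, so the assert catches a missing entry at the end of the table; a hole in the middle is left as NULL and falls through to "UNKNOWN" at runtime instead.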
[Mesa-dev] [PATCH 3/6] glsl: move half<->float convertion to util
From: Rob Clark

Needed in NIR too, so move out of mesa/main/imports.c

Signed-off-by: Rob Clark
---
 src/glsl/Makefile.am      |   1 +
 src/mesa/main/imports.c   | 148 --
 src/mesa/main/imports.h   |  38 --
 src/util/Makefile.sources |   2 +
 src/util/convert.c        | 179 ++
 src/util/convert.h        |  43 +++
 6 files changed, 259 insertions(+), 152 deletions(-)
 create mode 100644 src/util/convert.c
 create mode 100644 src/util/convert.h

diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
index 3265391..347919b 100644
--- a/src/glsl/Makefile.am
+++ b/src/glsl/Makefile.am
@@ -160,6 +160,7 @@ glsl_compiler_SOURCES = \
 glsl_compiler_LDADD = \
 	libglsl.la \
 	$(top_builddir)/src/libglsl_util.la \
+	$(top_builddir)/src/util/libmesautil.la \
 	$(PTHREAD_LIBS)
 
 glsl_test_SOURCES = \
diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c
index 350e675..230ebbc 100644
--- a/src/mesa/main/imports.c
+++ b/src/mesa/main/imports.c
@@ -307,154 +307,6 @@ _mesa_bitcount_64(uint64_t n)
 }
 #endif
 
-
-/**
- * Convert a 4-byte float to a 2-byte half float.
- *
- * Not all float32 values can be represented exactly as a float16 value. We
- * round such intermediate float32 values to the nearest float16. When the
- * float32 lies exactly between to float16 values, we round to the one with
- * an even mantissa.
- *
- * This rounding behavior has several benefits:
- *   - It has no sign bias.
- *
- *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
- *     GPU ISA.
- *
- *   - By reproducing the behavior of the GPU (at least on Intel hardware),
- *     compile-time evaluation of constant packHalf2x16 GLSL expressions will
- *     result in the same value as if the expression were executed on the GPU.
- */
-GLhalfARB
-_mesa_float_to_half(float val)
-{
-   const fi_type fi = {val};
-   const int flt_m = fi.i & 0x7fffff;
-   const int flt_e = (fi.i >> 23) & 0xff;
-   const int flt_s = (fi.i >> 31) & 0x1;
-   int s, e, m = 0;
-   GLhalfARB result;
-
-   /* sign bit */
-   s = flt_s;
-
-   /* handle special cases */
-   if ((flt_e == 0) && (flt_m == 0)) {
-      /* zero */
-      /* m = 0; - already set */
-      e = 0;
-   }
-   else if ((flt_e == 0) && (flt_m != 0)) {
-      /* denorm -- denorm float maps to 0 half */
-      /* m = 0; - already set */
-      e = 0;
-   }
-   else if ((flt_e == 0xff) && (flt_m == 0)) {
-      /* infinity */
-      /* m = 0; - already set */
-      e = 31;
-   }
-   else if ((flt_e == 0xff) && (flt_m != 0)) {
-      /* NaN */
-      m = 1;
-      e = 31;
-   }
-   else {
-      /* regular number */
-      const int new_exp = flt_e - 127;
-      if (new_exp < -14) {
-         /* The float32 lies in the range (0.0, min_normal16) and is rounded
-          * to a nearby float16 value. The result will be either zero, subnormal,
-          * or normal.
-          */
-         e = 0;
-         m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f));
-      }
-      else if (new_exp > 15) {
-         /* map this value to infinity */
-         /* m = 0; - already set */
-         e = 31;
-      }
-      else {
-         /* The float32 lies in the range
-          *   [min_normal16, max_normal16 + max_step16)
-          * and is rounded to a nearby float16 value. The result will be
-          * either normal or infinite.
-          */
-         e = new_exp + 15;
-         m = _mesa_lroundevenf(flt_m / (float) (1 << 13));
-      }
-   }
-
-   assert(0 <= m && m <= 1024);
-   if (m == 1024) {
-      /* The float32 was rounded upwards into the range of the next exponent,
-       * so bump the exponent. This correctly handles the case where f32
-       * should be rounded up to float16 infinity.
-       */
-      ++e;
-      m = 0;
-   }
-
-   result = (s << 15) | (e << 10) | m;
-   return result;
-}
-
-
-/**
- * Convert a 2-byte half float to a 4-byte float.
- * Based on code from: - * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html - */ -float -_mesa_half_to_float(GLhalfARB val) -{ - /* XXX could also use a 64K-entry lookup table */ - const int m = val & 0x3ff; - const int e = (val >> 10) & 0x1f; - const int s = (val >> 15) & 0x1; - int flt_m, flt_e, flt_s; - fi_type fi; - float result; - - /* sign bit */ - flt_s = s; - - /* handle special cases */ - if ((e == 0) && (m == 0)) { - /* zero */ - flt_m = 0; - flt_e = 0; - } - else if ((e == 0) && (m != 0)) { - /* denorm -- denorm half will fit in non-denorm single */ - const float half_denorm = 1.0f / 16384.0f; /* 2^-14 */ - float mantissa = ((float) (m)) / 1024.0f; - float sign = s ? -1.0f : 1.0f; - return sign * mantissa * half_denorm; - } - else if ((e == 31) && (m ==
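Since the hunk above is cut off partway through the half-to-float direction, here is a minimal, self-contained sketch of that conversion for reference. It is not the code being moved by this patch: it is an independent bit-level version written for illustration, the name `sketch_half_to_float` is made up, and it assumes `float` is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to a float.
 * Assumption: float is IEEE 754 binary32 on this platform. */
static float sketch_half_to_float(uint16_t val)
{
   const uint32_t m = val & 0x3ff;          /* 10-bit mantissa */
   const uint32_t e = (val >> 10) & 0x1f;   /* 5-bit exponent, bias 15 */
   const uint32_t s = (val >> 15) & 0x1;    /* sign bit */
   uint32_t bits;
   float result;

   if (e == 0) {
      /* Zero or denorm: the value is m * 2^-24, which a float can hold
       * exactly as a normal number. */
      const float magnitude = (float) m * (1.0f / 16777216.0f);
      return s ? -magnitude : magnitude;
   }

   if (e == 31) {
      /* Infinity (m == 0) or NaN (m != 0): all-ones float exponent. */
      bits = (s << 31) | 0x7f800000u | (m << 13);
   } else {
      /* Normal number: rebias the exponent from 15 to 127 (+112) and
       * widen the mantissa from 10 to 23 bits. */
      bits = (s << 31) | ((e + 112) << 23) | (m << 13);
   }

   memcpy(&result, &bits, sizeof(result));
   return result;
}
```

The denorm branch matches the arithmetic visible in the quoted code (mantissa / 1024 times 2^-14 is the same as m * 2^-24); the other branches build the float bit pattern directly instead of going through a union.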
[Mesa-dev] [PATCH 2/6] glsl: move builtin types to glsl_types.cpp
From: Rob Clark

First step at untangling NIR's dependency on glsl_types without
bringing in the dependency on glsl_symbol_table. The builtin types are
now in glsl_types (which will end up in NIR), but adding them to the
symbol-table stays in builtin_types.cpp (which will not be part of
NIR).

Signed-off-by: Rob Clark
---
 src/glsl/builtin_types.cpp |  4 +---
 src/glsl/glsl_types.cpp    | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/glsl/builtin_types.cpp b/src/glsl/builtin_types.cpp
index 0aedbb3..bbdcd19 100644
--- a/src/glsl/builtin_types.cpp
+++ b/src/glsl/builtin_types.cpp
@@ -43,9 +43,7 @@
  * convenience pointers (glsl_type::foo_type).
  * @{
  */
-#define DECL_TYPE(NAME, ...) \
-   const glsl_type glsl_type::_##NAME##_type = glsl_type(__VA_ARGS__, #NAME); \
-   const glsl_type *const glsl_type::NAME##_type = &glsl_type::_##NAME##_type;
+#define DECL_TYPE(NAME, ...)
 
 #define STRUCT_TYPE(NAME) \
    const glsl_type glsl_type::_struct_##NAME##_type = \
diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index b9cb97c..b0bb2ff 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -1713,3 +1713,17 @@ glsl_type::coordinate_components() const
 
    return size;
 }
+
+/**
+ * Declarations of type flyweights (glsl_type::_foo_type) and
+ * convenience pointers (glsl_type::foo_type).
+ * @{
+ */
+#define DECL_TYPE(NAME, ...) \
+   const glsl_type glsl_type::_##NAME##_type = glsl_type(__VA_ARGS__, #NAME); \
+   const glsl_type *const glsl_type::NAME##_type = &glsl_type::_##NAME##_type;
+
+#define STRUCT_TYPE(NAME)
+
+#include "builtin_type_macros.h"
+/** @} */
-- 
2.4.3
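The trick this patch relies on is that one shared list of DECL_TYPE()/STRUCT_TYPE() entries can be expanded twice with different macro definitions, so each file picks up only the half it cares about. A rough C sketch of that X-macro pattern follows; the list contents and all `sketch_`/`SKETCH_` names are invented for the example, and a list macro stands in for the shared builtin_type_macros.h include.

```c
#include <stddef.h>

/* Hypothetical shared list, playing the role of builtin_type_macros.h. */
#define SKETCH_TYPE_LIST \
   DECL_TYPE(vec2, 8)    \
   DECL_TYPE(vec4, 16)   \
   STRUCT_TYPE(light)

struct sketch_type {
   const char *name;
   unsigned size;
};

/* Expansion 1 (the glsl_types.cpp side of the patch): materialize the
 * plain builtin types and ignore the struct entries. */
#define DECL_TYPE(NAME, SIZE) { #NAME, SIZE },
#define STRUCT_TYPE(NAME)
static const struct sketch_type sketch_builtin_types[] = { SKETCH_TYPE_LIST };
#undef DECL_TYPE
#undef STRUCT_TYPE

/* Expansion 2 (the builtin_types.cpp side): ignore the plain types and
 * keep only the struct entries. */
#define DECL_TYPE(NAME, SIZE)
#define STRUCT_TYPE(NAME) #NAME,
static const char *const sketch_struct_types[] = { SKETCH_TYPE_LIST };
#undef DECL_TYPE
#undef STRUCT_TYPE
```

Defining one of the two macros as empty is what lets the definitions move between translation units without duplicating or editing the list itself.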
[Mesa-dev] [PATCH 0/6] Remove NIR dependency on GLSL
From: Rob Clark

This patchset removes the NIR dependency on GLSL (and includes resend
of shader_enums cleanups w/ addition of STATIC_ASSERT()'s)

Split up glsl_types so the builtin-types go w/ glsl_types but the parts
that add them to glsl_symbol_table stay with glsl. This way we can move
glsl_types into NIR without dragging along glsl_symbol_table and all of
its dependencies.

Also move the half/float conversion into util so it can be used from NIR
without bringing an external dependency.

With this we can move glsl_types into NIR and drop the dependency on
GLSL, and mostly remove the libglsl_util hack. (The standalone
glsl-compiler util still needs libglsl_util, so we can't remove it
completely yet, but we can remove the dependency on libglsl_util from
non-mesa state trackers. And a hypothetical vulkan implementation using
NIR should also not need to suck in libglsl_util.)

Probably there is some room to rename things to complete the cleanup,
but I figured it was good to split things up into moving things first,
and do flag-day renames second (if desired).

Rob Clark (6): glsl: couple shader_enums cleanups glsl: move builtin types to glsl_types.cpp glsl: move half<->float convertion to util nir: use util/convert.h nir: remove dependency on glsl glsl: (mostly) remove libglsl_util src/gallium/drivers/freedreno/Makefile.am |3 +- src/gallium/targets/d3dadapter9/Makefile.am|1 - src/gallium/targets/pipe-loader/Makefile.am|1 - src/gallium/targets/xa/Makefile.am |1 - src/glsl/Makefile.am | 10 +- src/glsl/Makefile.sources |4 +- src/glsl/builtin_type_macros.h | 172 -- src/glsl/builtin_types.cpp |4 +- src/glsl/glsl_types.cpp| 1715 --- src/glsl/glsl_types.h | 867 -- src/glsl/nir/builtin_type_macros.h | 172 ++ src/glsl/nir/glsl_types.cpp| 1729 src/glsl/nir/glsl_types.h | 867 ++ src/glsl/nir/nir_constant_expressions.py |5 +- src/glsl/nir/nir_types.h |2 +- src/glsl/nir/shader_enums.c|8 + src/glsl/nir/shader_enums.h|7 + .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +- src/mesa/drivers/dri/i965/brw_fs.cpp |2 +- src/mesa/drivers/dri/i965/brw_fs.h |2 +- .../dri/i965/brw_fs_channel_expressions.cpp|2 +- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |2 +- .../drivers/dri/i965/brw_fs_vector_splitting.cpp |2 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |2 +- .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +- .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +- src/mesa/main/ff_fragment_shader.cpp |2 +- src/mesa/main/imports.c| 148 -- src/mesa/main/imports.h| 38 +- src/mesa/main/mtypes.h |5 - src/mesa/main/uniforms.h |2 +- src/mesa/program/ir_to_mesa.cpp|2 +- src/mesa/program/sampler.cpp |2 +- src/util/Makefile.sources |2 + src/util/convert.c | 179 ++ src/util/convert.h | 43 + 36 files changed, 3063 insertions(+), 2946 deletions(-) delete mode 100644 src/glsl/builtin_type_macros.h delete mode 100644 src/glsl/glsl_types.cpp delete mode 100644 src/glsl/glsl_types.h create mode 100644 src/glsl/nir/builtin_type_macros.h create mode 100644 src/glsl/nir/glsl_types.cpp create mode 100644 src/glsl/nir/glsl_types.h create mode 100644 
src/util/convert.c create mode 100644 src/util/convert.h -- 2.4.3
[Mesa-dev] [PATCH 6/6] glsl: (mostly) remove libglsl_util
From: Rob Clark

Now that NIR does not depend on glsl, we can (mostly[*]) get rid of the
libglsl_util hack.

[*] glsl_compiler is the one remaining user of libglsl_util

Signed-off-by: Rob Clark
---
 src/gallium/drivers/freedreno/Makefile.am   | 3 +--
 src/gallium/targets/d3dadapter9/Makefile.am | 1 -
 src/gallium/targets/pipe-loader/Makefile.am | 1 -
 src/gallium/targets/xa/Makefile.am          | 1 -
 src/glsl/Makefile.am                        | 6 --
 5 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/gallium/drivers/freedreno/Makefile.am b/src/gallium/drivers/freedreno/Makefile.am
index dff95ba..3de8e0f 100644
--- a/src/gallium/drivers/freedreno/Makefile.am
+++ b/src/gallium/drivers/freedreno/Makefile.am
@@ -19,7 +19,7 @@ libfreedreno_la_SOURCES = \
 noinst_PROGRAMS = ir3_compiler
-# XXX: Required due to the C++ sources in libnir/libglsl_util
+# XXX: Required due to the C++ sources in libnir
 nodist_EXTRA_ir3_compiler_SOURCES = dummy.cpp
 ir3_compiler_SOURCES = \
 	ir3/ir3_cmdline.c
@@ -28,7 +28,6 @@ ir3_compiler_LDADD = \
 	libfreedreno.la \
 	$(top_builddir)/src/gallium/auxiliary/libgallium.la \
 	$(top_builddir)/src/glsl/libnir.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(top_builddir)/src/util/libmesautil.la \
 	$(GALLIUM_COMMON_LIB_DEPS) \
 	$(FREEDRENO_LIBS)
diff --git a/src/gallium/targets/d3dadapter9/Makefile.am b/src/gallium/targets/d3dadapter9/Makefile.am
index e26ca33..b522147 100644
--- a/src/gallium/targets/d3dadapter9/Makefile.am
+++ b/src/gallium/targets/d3dadapter9/Makefile.am
@@ -76,7 +76,6 @@ d3dadapter9_la_LIBADD = \
 	$(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \
 	$(top_builddir)/src/gallium/auxiliary/libgallium.la \
 	$(top_builddir)/src/glsl/libnir.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(top_builddir)/src/gallium/state_trackers/nine/libninetracker.la \
 	$(top_builddir)/src/util/libmesautil.la \
 	$(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \
diff --git a/src/gallium/targets/pipe-loader/Makefile.am b/src/gallium/targets/pipe-loader/Makefile.am
index 4d9f7be..4f25b4f 100644
--- a/src/gallium/targets/pipe-loader/Makefile.am
+++ b/src/gallium/targets/pipe-loader/Makefile.am
@@ -53,7 +53,6 @@ endif
 PIPE_LIBS += \
 	$(top_builddir)/src/gallium/auxiliary/libgallium.la \
 	$(top_builddir)/src/glsl/libnir.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(top_builddir)/src/util/libmesautil.la \
 	$(top_builddir)/src/gallium/drivers/rbug/librbug.la \
 	$(top_builddir)/src/gallium/drivers/trace/libtrace.la \
diff --git a/src/gallium/targets/xa/Makefile.am b/src/gallium/targets/xa/Makefile.am
index 92173de..02c42c6 100644
--- a/src/gallium/targets/xa/Makefile.am
+++ b/src/gallium/targets/xa/Makefile.am
@@ -38,7 +38,6 @@ libxatracker_la_LIBADD = \
 	$(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \
 	$(top_builddir)/src/gallium/auxiliary/libgallium.la \
 	$(top_builddir)/src/glsl/libnir.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(top_builddir)/src/util/libmesautil.la \
 	$(LIBDRM_LIBS) \
 	$(GALLIUM_COMMON_LIB_DEPS)
diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
index 437c6a5..ebea816 100644
--- a/src/glsl/Makefile.am
+++ b/src/glsl/Makefile.am
@@ -96,7 +96,6 @@ tests_general_ir_test_CFLAGS = \
 tests_general_ir_test_LDADD = \
 	$(top_builddir)/src/gtest/libgtest.la \
 	$(top_builddir)/src/glsl/libglsl.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(PTHREAD_LIBS)
 tests_uniform_initializer_test_SOURCES = \
@@ -109,7 +108,6 @@ tests_uniform_initializer_test_CFLAGS = \
 tests_uniform_initializer_test_LDADD = \
 	$(top_builddir)/src/gtest/libgtest.la \
 	$(top_builddir)/src/glsl/libglsl.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(PTHREAD_LIBS)
 tests_sampler_types_test_SOURCES = \
@@ -119,7 +117,6 @@ tests_sampler_types_test_CFLAGS = \
 tests_sampler_types_test_LDADD = \
 	$(top_builddir)/src/gtest/libgtest.la \
 	$(top_builddir)/src/glsl/libglsl.la \
-	$(top_builddir)/src/libglsl_util.la \
 	$(PTHREAD_LIBS)
 libglcpp_la_LIBADD = \
@@ -134,7 +131,6 @@ glcpp_glcpp_SOURCES = \
 	glcpp/glcpp.c
 glcpp_glcpp_LDADD =\
 	libglcpp.la \
-	$(top_builddir)/src/libglsl_util.la \
 	-lm
 libglsl_la_LIBADD = libglcpp.la
@@ -168,7 +164,6 @@ glsl_test_SOURCES = \
[Mesa-dev] [PATCH 4/6] nir: use util/convert.h
From: Rob Clark

Signed-off-by: Rob Clark
---
 src/glsl/nir/nir_constant_expressions.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/glsl/nir/nir_constant_expressions.py b/src/glsl/nir/nir_constant_expressions.py
index 8fd9b10..ba28c0e 100644
--- a/src/glsl/nir/nir_constant_expressions.py
+++ b/src/glsl/nir/nir_constant_expressions.py
@@ -28,6 +28,7 @@ template = """\
 #include
 #include "main/core.h"
+#include "util/convert.h"
 #include "util/rounding.h" /* for _mesa_roundeven */
 #include "nir_constant_expressions.h"
@@ -199,7 +200,7 @@ unpack_unorm_1x16(uint16_t u)
 static uint16_t
 pack_half_1x16(float x)
 {
-   return _mesa_float_to_half(x);
+   return float_to_half(x);
 }
 /**
@@ -208,7 +209,7 @@ pack_half_1x16(float x)
 static float
 unpack_half_1x16(uint16_t u)
 {
-   return _mesa_half_to_float(u);
+   return half_to_float(u);
 }
 /* Some typed vector structures to make things like src0.y work */
-- 
2.4.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/10] radeonsi: cleanup copy-pasted scratch buffer updates
From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 39 +
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index c1d61d5..9395c31 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1243,7 +1243,6 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
 	int r;
 	if (scratch_needed_size > 0) {
-
 		if (scratch_needed_size > current_scratch_buffer_size) {
 			/* Create a bigger scratch buffer */
 			pipe_resource_reference(
@@ -1282,38 +1281,26 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
 			si_pm4_bind_state(sctx, hs, sctx->tcs_shader->current->pm4);
 		/* VS can be bound as LS, ES, or VS. */
-		if (sctx->tes_shader) {
-			r = si_update_scratch_buffer(sctx, sctx->vs_shader);
-			if (r < 0)
-				return false;
-			if (r == 1)
+		r = si_update_scratch_buffer(sctx, sctx->vs_shader);
+		if (r < 0)
+			return false;
+		if (r == 1) {
+			if (sctx->tes_shader)
 				si_pm4_bind_state(sctx, ls, sctx->vs_shader->current->pm4);
-		} else if (sctx->gs_shader) {
-			r = si_update_scratch_buffer(sctx, sctx->vs_shader);
-			if (r < 0)
-				return false;
-			if (r == 1)
+			else if (sctx->gs_shader)
 				si_pm4_bind_state(sctx, es, sctx->vs_shader->current->pm4);
-		} else {
-			r = si_update_scratch_buffer(sctx, sctx->vs_shader);
-			if (r < 0)
-				return false;
-			if (r == 1)
+			else
 				si_pm4_bind_state(sctx, vs, sctx->vs_shader->current->pm4);
 		}
 		/* TES can be bound as ES or VS. */
-		if (sctx->gs_shader) {
-			r = si_update_scratch_buffer(sctx, sctx->tes_shader);
-			if (r < 0)
-				return false;
-			if (r == 1)
+		r = si_update_scratch_buffer(sctx, sctx->tes_shader);
+		if (r < 0)
+			return false;
+		if (r == 1) {
+			if (sctx->gs_shader)
 				si_pm4_bind_state(sctx, es, sctx->tes_shader->current->pm4);
-		} else {
-			r = si_update_scratch_buffer(sctx, sctx->tes_shader);
-			if (r < 0)
-				return false;
-			if (r == 1)
+			else
 				si_pm4_bind_state(sctx, vs, sctx->tes_shader->current->pm4);
 		}
 	}
-- 
2.1.4
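A note for readers following the refactor in this patch: every old branch made the same si_update_scratch_buffer() call and differed only in which slot was bound afterwards. The sketch below is a stand-alone model of that equivalence, not driver code. The names `slot`, `bind_vs_old` and `bind_vs_new` are hypothetical, and the result of the update call is passed in as `r` instead of being computed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the hardware stages a VS can be bound to. */
enum slot { SLOT_NONE, SLOT_LS, SLOT_ES, SLOT_VS, SLOT_FAIL };

/* Old shape: the update (modelled by its result r) and its error check
 * are repeated in every branch. */
static enum slot bind_vs_old(bool have_tes, bool have_gs, int r)
{
	if (have_tes) {
		if (r < 0)
			return SLOT_FAIL;
		if (r == 1)
			return SLOT_LS;
	} else if (have_gs) {
		if (r < 0)
			return SLOT_FAIL;
		if (r == 1)
			return SLOT_ES;
	} else {
		if (r < 0)
			return SLOT_FAIL;
		if (r == 1)
			return SLOT_VS;
	}
	return SLOT_NONE;
}

/* New shape: one update and one error check, then pick the slot. */
static enum slot bind_vs_new(bool have_tes, bool have_gs, int r)
{
	if (r < 0)
		return SLOT_FAIL;
	if (r == 1) {
		if (have_tes)
			return SLOT_LS;
		else if (have_gs)
			return SLOT_ES;
		else
			return SLOT_VS;
	}
	return SLOT_NONE;
}
```

Under these assumptions both shapes agree for every combination of inputs, which is why the patch can be read as a pure cleanup.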
[Mesa-dev] [PATCH 01/10] tgsi: move pipe_shader_from_tgsi_processor function to util
From: Marek Olšák

---
 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 26 ++
 src/gallium/auxiliary/util/u_inlines.h | 22 ++
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index 3d21319..f2f5181 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -35,6 +35,7 @@
 #include "tgsi/tgsi_dump.h"
 #include "tgsi/tgsi_sanity.h"
 #include "util/u_debug.h"
+#include "util/u_inlines.h"
 #include "util/u_memory.h"
 #include "util/u_math.h"
 #include "util/u_bitmask.h"
@@ -1830,29 +1831,6 @@ void ureg_free_tokens( const struct tgsi_token *tokens )
 }
-static inline unsigned
-pipe_shader_from_tgsi_processor(unsigned processor)
-{
-   switch (processor) {
-   case TGSI_PROCESSOR_VERTEX:
-      return PIPE_SHADER_VERTEX;
-   case TGSI_PROCESSOR_TESS_CTRL:
-      return PIPE_SHADER_TESS_CTRL;
-   case TGSI_PROCESSOR_TESS_EVAL:
-      return PIPE_SHADER_TESS_EVAL;
-   case TGSI_PROCESSOR_GEOMETRY:
-      return PIPE_SHADER_GEOMETRY;
-   case TGSI_PROCESSOR_FRAGMENT:
-      return PIPE_SHADER_FRAGMENT;
-   case TGSI_PROCESSOR_COMPUTE:
-      return PIPE_SHADER_COMPUTE;
-   default:
-      assert(0);
-      return PIPE_SHADER_VERTEX;
-   }
-}
-
-
 struct ureg_program *
 ureg_create(unsigned processor)
 {
@@ -1872,7 +1850,7 @@ ureg_create_with_screen(unsigned processor, struct pipe_screen *screen)
    ureg->supports_any_inout_decl_range =
       screen &&
       screen->get_shader_param(screen,
-                               pipe_shader_from_tgsi_processor(processor),
+                               util_pipe_shader_from_tgsi_processor(processor),
                                PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE) != 0;
    for (i = 0; i < Elements(ureg->properties); i++)
diff --git a/src/gallium/auxiliary/util/u_inlines.h b/src/gallium/auxiliary/util/u_inlines.h
index bb99a02..384e267 100644
--- a/src/gallium/auxiliary/util/u_inlines.h
+++ b/src/gallium/auxiliary/util/u_inlines.h
@@ -651,6 +651,28 @@ util_max_layer(const struct pipe_resource *r, unsigned level)
    }
 }
+static inline unsigned
+util_pipe_shader_from_tgsi_processor(unsigned processor)
+{
+   switch (processor) {
+   case TGSI_PROCESSOR_VERTEX:
+      return PIPE_SHADER_VERTEX;
+   case TGSI_PROCESSOR_TESS_CTRL:
+      return PIPE_SHADER_TESS_CTRL;
+   case TGSI_PROCESSOR_TESS_EVAL:
+      return PIPE_SHADER_TESS_EVAL;
+   case TGSI_PROCESSOR_GEOMETRY:
+      return PIPE_SHADER_GEOMETRY;
+   case TGSI_PROCESSOR_FRAGMENT:
+      return PIPE_SHADER_FRAGMENT;
+   case TGSI_PROCESSOR_COMPUTE:
+      return PIPE_SHADER_COMPUTE;
+   default:
+      assert(0);
+      return PIPE_SHADER_VERTEX;
+   }
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.4
[Mesa-dev] [PATCH 05/10] radeonsi: remove an unused ctx parameter in si_shader_destroy
From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_compute.c       | 4 ++--
 src/gallium/drivers/radeonsi/si_shader.c        | 4 ++--
 src/gallium/drivers/radeonsi/si_shader.h        | 2 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index c660534..697e60a 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -469,7 +469,7 @@ static void si_delete_compute_state(struct pipe_context *ctx, void* state){
 	if (program->kernels) {
 		for (int i = 0; i < program->num_kernels; i++){
 			if (program->kernels[i].bo){
-				si_shader_destroy(ctx, &program->kernels[i]);
+				si_shader_destroy(&program->kernels[i]);
 			}
 		}
 		FREE(program->kernels);
@@ -482,7 +482,7 @@ static void si_delete_compute_state(struct pipe_context *ctx, void* state){
 	FREE(program->shader.binary.config);
 	FREE(program->shader.binary.rodata);
 	FREE(program->shader.binary.global_symbol_offsets);
-	si_shader_destroy(ctx, &program->shader);
+	si_shader_destroy(&program->shader);
 #endif
 	pipe_resource_reference(
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 789b1b7..0e98915 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4190,10 +4190,10 @@ out:
 	return r;
 }
-void si_shader_destroy(struct pipe_context *ctx, struct si_shader *shader)
+void si_shader_destroy(struct si_shader *shader)
 {
 	if (shader->gs_copy_shader)
-		si_shader_destroy(ctx, shader->gs_copy_shader);
+		si_shader_destroy(shader->gs_copy_shader);
 	if (shader->scratch_bo)
 		r600_resource_reference(&shader->scratch_bo, NULL);
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index b92fa02..460 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -324,7 +324,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
 void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f);
 int si_compile_llvm(struct si_screen *sscreen, struct si_shader *shader,
 		    LLVMTargetMachineRef tm, LLVMModuleRef mod);
-void si_shader_destroy(struct pipe_context *ctx, struct si_shader *shader);
+void si_shader_destroy(struct si_shader *shader);
 unsigned si_shader_io_get_unique_index(unsigned semantic_name, unsigned index);
 int si_shader_binary_upload(struct si_screen *sscreen, struct si_shader *shader);
 int si_shader_binary_read(struct si_screen *sscreen, struct si_shader *shader);
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 2489101..9d05cb5 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -942,7 +942,7 @@ static void si_delete_shader_selector(struct pipe_context *ctx,
 			break;
 		}
-		si_shader_destroy(ctx, p);
+		si_shader_destroy(p);
 		free(p);
 		p = c;
 	}
-- 
2.1.4
[Mesa-dev] [PATCH 04/10] radeonsi: print export_prim_id from the shader key
From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_shader.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 109a805..789b1b7 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3974,6 +3974,7 @@ void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f)
 			key->vs.es_enabled_outputs);
 		fprintf(f, " as_es = %u\n", key->vs.as_es);
 		fprintf(f, " as_ls = %u\n", key->vs.as_ls);
+		fprintf(f, " export_prim_id = %u\n", key->vs.export_prim_id);
 		break;
 	case PIPE_SHADER_TESS_CTRL:
@@ -3985,6 +3986,7 @@ void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f)
 		fprintf(f, " es_enabled_outputs = 0x%"PRIx64"\n",
 			key->tes.es_enabled_outputs);
 		fprintf(f, " as_es = %u\n", key->tes.as_es);
+		fprintf(f, " export_prim_id = %u\n", key->tes.export_prim_id);
 		break;
 	case PIPE_SHADER_GEOMETRY:
-- 
2.1.4
[Mesa-dev] [PATCH 07/10] radeonsi: unify shader delete functions
From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 84 +
 1 file changed, 17 insertions(+), 67 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9d05cb5..cc053bb 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -907,11 +907,21 @@ static void si_bind_ps_shader(struct pipe_context *ctx, void *state)
 	si_mark_atom_dirty(sctx, &sctx->cb_target_mask);
 }
-static void si_delete_shader_selector(struct pipe_context *ctx,
-				      struct si_shader_selector *sel)
+static void si_delete_shader_selector(struct pipe_context *ctx, void *state)
 {
 	struct si_context *sctx = (struct si_context *)ctx;
+	struct si_shader_selector *sel = (struct si_shader_selector *)state;
 	struct si_shader *p = sel->current, *c;
+	struct si_shader_selector **current_shader[SI_NUM_SHADERS] = {
+		[PIPE_SHADER_VERTEX] = &sctx->vs_shader,
+		[PIPE_SHADER_TESS_CTRL] = &sctx->tcs_shader,
+		[PIPE_SHADER_TESS_EVAL] = &sctx->tes_shader,
+		[PIPE_SHADER_GEOMETRY] = &sctx->gs_shader,
+		[PIPE_SHADER_FRAGMENT] = &sctx->ps_shader,
+	};
+
+	if (*current_shader[sel->type] == sel)
+		*current_shader[sel->type] = NULL;
 	while (p) {
 		c = p->next_variant;
@@ -951,66 +961,6 @@ static void si_delete_shader_selector(struct pipe_context *ctx,
 	free(sel);
 }
-static void si_delete_vs_shader(struct pipe_context *ctx, void *state)
-{
-	struct si_context *sctx = (struct si_context *)ctx;
-	struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-	if (sctx->vs_shader == sel) {
-		sctx->vs_shader = NULL;
-	}
-
-	si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_gs_shader(struct pipe_context *ctx, void *state)
-{
-	struct si_context *sctx = (struct si_context *)ctx;
-	struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-	if (sctx->gs_shader == sel) {
-		sctx->gs_shader = NULL;
-	}
-
-	si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_ps_shader(struct pipe_context *ctx, void *state)
-{
-	struct si_context *sctx = (struct si_context *)ctx;
-	struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-	if (sctx->ps_shader == sel) {
-		sctx->ps_shader = NULL;
-	}
-
-	si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_tcs_shader(struct pipe_context *ctx, void *state)
-{
-	struct si_context *sctx = (struct si_context *)ctx;
-	struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-	if (sctx->tcs_shader == sel) {
-		sctx->tcs_shader = NULL;
-	}
-
-	si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_tes_shader(struct pipe_context *ctx, void *state)
-{
-	struct si_context *sctx = (struct si_context *)ctx;
-	struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-	if (sctx->tes_shader == sel) {
-		sctx->tes_shader = NULL;
-	}
-
-	si_delete_shader_selector(ctx, sel);
-}
-
 static void si_emit_spi_map(struct si_context *sctx, struct r600_atom *atom)
 {
 	struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
@@ -1675,9 +1625,9 @@ void si_init_shader_functions(struct si_context *sctx)
 	sctx->b.b.bind_gs_state = si_bind_gs_shader;
 	sctx->b.b.bind_fs_state = si_bind_ps_shader;
-	sctx->b.b.delete_vs_state = si_delete_vs_shader;
-	sctx->b.b.delete_tcs_state = si_delete_tcs_shader;
-	sctx->b.b.delete_tes_state = si_delete_tes_shader;
-	sctx->b.b.delete_gs_state = si_delete_gs_shader;
-	sctx->b.b.delete_fs_state = si_delete_ps_shader;
+	sctx->b.b.delete_vs_state = si_delete_shader_selector;
+	sctx->b.b.delete_tcs_state = si_delete_shader_selector;
+	sctx->b.b.delete_tes_state = si_delete_shader_selector;
+	sctx->b.b.delete_gs_state = si_delete_shader_selector;
+	sctx->b.b.delete_fs_state = si_delete_shader_selector;
 }
-- 
2.1.4
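The table of pointers-to-slots used in this patch may be easier to follow in isolation. The sketch below is a simplified model with hypothetical types (`enum stage`, `struct selector`, `struct context`) and a hypothetical helper `unbind_if_current`; it is not the driver's code, only the same idiom: index a table of slot addresses by the object's own type, then clear the slot if it still points at the object being deleted.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver types. */
enum stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_GEOMETRY, NUM_STAGES };

struct selector {
	enum stage type;
};

struct context {
	struct selector *vs, *ps, *gs;
};

/* Clear whichever "currently bound" pointer refers to sel.
 * One table lookup replaces one wrapper function per stage. */
static void unbind_if_current(struct context *ctx, struct selector *sel)
{
	struct selector **current[NUM_STAGES] = {
		[STAGE_VERTEX]   = &ctx->vs,
		[STAGE_FRAGMENT] = &ctx->ps,
		[STAGE_GEOMETRY] = &ctx->gs,
	};

	if (*current[sel->type] == sel)
		*current[sel->type] = NULL;
}
```

One thing to keep in mind with this idiom: any type value without a table entry gets a NULL slot address from the designated initializer, so the lookup is only safe for types that actually appear in the table.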
[Mesa-dev] [PATCH 02/10] radeonsi: cleanup si_llvm_init_export_args
From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_shader.c | 76 ++--
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 32a702f..109a805 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1306,6 +1306,22 @@ static void si_llvm_init_export_args(struct lp_build_tgsi_context *bld_base,
 	unsigned compressed = 0;
 	unsigned chan;
+	/* XXX: This controls which components of the output
+	 * registers actually get exported. (e.g bit 0 means export
+	 * X component, bit 1 means export Y component, etc.) I'm
+	 * hard coding this to 0xf for now. In the future, we might
+	 * want to do something else. */
+	args[0] = lp_build_const_int32(base->gallivm, 0xf);
+
+	/* Specify whether the EXEC mask represents the valid mask */
+	args[1] = uint->zero;
+
+	/* Specify whether this is the last export */
+	args[2] = uint->zero;
+
+	/* Specify the target we are exporting */
+	args[3] = lp_build_const_int32(base->gallivm, target);
+
 	if (si_shader_ctx->type == TGSI_PROCESSOR_FRAGMENT) {
 		int cbuf = target - V_008DFC_SQ_EXP_MRT;
@@ -1323,55 +1339,31 @@ static void si_llvm_init_export_args(struct lp_build_tgsi_context *bld_base,
 		}
 	}
+	/* Set COMPR flag */
+	args[4] = compressed ? uint->one : uint->zero;
+
 	if (compressed) {
 		/* Pixel shader needs to pack output values before export */
-		for (chan = 0; chan < 2; chan++ ) {
-			args[0] = values[2 * chan];
-			args[1] = values[2 * chan + 1];
-			args[chan + 5] =
-				lp_build_intrinsic(base->gallivm->builder,
-						"llvm.SI.packf16",
-						LLVMInt32TypeInContext(base->gallivm->context),
-						args, 2,
-						LLVMReadNoneAttribute | LLVMNoUnwindAttribute);
+		for (chan = 0; chan < 2; chan++) {
+			LLVMValueRef pack_args[2] = {
+				values[2 * chan],
+				values[2 * chan + 1]
+			};
+			LLVMValueRef packed;
+
+			packed = lp_build_intrinsic(base->gallivm->builder,
+						    "llvm.SI.packf16",
+						    LLVMInt32TypeInContext(base->gallivm->context),
+						    pack_args, 2,
+						    LLVMReadNoneAttribute | LLVMNoUnwindAttribute);
 			args[chan + 7] = args[chan + 5] =
 				LLVMBuildBitCast(base->gallivm->builder,
-						 args[chan + 5],
+						 packed,
 						 LLVMFloatTypeInContext(base->gallivm->context),
 						 "");
 		}
-
-		/* Set COMPR flag */
-		args[4] = uint->one;
-	} else {
-		for (chan = 0; chan < 4; chan++ )
-			/* +5 because the first output value will be
-			 * the 6th argument to the intrinsic. */
-			args[chan + 5] = values[chan];
-
-		/* Clear COMPR flag */
-		args[4] = uint->zero;
-	}
-
-	/* XXX: This controls which components of the output
-	 * registers actually get exported. (e.g bit 0 means export
-	 * X component, bit 1 means export Y component, etc.) I'm
-	 * hard coding this to 0xf for now. In the future, we might
-	 * want to do something else. */
-	args[0] = lp_build_const_int32(base->gallivm, 0xf);
-
-	/* Specify whether the EXEC mask represents the valid mask */
-	args[1] = uint->zero;
-
-	/* Specify whether this is the last export */
-	args[2] = uint->zero;
-
-	/* Specify the target we are exporting */
-	args[3] = lp_build_const_int32(base->gallivm, target);
-
-	/* XXX: We probably need to keep track of the output
-	 * values, so we know what we are passing to the next
-	 * stage. */
+	} else
+		memcpy(&args[5], values, sizeof(values[0]) * 4);
 }
 /* Load from output pointers and initialize arguments for the shader export intrinsic */
-- 
2.1.4
[Mesa-dev] [PATCH 00/10] RadeonSI cleanups
Nothing special here other than cleanups. One patch disables NaNs for LS
and HS, and there's also one GS shader leak fix.

Please review.

Marek
[Mesa-dev] [PATCH 03/10] radeonsi: disable NaNs for LS and HS
From: Marek Olšák

They're disabled for all other shaders except compute, but I forgot to
do this for tess stages.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index f673388..2489101 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -122,7 +122,8 @@ static void si_shader_ls(struct si_shader *shader)
 	shader->ls_rsrc1 = S_00B528_VGPRS((shader->num_vgprs - 1) / 4) |
 			   S_00B528_SGPRS((num_sgprs - 1) / 8) |
-			   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt);
+			   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
+			   S_00B528_DX10_CLAMP(shader->dx10_clamp_mode);
 	shader->ls_rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
 			   S_00B52C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0);
 }
@@ -154,7 +155,8 @@ static void si_shader_hs(struct si_shader *shader)
 	si_pm4_set_reg(pm4, R_00B424_SPI_SHADER_PGM_HI_HS, va >> 40);
 	si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
 		       S_00B428_VGPRS((shader->num_vgprs - 1) / 4) |
-		       S_00B428_SGPRS((num_sgprs - 1) / 8));
+		       S_00B428_SGPRS((num_sgprs - 1) / 8) |
+		       S_00B428_DX10_CLAMP(shader->dx10_clamp_mode));
 	si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
 		       S_00B42C_USER_SGPR(num_user_sgprs) |
 		       S_00B42C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0));
-- 
2.1.4
[Mesa-dev] [PATCH 06/10] radeonsi: fix a GS copy shader leak
From: Marek Olšák

Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/radeonsi/si_shader.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 0e98915..012d708 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4192,8 +4192,10 @@ out:
 void si_shader_destroy(struct si_shader *shader)
 {
-	if (shader->gs_copy_shader)
+	if (shader->gs_copy_shader) {
 		si_shader_destroy(shader->gs_copy_shader);
+		FREE(shader->gs_copy_shader);
+	}
 	if (shader->scratch_bo)
 		r600_resource_reference(&shader->scratch_bo, NULL);
-- 
2.1.4
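The leak fixed by this patch follows from a common ownership split: a destroy function releases what an object owns but not the object itself, so a heap-allocated child needs both a destroy and a free. The sketch below models that with a hypothetical `struct shader` and plain `free()`; the field names and the extra NULL assignments are assumptions made for this illustration and are not taken from the driver.

```c
#include <stdlib.h>

/* Hypothetical, simplified shader object; not the driver's struct. */
struct shader {
	struct shader *copy;   /* owned, heap-allocated helper; may be NULL */
	void *resource;        /* some owned allocation; may be NULL */
};

/* Releases what the shader owns, but not the struct itself:
 * the caller decides how the struct was allocated. */
static void shader_destroy(struct shader *s)
{
	if (s->copy) {
		shader_destroy(s->copy);  /* release the child's contents */
		free(s->copy);            /* then the child itself */
		s->copy = NULL;           /* extra safety in this sketch only */
	}
	free(s->resource);
	s->resource = NULL;
}
```

Without the `free(s->copy)` step the child's contents are released but its struct is never returned to the allocator, which is the kind of leak the patch addresses.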
[Mesa-dev] [PATCH 10/10] radeonsi: cleanup other scratch buffer functions
From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9395c31..71349a5 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1205,30 +1205,23 @@ static int si_update_scratch_buffer(struct si_context *sctx,
 static unsigned si_get_current_scratch_buffer_size(struct si_context *sctx)
 {
-	if (!sctx->scratch_buffer)
-		return 0;
-
-	return sctx->scratch_buffer->b.b.width0;
+	return sctx->scratch_buffer ? sctx->scratch_buffer->b.b.width0 : 0;
 }
-static unsigned si_get_scratch_buffer_bytes_per_wave(struct si_context *sctx,
-					struct si_shader_selector *sel)
+static unsigned si_get_scratch_buffer_bytes_per_wave(struct si_shader_selector *sel)
 {
-	if (!sel)
-		return 0;
-
-	return sel->current->scratch_bytes_per_wave;
+	return sel ? sel->current->scratch_bytes_per_wave : 0;
 }
 static unsigned si_get_max_scratch_bytes_per_wave(struct si_context *sctx)
 {
 	unsigned bytes = 0;
-	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->ps_shader));
-	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->gs_shader));
-	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->vs_shader));
-	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->tcs_shader));
-	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->tes_shader));
+	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->ps_shader));
+	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->gs_shader));
+	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->vs_shader));
+	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->tcs_shader));
+	bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->tes_shader));
 	return bytes;
 }
-- 
2.1.4
[Mesa-dev] [PATCH 08/10] radeonsi: unify shader create functions
From: Marek Olšák

The shader specifies the processor type, so use that instead.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 49 +
 1 file changed, 9 insertions(+), 40 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index cc053bb..c1d61d5 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -647,9 +647,8 @@ static int si_shader_select(struct pipe_context *ctx,
 	return 0;
 }
-static void *si_create_shader_state(struct pipe_context *ctx,
-				    const struct pipe_shader_state *state,
-				    unsigned pipe_shader_type)
+static void *si_create_shader_selector(struct pipe_context *ctx,
+				       const struct pipe_shader_state *state)
 {
 	struct si_screen *sscreen = (struct si_screen *)ctx->screen;
 	struct si_shader_selector *sel = CALLOC_STRUCT(si_shader_selector);
@@ -658,7 +657,6 @@ static void *si_create_shader_state(struct pipe_context *ctx,
 	if (!sel)
 		return NULL;
-	sel->type = pipe_shader_type;
 	sel->tokens = tgsi_dup_tokens(state->tokens);
 	if (!sel->tokens) {
 		FREE(sel);
@@ -667,6 +665,7 @@ static void *si_create_shader_state(struct pipe_context *ctx,
 	sel->so = state->stream_output;
 	tgsi_scan_shader(state->tokens, &sel->info);
+	sel->type = util_pipe_shader_from_tgsi_processor(sel->info.processor);
 	p_atomic_inc(&sscreen->b.num_shaders_created);
 	/* First set which opcode uses which (i,j) pair. */
@@ -697,7 +696,7 @@ static void *si_create_shader_state(struct pipe_context *ctx,
 		sel->info.uses_linear_centroid +
 		sel->info.uses_linear_sample >= 2;
-	switch (pipe_shader_type) {
+	switch (sel->type) {
 	case PIPE_SHADER_GEOMETRY:
 		sel->gs_output_prim =
 			sel->info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM];
@@ -763,36 +762,6 @@ static void *si_create_shader_state(struct pipe_context *ctx,
 	return sel;
 }
-static void *si_create_fs_state(struct pipe_context *ctx,
-				const struct pipe_shader_state *state)
-{
-	return si_create_shader_state(ctx, state, PIPE_SHADER_FRAGMENT);
-}
-
-static void *si_create_gs_state(struct pipe_context *ctx,
-				const struct pipe_shader_state *state)
-{
-	return si_create_shader_state(ctx, state, PIPE_SHADER_GEOMETRY);
-}
-
-static void *si_create_vs_state(struct pipe_context *ctx,
-				const struct pipe_shader_state *state)
-{
-	return si_create_shader_state(ctx, state, PIPE_SHADER_VERTEX);
-}
-
-static void *si_create_tcs_state(struct pipe_context *ctx,
-				 const struct pipe_shader_state *state)
-{
-	return si_create_shader_state(ctx, state, PIPE_SHADER_TESS_CTRL);
-}
-
-static void *si_create_tes_state(struct pipe_context *ctx,
-				 const struct pipe_shader_state *state)
-{
-	return si_create_shader_state(ctx, state, PIPE_SHADER_TESS_EVAL);
-}
-
 /**
  * Normally, we only emit 1 viewport and 1 scissor if no shader is using
  * the VIEWPORT_INDEX output, and emitting the other viewports and scissors
@@ -1613,11 +1582,11 @@ void si_init_shader_functions(struct si_context *sctx)
 	si_init_atom(sctx, &sctx->spi_map, &sctx->atoms.s.spi_map, si_emit_spi_map);
 	si_init_atom(sctx, &sctx->spi_ps_input, &sctx->atoms.s.spi_ps_input, si_emit_spi_ps_input);
-	sctx->b.b.create_vs_state = si_create_vs_state;
-	sctx->b.b.create_tcs_state = si_create_tcs_state;
-	sctx->b.b.create_tes_state = si_create_tes_state;
-	sctx->b.b.create_gs_state = si_create_gs_state;
-	sctx->b.b.create_fs_state = si_create_fs_state;
+	sctx->b.b.create_vs_state = si_create_shader_selector;
+	sctx->b.b.create_tcs_state = si_create_shader_selector;
+	sctx->b.b.create_tes_state = si_create_shader_selector;
+	sctx->b.b.create_gs_state = si_create_shader_selector;
+	sctx->b.b.create_fs_state = si_create_shader_selector;
 	sctx->b.b.bind_vs_state = si_bind_vs_shader;
 	sctx->b.b.bind_tcs_state = si_bind_tcs_shader;
-- 
2.1.4
[Mesa-dev] [PATCH 07/10] radeonsi: don't use the AMDGPU intrinsic for CMP
From: Marek Olšák

The increase in VGPRs is unfortunate, but the decrease in the scratch
size is always welcome.

Totals:
SGPRS: 344552 -> 344368 (-0.05 %)
VGPRS: 197132 -> 197552 (0.21 %)
Code Size: 7375376 -> 7366304 (-0.12 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave

Totals from affected shaders:
SGPRS: 47736 -> 47552 (-0.39 %)
VGPRS: 27952 -> 28372 (1.50 %)
Code Size: 1392724 -> 1383652 (-0.65 %) bytes
LDS: 39 -> 39 (0.00 %) blocks
Scratch: 513024 -> 449536 (-12.38 %) bytes per wave
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c | 31 +++---
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index c22ea7c..ac99e73 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -919,7 +919,21 @@ static void emit_ucmp(
 		LLVMBuildSelect(builder, v, emit_data->args[1], emit_data->args[2], "");
 }
-static void emit_cmp(
+static void emit_cmp(const struct lp_build_tgsi_action *action,
+		     struct lp_build_tgsi_context *bld_base,
+		     struct lp_build_emit_data *emit_data)
+{
+	LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+	LLVMValueRef cond, *args = emit_data->args;
+
+	cond = LLVMBuildFCmp(builder, LLVMRealOLT, args[0],
+			     bld_base->base.zero, "");
+
+	emit_data->output[emit_data->chan] =
+		LLVMBuildSelect(builder, cond, args[1], args[2], "");
+}
+
+static void emit_set_cond(
 		const struct lp_build_tgsi_action *action,
 		struct lp_build_tgsi_context * bld_base,
 		struct lp_build_emit_data * emit_data)
@@ -1503,8 +1517,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "llvm.ceil.f32";
 	bld_base->op_actions[TGSI_OPCODE_CLAMP].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_CLAMP].intr_name = "llvm.AMDIL.clamp.";
-	bld_base->op_actions[TGSI_OPCODE_CMP].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_CMP].intr_name = "llvm.AMDGPU.cndlt";
+	bld_base->op_actions[TGSI_OPCODE_CMP].emit = emit_cmp;
 	bld_base->op_actions[TGSI_OPCODE_CONT].emit = cont_emit;
 	bld_base->op_actions[TGSI_OPCODE_COS].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_COS].intr_name = "llvm.cos.f32";
@@ -1573,13 +1586,13 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.rint.f32";
 	bld_base->op_actions[TGSI_OPCODE_RSQ].intr_name = "llvm.AMDGPU.rsq.clamped.f32";
 	bld_base->op_actions[TGSI_OPCODE_RSQ].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_cmp;
-	bld_base->op_actions[TGSI_OPCODE_SEQ].emit = emit_cmp;
+	bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_set_cond;
+	bld_base->op_actions[TGSI_OPCODE_SEQ].emit = emit_set_cond;
 	bld_base->op_actions[TGSI_OPCODE_SHL].emit = emit_shl;
-	bld_base->op_actions[TGSI_OPCODE_SLE].emit = emit_cmp;
-	bld_base->op_actions[TGSI_OPCODE_SLT].emit = emit_cmp;
-	bld_base->op_actions[TGSI_OPCODE_SNE].emit = emit_cmp;
-	bld_base->op_actions[TGSI_OPCODE_SGT].emit = emit_cmp;
+	bld_base->op_actions[TGSI_OPCODE_SLE].emit = emit_set_cond;
+	bld_base->op_actions[TGSI_OPCODE_SLT].emit = emit_set_cond;
+	bld_base->op_actions[TGSI_OPCODE_SNE].emit = emit_set_cond;
+	bld_base->op_actions[TGSI_OPCODE_SGT].emit = emit_set_cond;
 	bld_base->op_actions[TGSI_OPCODE_SIN].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_SIN].intr_name = "llvm.sin.f32";
 	bld_base->op_actions[TGSI_OPCODE_SQRT].emit = build_tgsi_intrinsic_nomem;
-- 
2.1.4
[Mesa-dev] [PATCH 10/10] radeonsi: really enable the no-nans-fp-math option
From: Marek Olšák

Include compute shaders too, which includes OpenGL, but not OpenCL.

LLVM doesn't use this much according to shader-db:

Totals:
SGPRS: 344944 -> 344944 (0.00 %)
VGPRS: 197024 -> 197024 (0.00 %)
Code Size: 7325688 -> 7325624 (-0.00 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1510400 -> 1510400 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 664 -> 664 (0.00 %)
VGPRS: 480 -> 480 (0.00 %)
Code Size: 25356 -> 25292 (-0.25 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
---
 src/gallium/drivers/radeonsi/si_shader.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 8da2f77..aa4cfa0 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3587,7 +3587,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)

 	if (shader->dx10_clamp_mode)
 		LLVMAddTargetDependentFunctionAttr(si_shader_ctx->radeon_bld.main_fn,
-						   "enable-no-nans-fp-math", "true");
+						   "no-nans-fp-math", "true");

 	for (i = 0; i <= last_sgpr; ++i) {
 		LLVMValueRef P = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn, i);
@@ -4095,8 +4095,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
 	radeon_llvm_context_init(&si_shader_ctx.radeon_bld);
 	bld_base = &si_shader_ctx.radeon_bld.soa.bld_base;

-	if (sel->type != PIPE_SHADER_COMPUTE)
-		shader->dx10_clamp_mode = true;
+	shader->dx10_clamp_mode = true;

 	if (sel->info.uses_kill)
 		shader->db_shader_control |= S_02880C_KILL_ENABLE(1);
-- 
2.1.4
[Mesa-dev] [PATCH 00/10] RadeonSI: Better LLVM IR generation
Hi,

This patch series improves IR generation for radeonsi. Most of it removes
uses of AMDGPU intrinsics.

There is one piglit regression caused by aggressive handling of "undef" in
LLVM, breaking piglit/glsl-routing. I have a lit test which I'll send later.

Complete stats from shader-db are below. Decreasing scratch usage is
certainly nice, but there is not much else.

7063 shaders

Totals:
SGPRS: 345216 -> 344944 (-0.08 %)
VGPRS: 197684 -> 197024 (-0.33 %)
Code Size: 7390408 -> 7325624 (-0.88 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1510400 (-18.01 %) bytes per wave

Totals from affected shaders:
SGPRS: 229736 -> 229464 (-0.12 %)
VGPRS: 130668 -> 130008 (-0.51 %)
Code Size: 5554088 -> 5489304 (-1.17 %) bytes
LDS: 56 -> 56 (0.00 %) blocks
Scratch: 1764352 -> 1432576 (-18.80 %) bytes per wave

Increases:
SGPRS: 485 (0.07 %)
VGPRS: 508 (0.07 %)
Code Size: 1355 (0.19 %)
LDS: 0 (0.00 %)
Scratch: 65 (0.01 %)

Decreases:
SGPRS: 462 (0.07 %)
VGPRS: 631 (0.09 %)
Code Size: 2226 (0.32 %)
LDS: 0 (0.00 %)
Scratch: 137 (0.02 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 40 -> 104 (160.00 %) (lines 10083 -> 10083)
VGPRS: 4 -> 8 (100.00 %) (lines 19667 -> 19667)
Code Size: 388 -> 444 (14.43 %) (lines 32696 -> 32696) bytes
LDS: 0 -> 0 (0.00 %) (lines -1 -> -1) blocks
Scratch: 1024 -> 12288 (1100.00 %) (lines 29684 -> 29684) bytes per wave

Max Decrease:

SGPRS: 80 -> 32 (-60.00 %) (lines 3125 -> 3125)
VGPRS: 36 -> 16 (-55.56 %) (lines 18113 -> 18113)
Code Size: 2548 -> 1160 (-54.47 %) (lines 18617 -> 18617) bytes
LDS: 0 -> 0 (0.00 %) (lines -1 -> -1) blocks
Scratch: 3072 -> 0 (-100.00 %) (lines 1522 -> 1522) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 40 -> 104 (160.00 %) (lines 10083 -> 10083)
VGPRS: 76 -> 92 (21.05 %) (lines 528 -> 528)
Code Size: 3064 -> 3336 (8.88 %) (lines 29684 -> 29684) bytes
LDS: 0 -> 0 (0.00 %) (lines -1 -> -1) blocks
Scratch: 1024 -> 12288 (1100.00 %) (lines 29684 -> 29684) bytes per wave

Max Decrease:

SGPRS: 80 -> 32 (-60.00 %) (lines 3125 -> 3125)
VGPRS: 156 -> 124 (-20.51 %) (lines 29866 -> 29866)
Code Size: 2940 -> 1408 (-52.11 %) (lines 17413 -> 17413) bytes
LDS: 0 -> 0 (0.00 %) (lines -1 -> -1) blocks
Scratch: 14336 -> 0 (-100.00 %) (lines 30496 -> 30496) bytes per wave

Marek
[Mesa-dev] [PATCH 08/10] radeonsi: re-enable unsafe-fp-math for LLVM 3.8
From: Marek Olšák

Required for 1/sqrt ==> rsq.

We should finally fix the hang instead of running away from the issue.
This assumes the bug is in LLVM and we have time to fix it before
the release.

Include compute shaders as well, which only affects TGSI and thus OpenGL.

Totals:
SGPRS: 344368 -> 345104 (0.21 %)
VGPRS: 197552 -> 197420 (-0.07 %)
Code Size: 7366304 -> 7324692 (-0.56 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1615872 -> 1524736 (-5.64 %) bytes per wave

Totals from affected shaders:
SGPRS: 146696 -> 147432 (0.50 %)
VGPRS: 87212 -> 87080 (-0.15 %)
Code Size: 3852664 -> 3811052 (-1.08 %) bytes
LDS: 48 -> 48 (0.00 %) blocks
Scratch: 1179648 -> 1088512 (-7.73 %) bytes per wave
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 6b2ebde..4bda4a4 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -84,6 +84,13 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
 	sprintf(Str, "%1d", llvm_type);

 	LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
+
+#if HAVE_LLVM >= 0x0308
+	/* This only affects TGSI (OpenGL), so it's okay to set it for
+	 * compute shaders too.
+	 */
+	LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
+#endif
 }

 static void init_r600_target()
-- 
2.1.4
[Mesa-dev] [PATCH 09/10] radeonsi: don't emit AMDGPU intrinsics for RSQ opcodes
From: Marek Olšák

Intel and Nouveau use IEEE opcodes, so we should too.

If there is a bug caused by not using the clamped RSQ variant, there must be
another way to fix it. I don't think the RSQ behavior matters much now that
NaNs are disabled. Nine and Wine should implement necessary workarounds for
DX9 games. (they probably already do)

Not many shaders are affected.

Totals:
SGPRS: 345104 -> 344944 (-0.05 %)
VGPRS: 197420 -> 197024 (-0.20 %)
Code Size: 7324692 -> 7325688 (0.01 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1524736 -> 1510400 (-0.94 %) bytes per wave

Totals from affected shaders:
SGPRS: 25160 -> 25000 (-0.64 %)
VGPRS: 17336 -> 16940 (-2.28 %)
Code Size: 843412 -> 844408 (0.12 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Scratch: 139264 -> 124928 (-10.29 %) bytes per wave
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c | 28 ++
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index ac99e73..1172244 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1452,6 +1452,28 @@ static void emit_minmax_int(const struct lp_build_tgsi_action *action,
 				emit_data->args[1], "");
 }

+/* This requires "unsafe-fp-math" for LLVM to convert it to RSQ. */
+static void emit_rsq(const struct lp_build_tgsi_action *action,
+		     struct lp_build_tgsi_context *bld_base,
+		     struct lp_build_emit_data *emit_data)
+{
+	LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+	LLVMValueRef src = emit_data->args[0];
+	bool is_f64 = LLVMGetTypeKind(LLVMTypeOf(src)) == LLVMDoubleTypeKind;
+
+	LLVMValueRef sqrt =
+		lp_build_emit_llvm_unary(bld_base,
+					 is_f64 ? TGSI_OPCODE_DSQRT
+						: TGSI_OPCODE_SQRT,
+					 src);
+
+	emit_data->output[emit_data->chan] =
+		LLVMBuildFDiv(builder,
+			      is_f64 ? bld_base->dbl_bld.one
+				     : bld_base->base.one,
+			      sqrt, "");
+}
+
 void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 {
 	struct lp_type type;
@@ -1531,8 +1553,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_DSGE].emit = emit_dcmp;
 	bld_base->op_actions[TGSI_OPCODE_DSLT].emit = emit_dcmp;
 	bld_base->op_actions[TGSI_OPCODE_DSNE].emit = emit_dcmp;
-	bld_base->op_actions[TGSI_OPCODE_DRSQ].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_DRSQ].intr_name = "llvm.AMDGPU.rsq.f64";
+	bld_base->op_actions[TGSI_OPCODE_DRSQ].emit = emit_rsq;
 	bld_base->op_actions[TGSI_OPCODE_DSQRT].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_DSQRT].intr_name = "llvm.sqrt.f64";
 	bld_base->op_actions[TGSI_OPCODE_ELSE].emit = else_emit;
@@ -1584,8 +1605,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
 	bld_base->op_actions[TGSI_OPCODE_ROUND].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.rint.f32";
-	bld_base->op_actions[TGSI_OPCODE_RSQ].intr_name = "llvm.AMDGPU.rsq.clamped.f32";
-	bld_base->op_actions[TGSI_OPCODE_RSQ].emit = build_tgsi_intrinsic_nomem;
+	bld_base->op_actions[TGSI_OPCODE_RSQ].emit = emit_rsq;
 	bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_set_cond;
 	bld_base->op_actions[TGSI_OPCODE_SEQ].emit = emit_set_cond;
 	bld_base->op_actions[TGSI_OPCODE_SHL].emit = emit_shl;
-- 
2.1.4
Re: [Mesa-dev] [PATCH 03/10] radeonsi: disable NaNs for LS and HS
FWIW I'm still baffled by this shader bit. NaNs are absolutely required
to be generated and handled as NaNs in shaders (albeit conversion to
ints will make them 0) by DX10 (there's plenty of tests which actually
check for this).
And generally, you really want to generate NaNs for newer glsl versions
too I think, albeit this may not be strictly required (of course,
currently you can't distinguish this in tgsi, but particularly gs/ls/hs
will always be newer glsl versions).
So I'm REALLY wondering why there's a shader bit named that way...

Roland

On 11.10.2015 at 03:11, Marek Olšák wrote:
> From: Marek Olšák
>
> They're disabled for all other shaders except compute, but I forgot
> to do this for tess stages.
> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index f673388..2489101 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -122,7 +122,8 @@ static void si_shader_ls(struct si_shader *shader)
>
>  	shader->ls_rsrc1 = S_00B528_VGPRS((shader->num_vgprs - 1) / 4) |
>  			   S_00B528_SGPRS((num_sgprs - 1) / 8) |
> -			   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt);
> +			   S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> +			   S_00B528_DX10_CLAMP(shader->dx10_clamp_mode);
>  	shader->ls_rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  			   S_00B52C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0);
>  }
> @@ -154,7 +155,8 @@ static void si_shader_hs(struct si_shader *shader)
>  	si_pm4_set_reg(pm4, R_00B424_SPI_SHADER_PGM_HI_HS, va >> 40);
>  	si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  		       S_00B428_VGPRS((shader->num_vgprs - 1) / 4) |
> -		       S_00B428_SGPRS((num_sgprs - 1) / 8));
> +		       S_00B428_SGPRS((num_sgprs - 1) / 8) |
> +		       S_00B428_DX10_CLAMP(shader->dx10_clamp_mode));
>  	si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  		       S_00B42C_USER_SGPR(num_user_sgprs) |
>  		       S_00B42C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0));
>
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
On Sat, Oct 10, 2015 at 4:21 PM, Samuel Pitoiset wrote:
> On 10/10/2015 09:58 PM, Ilia Mirkin wrote:
>> On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset wrote:
>>> On 10/10/2015 09:42 PM, Ilia Mirkin wrote:
>>>> On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset wrote:
>>>>> This patch looks fine except that it should be a bit more normalized. I
>>>>> mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
>>>>> PUSH_SPACE calls, sometimes you add it sometimes not.
>>>>
>>>> Meh. We need to get our error checking situation straight, but this
>>>> isn't the patch to do it in.
>>>
>>> Yeah, but this needs to be clarified.
>>
>> What does?
>
> I mean, we should either use PUSH_SPACE everywhere or not at all, and always
> breaks (or not) when PUSH_SPACE fails.
> That's really a minor issue.

It's actually a major issue. Error-handling is practically
non-existent. There are a couple of spots here and there, but it
doesn't really scale up. I guess I (semi-)accidentally removed a
couple of spots that error checked, but, again, meh. Doing this for
real will require some careful thought.

  -ilia
[Mesa-dev] [PATCH 02/10] gallivm: implement the correct version of LRP
From: Marek Olšák

The previous version has precision issues. This can be a problem with
tessellation. Sadly, I can't find the article where I read it anymore.
I'm not sure if the unsafe-fp-math flag would be enough to revert this.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 0ad78b0..512558b 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -538,12 +538,13 @@ lrp_emit(
    struct lp_build_tgsi_context * bld_base,
    struct lp_build_emit_data * emit_data)
 {
-   LLVMValueRef tmp;
-   tmp = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_SUB,
-                                   emit_data->args[1],
-                                   emit_data->args[2]);
-   emit_data->output[emit_data->chan] = lp_build_emit_llvm_ternary(bld_base,
-                    TGSI_OPCODE_MAD, emit_data->args[0], tmp, emit_data->args[2]);
+   struct lp_build_context *bld = &bld_base->base;
+   LLVMValueRef inv, a, b;
+
+   inv = lp_build_sub(bld, bld_base->base.one, emit_data->args[0]);
+   a = lp_build_mul(bld, emit_data->args[1], emit_data->args[0]);
+   b = lp_build_mul(bld, emit_data->args[2], inv);
+   emit_data->output[emit_data->chan] = lp_build_add(bld, a, b);
 }

 /* TGSI_OPCODE_MAD */
-- 
2.1.4
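The commit message does not say which precision issue it means. One well-known weakness of the subtract-then-MAD form is that it need not return the first endpoint exactly when the interpolation factor is 1, and that may or may not be what the patch targets. The scalar sketch below models both forms in plain C. The names lrp_mad and lrp_two_mul are invented for this example, and the volatile temporaries are an assumption used to force single-precision rounding after every step, as separate unfused instructions would.

```c
#include <assert.h>

/* Old form from the removed lines: MAD(t, a - b, b) = t * (a - b) + b.
 * The volatile temporaries force each intermediate to be rounded to
 * float, modelling separate SUB and unfused MAD instructions. */
static float lrp_mad(float t, float a, float b)
{
    volatile float diff = a - b;
    volatile float prod = t * diff;
    return prod + b;
}

/* New form from the added lines: a * t + b * (1 - t). */
static float lrp_two_mul(float t, float a, float b)
{
    volatile float inv = 1.0f - t;
    volatile float x = a * t;
    volatile float y = b * inv;
    return x + y;
}
```

With t = 1, a = 1 and b = 1e8, lrp_mad returns 0 because a - b rounds to -1e8 in single precision, while lrp_two_mul returns exactly 1. Both forms return b exactly when t = 0.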
[Mesa-dev] [PATCH 03/10] radeonsi: initialize output, temp, and address registers to "undef"
From: Marek Olšák

This removes "v_mov v0, 0" which typically occurs before exports.

Totals:
SGPRS: 345216 -> 344552 (-0.19 %)
VGPRS: 197684 -> 197132 (-0.28 %)
Code Size: 7390408 -> 7375376 (-0.20 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave

Totals from affected shaders:
SGPRS: 101336 -> 100672 (-0.66 %)
VGPRS: 53920 -> 53368 (-1.02 %)
Code Size: 2170176 -> 2155144 (-0.69 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 2e9a013..f548d1a 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -272,6 +272,15 @@ static LLVMValueRef fetch_system_value(
 	return bitcast(bld_base, type, cval);
 }

+static LLVMValueRef si_build_alloca_undef(struct gallivm_state *gallivm,
+					  LLVMTypeRef type,
+					  const char *name)
+{
+	LLVMValueRef ptr = lp_build_alloca(gallivm, type, name);
+	LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr);
+	return ptr;
+}
+
 static void emit_declaration(
 	struct lp_build_tgsi_context * bld_base,
 	const struct tgsi_full_declaration *decl)
@@ -285,7 +294,7 @@ static void emit_declaration(
 		for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
 			unsigned chan;
 			for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
-				ctx->soa.addr[idx][chan] = lp_build_alloca(
+				ctx->soa.addr[idx][chan] = si_build_alloca_undef(
 					&ctx->gallivm,
 					ctx->soa.bld_base.uint_bld.elem_type, "");
 			}
@@ -315,8 +324,9 @@ static void emit_declaration(
 		for (idx = first; idx <= last; idx++) {
 			for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
 				ctx->temps[idx * TGSI_NUM_CHANNELS + i] =
-					lp_build_alloca(bld_base->base.gallivm, bld_base->base.vec_type,
-						"temp");
+					si_build_alloca_undef(bld_base->base.gallivm,
+							      bld_base->base.vec_type,
+							      "temp");
 			}
 		}
 		break;
@@ -347,7 +357,8 @@ static void emit_declaration(
 			unsigned chan;
 			assert(idx < RADEON_LLVM_MAX_OUTPUTS);
 			for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
-				ctx->soa.outputs[idx][chan] = lp_build_alloca(&ctx->gallivm,
+				ctx->soa.outputs[idx][chan] = si_build_alloca_undef(
+					&ctx->gallivm,
 					ctx->soa.bld_base.base.elem_type, "");
 			}
 		}
-- 
2.1.4
[Mesa-dev] [PATCH 01/10] gallivm: supply correct opcode info to emit functions
From: Marek Olšák

This is useful only when emit functions use it.
The new radeonsi min/max opcode implementation requires this.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index c4ae304..c50d83e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -114,12 +114,17 @@ lp_build_emit_llvm(
    struct lp_build_emit_data * emit_data)
 {
    struct lp_build_tgsi_action * action = &bld_base->op_actions[tgsi_opcode];
+   const struct tgsi_opcode_info *old_info = emit_data->info;
    /* XXX: Assert that this is a componentwise or replicate instruction */

    lp_build_action_set_dst_type(emit_data, bld_base, tgsi_opcode);
    emit_data->chan = 0;
+
+   /* Set and restore the opcode info. */
+   emit_data->info = tgsi_get_opcode_info(tgsi_opcode);
    assert(action->emit);
    action->emit(action, bld_base, emit_data);
+   emit_data->info = old_info;
    return emit_data->output[0];
 }

-- 
2.1.4
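The set-and-restore step in this patch can be modelled outside gallivm. The sketch below is a reduced illustration, not Mesa code: struct opcode_info, struct emit_data, record_opcode, emit_nested and demo_nested_emit are all hypothetical stand-ins. It shows why a callback that dispatches on the info pointer needs that pointer to describe the nested opcode during the call, and why the caller's value has to come back afterwards.

```c
#include <assert.h>

/* Hypothetical stand-ins for tgsi_opcode_info and lp_build_emit_data. */
struct opcode_info { int opcode; };
struct emit_data {
    const struct opcode_info *info;
    int seen_opcode;   /* filled in by the callback */
};

typedef void (*emit_fn)(struct emit_data *data);

/* Example callback that dispatches on the opcode it was invoked for. */
static void record_opcode(struct emit_data *data)
{
    data->seen_opcode = data->info->opcode;
}

/* Point data->info at the nested opcode for the duration of the call,
 * then put the caller's value back. */
static void emit_nested(struct emit_data *data,
                        const struct opcode_info *nested, emit_fn emit)
{
    const struct opcode_info *old_info = data->info;

    data->info = nested;
    emit(data);
    data->info = old_info;
}

/* Small driver: returns the opcode the callback saw and checks that the
 * outer info pointer survived the nested call. */
static int demo_nested_emit(void)
{
    struct opcode_info outer = { 1 }, inner = { 2 };
    struct emit_data data = { &outer, 0 };

    emit_nested(&data, &inner, record_opcode);
    assert(data.info == &outer);
    return data.seen_opcode;
}
```

In this model the callback sees the inner opcode (2) while the outer instruction's info (opcode 1) is intact once emit_nested returns.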
[Mesa-dev] [PATCH 06/10] radeonsi: use LRP from gallivm
From: Marek Olšák

---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 23ea23a..c22ea7c 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1561,8 +1561,6 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_LSB].emit = emit_lsb;
 	bld_base->op_actions[TGSI_OPCODE_LG2].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_LG2].intr_name = "llvm.log2.f32";
-	bld_base->op_actions[TGSI_OPCODE_LRP].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_LRP].intr_name = "llvm.AMDGPU.lrp";
 	bld_base->op_actions[TGSI_OPCODE_MOD].emit = emit_mod;
 	bld_base->op_actions[TGSI_OPCODE_UMSB].emit = emit_umsb;
 	bld_base->op_actions[TGSI_OPCODE_NOT].emit = emit_not;
-- 
2.1.4
[Mesa-dev] [PATCH 05/10] radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max
From: Marek Olšák

No difference according to shader-db. (with the new S_ABS_I32 pattern)
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c | 60 ++
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 91cf658..23ea23a 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1393,6 +1393,51 @@ static void emit_imsb(const struct lp_build_tgsi_action * action,
 		LLVMBuildSelect(builder, cond, all_ones, msb, "");
 }

+static void emit_iabs(const struct lp_build_tgsi_action *action,
+		      struct lp_build_tgsi_context *bld_base,
+		      struct lp_build_emit_data *emit_data)
+{
+	LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+
+	emit_data->output[emit_data->chan] =
+		lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_IMAX,
+					  emit_data->args[0],
+					  LLVMBuildNeg(builder,
+						       emit_data->args[0], ""));
+}
+
+static void emit_minmax_int(const struct lp_build_tgsi_action *action,
+			    struct lp_build_tgsi_context *bld_base,
+			    struct lp_build_emit_data *emit_data)
+{
+	LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+	LLVMIntPredicate op;
+
+	switch (emit_data->info->opcode) {
+	default:
+		assert(0);
+	case TGSI_OPCODE_IMAX:
+		op = LLVMIntSGT;
+		break;
+	case TGSI_OPCODE_IMIN:
+		op = LLVMIntSLT;
+		break;
+	case TGSI_OPCODE_UMAX:
+		op = LLVMIntUGT;
+		break;
+	case TGSI_OPCODE_UMIN:
+		op = LLVMIntULT;
+		break;
+	}
+
+	emit_data->output[emit_data->chan] =
+		LLVMBuildSelect(builder,
+				LLVMBuildICmp(builder, op, emit_data->args[0],
+					      emit_data->args[1], ""),
+				emit_data->args[0],
+				emit_data->args[1], "");
+}
+
 void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 {
 	struct lp_type type;
@@ -1493,17 +1538,14 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_FSGE].emit = emit_fcmp;
 	bld_base->op_actions[TGSI_OPCODE_FSLT].emit = emit_fcmp;
 	bld_base->op_actions[TGSI_OPCODE_FSNE].emit = emit_fcmp;
-	bld_base->op_actions[TGSI_OPCODE_IABS].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_IABS].intr_name = "llvm.AMDIL.abs.";
+	bld_base->op_actions[TGSI_OPCODE_IABS].emit = emit_iabs;
 	bld_base->op_actions[TGSI_OPCODE_IBFE].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_IBFE].intr_name = "llvm.AMDGPU.bfe.i32";
 	bld_base->op_actions[TGSI_OPCODE_IDIV].emit = emit_idiv;
 	bld_base->op_actions[TGSI_OPCODE_IF].emit = if_emit;
 	bld_base->op_actions[TGSI_OPCODE_UIF].emit = uif_emit;
-	bld_base->op_actions[TGSI_OPCODE_IMAX].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_IMAX].intr_name = "llvm.AMDGPU.imax";
-	bld_base->op_actions[TGSI_OPCODE_IMIN].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_IMIN].intr_name = "llvm.AMDGPU.imin";
+	bld_base->op_actions[TGSI_OPCODE_IMAX].emit = emit_minmax_int;
+	bld_base->op_actions[TGSI_OPCODE_IMIN].emit = emit_minmax_int;
 	bld_base->op_actions[TGSI_OPCODE_IMSB].emit = emit_imsb;
 	bld_base->op_actions[TGSI_OPCODE_INEG].emit = emit_ineg;
 	bld_base->op_actions[TGSI_OPCODE_ISHR].emit = emit_ishr;
@@ -1551,10 +1593,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_UBFE].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = "llvm.AMDGPU.bfe.u32";
 	bld_base->op_actions[TGSI_OPCODE_UDIV].emit = emit_udiv;
-	bld_base->op_actions[TGSI_OPCODE_UMAX].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_UMAX].intr_name = "llvm.AMDGPU.umax";
-	bld_base->op_actions[TGSI_OPCODE_UMIN].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_UMIN].intr_name = "llvm.AMDGPU.umin";
+	bld_base->op_actions[TGSI_OPCODE_UMAX].emit = emit_minmax_int;
+	bld_base->op_actions[TGSI_OPCODE_UMIN].emit = emit_minmax_int;
 	bld_base->op_actions[TGSI_OPCODE_UMOD].emit = emit_umod;
 	bld_base->op_actions[TGSI_OPCODE_USEQ].emit = emit_icmp;
 	bld_base->op_actions[TGSI_OPCODE_USGE].emit = emit_icmp;
-- 
2.1.4
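As a rough scalar model of the compare-and-select lowering in this patch, the C sketch below shows why the signed and unsigned opcodes need different predicates on the same bit pattern, and how IABS falls out of IMAX(x, -x). All function names here are invented for illustration. The unsigned negation is this sketch's way of getting wrapping behaviour in C, on the assumption that the plain LLVM neg built by LLVMBuildNeg wraps as well.

```c
#include <assert.h>
#include <stdint.h>

/* Signed and unsigned min/max use different comparisons on the same
 * bits, which is why emit_minmax_int picks SGT/SLT or UGT/ULT. */
static int32_t imax_select(int32_t a, int32_t b)    { return a > b ? a : b; }
static int32_t imin_select(int32_t a, int32_t b)    { return a < b ? a : b; }
static uint32_t umax_select(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t umin_select(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* IABS modelled as in emit_iabs: max(x, -x). The negation is done on
 * uint32_t so it wraps for INT32_MIN instead of being undefined
 * behaviour in C. */
static int32_t iabs_via_max(int32_t x)
{
    int32_t neg = (int32_t)(0u - (uint32_t)x);
    return imax_select(x, neg);
}
```

The bit pattern of -3 wins an unsigned max against 2 but loses the signed one, which is the distinction the switch on emit_data->info->opcode encodes.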
[Mesa-dev] [PATCH 04/10] radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC
From: Marek Olšák

No difference according to shader-db.
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index f548d1a..91cf658 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1481,7 +1481,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
 	bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
 	bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";
+	bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.exp2.f32";
 	bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32";
 	bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;
@@ -1530,7 +1530,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_POW].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
 	bld_base->op_actions[TGSI_OPCODE_ROUND].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.AMDIL.round.nearest.";
+	bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.rint.f32";
 	bld_base->op_actions[TGSI_OPCODE_RSQ].intr_name = "llvm.AMDGPU.rsq.clamped.f32";
 	bld_base->op_actions[TGSI_OPCODE_RSQ].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_cmp;
@@ -1546,7 +1546,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 	bld_base->op_actions[TGSI_OPCODE_SQRT].intr_name = "llvm.sqrt.f32";
 	bld_base->op_actions[TGSI_OPCODE_SSG].emit = emit_ssg;
 	bld_base->op_actions[TGSI_OPCODE_TRUNC].emit = build_tgsi_intrinsic_nomem;
-	bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.AMDGPU.trunc";
+	bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.trunc.f32";
 	bld_base->op_actions[TGSI_OPCODE_UADD].emit = emit_uadd;
 	bld_base->op_actions[TGSI_OPCODE_UBFE].emit = build_tgsi_intrinsic_nomem;
 	bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = "llvm.AMDGPU.bfe.u32";
-- 
2.1.4
Re: [Mesa-dev] [PATCH 08/10] radeonsi: re-enable unsafe-fp-math for LLVM 3.8
FWIW, this isn't quite correct with ARB_shader_precision or GL4.1 --
it specifies that infinities should be correctly generated through
division by 0, which unsafe-fp-math doesn't guarantee. At least,
that's assuming this is similar to the "fast" per-instruction flag
(http://llvm.org/docs/LangRef.html#fast-math-flags) which says "This
flag implies all the others."

On Sat, Oct 10, 2015 at 9:29 PM, Marek Olšák wrote:
> From: Marek Olšák
>
> Required for 1/sqrt ==> rsq.
>
> We should finally fix the hang instead of running away from the issue.
> This assumes the bug is in LLVM and we have time to fix it before
> the release.
>
> Include compute shaders as well, which only affects TGSI and thus OpenGL.
>
> Totals:
> SGPRS: 344368 -> 345104 (0.21 %)
> VGPRS: 197552 -> 197420 (-0.07 %)
> Code Size: 7366304 -> 7324692 (-0.56 %) bytes
> LDS: 91 -> 91 (0.00 %) blocks
> Scratch: 1615872 -> 1524736 (-5.64 %) bytes per wave
>
> Totals from affected shaders:
> SGPRS: 146696 -> 147432 (0.50 %)
> VGPRS: 87212 -> 87080 (-0.15 %)
> Code Size: 3852664 -> 3811052 (-1.08 %) bytes
> LDS: 48 -> 48 (0.00 %) blocks
> Scratch: 1179648 -> 1088512 (-7.73 %) bytes per wave
> ---
>  src/gallium/drivers/radeon/radeon_llvm_emit.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> index 6b2ebde..4bda4a4 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> @@ -84,6 +84,13 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
>  	sprintf(Str, "%1d", llvm_type);
>
>  	LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
> +
> +#if HAVE_LLVM >= 0x0308
> +	/* This only affects TGSI (OpenGL), so it's okay to set it for
> +	 * compute shaders too.
> +	 */
> +	LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
> +#endif
>  }
>
>  static void init_r600_target()
> --
> 2.1.4
Re: [Mesa-dev] [PATCH 01/10] gallivm: supply correct opcode info to emit functions
On 11.10.2015 at 03:29, Marek Olšák wrote:
> From: Marek Olšák
>
> This is useful only when emit functions use it.
> The new radeonsi min/max opcode implementation requires this.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
> index c4ae304..c50d83e 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
> @@ -114,12 +114,17 @@ lp_build_emit_llvm(
>     struct lp_build_emit_data * emit_data)
>  {
>     struct lp_build_tgsi_action * action = &bld_base->op_actions[tgsi_opcode];
> +   const struct tgsi_opcode_info *old_info = emit_data->info;
>     /* XXX: Assert that this is a componentwise or replicate instruction */
>
>     lp_build_action_set_dst_type(emit_data, bld_base, tgsi_opcode);
>     emit_data->chan = 0;
> +
> +   /* Set and restore the opcode info. */
> +   emit_data->info = tgsi_get_opcode_info(tgsi_opcode);
>     assert(action->emit);
>     action->emit(action, bld_base, emit_data);
> +   emit_data->info = old_info;
>     return emit_data->output[0];
>  }
>
>

Could you elaborate why this is necessary? Looks like a hack and I can't
see why opcode info would be wrong in the first place.
Or if that's never set correctly and you just need it to be able to
distinguish min/max later, I'd suggest you shouldn't do that and just
use different functions.

Roland
[Mesa-dev] [PATCH 3/3] radeonsi: support thread-safe shaders shared by multiple contexts
From: Marek Olšák

The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.
---
 src/gallium/drivers/radeonsi/si_blit.c          |  10 +-
 src/gallium/drivers/radeonsi/si_debug.c         |  18 +-
 src/gallium/drivers/radeonsi/si_descriptors.c   |  12 +-
 src/gallium/drivers/radeonsi/si_pipe.c          |   6 +-
 src/gallium/drivers/radeonsi/si_pipe.h          |  21 +-
 src/gallium/drivers/radeonsi/si_shader.h        |  31 +--
 src/gallium/drivers/radeonsi/si_state.c         |   2 +-
 src/gallium/drivers/radeonsi/si_state_draw.c    |  44 ++--
 src/gallium/drivers/radeonsi/si_state_shaders.c | 279 +---
 9 files changed, 224 insertions(+), 199 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c b/src/gallium/drivers/radeonsi/si_blit.c
index d5c5db3..082ea85 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -55,11 +55,11 @@ static void si_blitter_begin(struct pipe_context *ctx, enum si_blitter_op op)
 	util_blitter_save_depth_stencil_alpha(sctx->blitter, sctx->queued.named.dsa);
 	util_blitter_save_stencil_ref(sctx->blitter, &sctx->stencil_ref.state);
 	util_blitter_save_rasterizer(sctx->blitter, sctx->queued.named.rasterizer);
-	util_blitter_save_fragment_shader(sctx->blitter, sctx->ps_shader);
-	util_blitter_save_geometry_shader(sctx->blitter, sctx->gs_shader);
-	util_blitter_save_tessctrl_shader(sctx->blitter, sctx->tcs_shader);
-	util_blitter_save_tesseval_shader(sctx->blitter, sctx->tes_shader);
-	util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader);
+	util_blitter_save_fragment_shader(sctx->blitter, sctx->ps_shader.cso);
+	util_blitter_save_geometry_shader(sctx->blitter, sctx->gs_shader.cso);
+	util_blitter_save_tessctrl_shader(sctx->blitter, sctx->tcs_shader.cso);
+	util_blitter_save_tesseval_shader(sctx->blitter, sctx->tes_shader.cso);
+	util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader.cso);
 	util_blitter_save_vertex_elements(sctx->blitter, sctx->vertex_elements);
 	util_blitter_save_sample_mask(sctx->blitter, sctx->sample_mask.sample_mask);
 	util_blitter_save_viewport(sctx->blitter, &sctx->viewports.states[0]);
diff --git a/src/gallium/drivers/radeonsi/si_debug.c b/src/gallium/drivers/radeonsi/si_debug.c
index 7d41e8d..5306218 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -31,15 +31,15 @@
 #include "ddebug/dd_util.h"
 
 
-static void si_dump_shader(struct si_shader_selector *sel, const char *name,
+static void si_dump_shader(struct si_shader_ctx_state *state, const char *name,
 			   FILE *f)
 {
-	if (!sel || !sel->current)
+	if (!state->cso || !state->current)
 		return;
 
 	fprintf(f, "%s shader disassembly:\n", name);
-	si_dump_shader_key(sel->type, &sel->current->key, f);
-	fprintf(f, "%s\n\n", sel->current->binary.disasm_string);
+	si_dump_shader_key(state->cso->type, &state->current->key, f);
+	fprintf(f, "%s\n\n", state->current->binary.disasm_string);
 }
 
 /* Parsed IBs are difficult to read without colors. Use "less -R file" to
@@ -536,11 +536,11 @@ static void si_dump_debug_state(struct pipe_context *ctx, FILE *f,
 	if (flags & PIPE_DEBUG_DEVICE_IS_HUNG)
 		si_dump_debug_registers(sctx, f);
 
-	si_dump_shader(sctx->vs_shader, "Vertex", f);
-	si_dump_shader(sctx->tcs_shader, "Tessellation control", f);
-	si_dump_shader(sctx->tes_shader, "Tessellation evaluation", f);
-	si_dump_shader(sctx->gs_shader, "Geometry", f);
-	si_dump_shader(sctx->ps_shader, "Fragment", f);
+	si_dump_shader(&sctx->vs_shader, "Vertex", f);
+	si_dump_shader(&sctx->tcs_shader, "Tessellation control", f);
+	si_dump_shader(&sctx->tes_shader, "Tessellation evaluation", f);
+	si_dump_shader(&sctx->gs_shader, "Geometry", f);
+	si_dump_shader(&sctx->ps_shader, "Fragment", f);
 
 	si_dump_last_bo_list(sctx, f);
 	si_dump_last_ib(sctx, f);
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index 19dd14f..13738da 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -915,10 +915,10 @@ static void si_set_user_data_base(struct si_context *sctx,
 void si_shader_change_notify(struct si_context *sctx)
 {
 	/* VS can be bound as VS, ES, or LS. */
-	if (sctx->tes_shader)
+	if (sctx->tes_shader.cso)
 		si_set_user_data_base(sctx, PIPE_SHADER_VERTEX,
 				      R_00B530_SPI_SHADER_USER_DATA_LS_0);
-	else if
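Editorial sketch of the idea in the commit message of the radeonsi "thread-safe shaders" patch: the CSO keeps only shared data, while each context holds its own cso/current pair and has to look "current" up again after a bind. All type and function names below are hypothetical simplifications, not the radeonsi code, and the integer key stands in for the real shader key.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver types. */
struct shader_variant {
	unsigned key;                  /* stand-in for the real shader key */
	struct shader_variant *next;
};

/* The CSO: shared by all contexts, so it carries no "current" pointer. */
struct shader_selector {
	struct shader_variant *first_variant;
};

/* Per-context state, mirroring the cso/current pair from the patch. */
struct shader_ctx_state {
	struct shader_selector *cso;
	struct shader_variant *current;
};

/* Binding does not remember which variant was current before. */
static void bind_shader(struct shader_ctx_state *state,
			struct shader_selector *sel)
{
	state->cso = sel;
	state->current = NULL;
}

/* The variant is therefore looked up again by key after a bind. */
static struct shader_variant *
select_variant(struct shader_ctx_state *state, unsigned key)
{
	struct shader_variant *v;

	if (!state->cso)
		return NULL;
	if (state->current && state->current->key == key)
		return state->current;

	for (v = state->cso->first_variant; v; v = v->next) {
		if (v->key == key) {
			state->current = v;
			return v;
		}
	}
	return NULL; /* a real driver would compile a new variant here */
}
```

Two contexts can bind the same selector and end up with different "current" variants without writing to the shared object, which is the property the commit message is after. Locking around variant creation is left out of this sketch.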
[Mesa-dev] [PATCH 2/3] radeonsi: implement vertex color clamping
From: Marek Olšák

This is only supported in the compatibility profile (without GS and tess).
---
 src/gallium/drivers/radeonsi/si_pipe.c          |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c        | 42 +
 src/gallium/drivers/radeonsi/si_shader.h        |  8 +++--
 src/gallium/drivers/radeonsi/si_state.c         |  2 ++
 src/gallium/drivers/radeonsi/si_state_shaders.c |  2 +-
 5 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index 894fc59..d4be6f9 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -271,6 +271,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_START_INSTANCE:
 	case PIPE_CAP_NPOT_TEXTURES:
 	case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
+	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
 	case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
 	case PIPE_CAP_TGSI_INSTANCEID:
@@ -331,7 +332,6 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	/* Unsupported features. */
 	case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
 	case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
-	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_FAKE_SW_MSAA:
 	case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 1f9b2b6..8da2f77 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2075,6 +2075,45 @@ static void si_llvm_emit_vs_epilogue(struct lp_build_tgsi_context * bld_base)
 
 	outputs = MALLOC((info->num_outputs + 1) * sizeof(outputs[0]));
 
+	/* Vertex color clamping.
+	 *
+	 * This uses a state constant loaded in a user data SGPR and
+	 * an IF statement is added that clamps all colors if the constant
+	 * is true.
+	 */
+	if (si_shader_ctx->type == TGSI_PROCESSOR_VERTEX &&
+	    !si_shader_ctx->shader->is_gs_copy_shader) {
+		struct lp_build_if_state if_ctx;
+		LLVMValueRef cond = NULL;
+		LLVMValueRef addr, val;
+
+		for (i = 0; i < info->num_outputs; i++) {
+			if (info->output_semantic_name[i] != TGSI_SEMANTIC_COLOR &&
+			    info->output_semantic_name[i] != TGSI_SEMANTIC_BCOLOR)
+				continue;
+
+			/* We've found a color. */
+			if (!cond) {
+				/* The state is in the first bit of the user SGPR. */
+				cond = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
+						    SI_PARAM_VS_STATE_BITS);
+				cond = LLVMBuildTrunc(gallivm->builder, cond,
+						      LLVMInt1TypeInContext(gallivm->context), "");
+				lp_build_if(&if_ctx, gallivm, cond);
+			}
+
+			for (j = 0; j < 4; j++) {
+				addr = si_shader_ctx->radeon_bld.soa.outputs[i][j];
+				val = LLVMBuildLoad(gallivm->builder, addr, "");
+				val = radeon_llvm_saturate(bld_base, val);
+				LLVMBuildStore(gallivm->builder, val, addr);
+			}
+		}
+
+		if (cond)
+			lp_build_endif(&if_ctx);
+	}
+
 	for (i = 0; i < info->num_outputs; i++) {
 		outputs[i].name = info->output_semantic_name[i];
 		outputs[i].sid = info->output_semantic_index[i];
@@ -3444,6 +3483,9 @@ static void create_function(struct si_shader_context *si_shader_ctx)
 		if (shader->is_gs_copy_shader) {
 			last_array_pointer = SI_PARAM_CONST;
 			num_params = SI_PARAM_CONST+1;
+		} else {
+			params[SI_PARAM_VS_STATE_BITS] = i32;
+			num_params = SI_PARAM_VS_STATE_BITS+1;
 		}
 
 		/* The locations of the other parameters are assigned dynamically. */
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index fa5930a..54dad72 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -83,6 +83,7 @@ struct radeon_shader_reloc;
 #define SI_SGPR_VERTEX_BUFFER	8  /* VS only */
 #define SI_SGPR_BASE_VERTEX	10 /* VS only */
 #define SI_SGPR_START_INSTANCE	11 /* VS only */
+#define SI_SGPR_VS_STATE_BITS	12 /* VS(VS) only */
 #define
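Editorial sketch: a CPU-side model of what the generated vertex shader epilogue in the vertex color clamping patch does at run time, assuming the clamp flag sits in bit 0 of the state word (as the patch comment says) and that "saturate" means clamping to [0, 1]. The function names are hypothetical; the real code emits LLVM IR rather than running this logic on the CPU.

```c
#include <stdint.h>

/* Clamp one component to the [0, 1] range. */
static float saturate(float v)
{
	if (v < 0.0f)
		return 0.0f;
	if (v > 1.0f)
		return 1.0f;
	return v;
}

/*
 * Hypothetical model of the epilogue: the four components of a color
 * output are clamped only when bit 0 of the state word is set.
 */
static void clamp_vertex_color(float color[4], uint32_t vs_state_bits)
{
	if (vs_state_bits & 1) {
		for (int j = 0; j < 4; j++)
			color[j] = saturate(color[j]);
	}
}
```

Because the decision is taken at run time from a state word, the same compiled shader serves both the clamped and the unclamped case.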
[Mesa-dev] [PATCH 0/3] RadeonSI forcing st/mesa to create shaders at link time
Hi, This patch series implements all features needed for st/mesa to send shaders to the driver immediately. The good thing about thread-safe shader CSOs is that multithreaded shader compilation suddenly seems easy. Please review. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] radeonsi: implement fragment color clamping
From: Marek Olšák

using the shader key for now.
---
 src/gallium/drivers/radeonsi/si_pipe.c          |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c        | 13 +
 src/gallium/drivers/radeonsi/si_shader.h        |  1 +
 src/gallium/drivers/radeonsi/si_state.c         |  2 +-
 src/gallium/drivers/radeonsi/si_state.h         |  1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c |  1 +
 6 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index aa5a9ea..894fc59 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -271,6 +271,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_START_INSTANCE:
 	case PIPE_CAP_NPOT_TEXTURES:
 	case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
+	case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
 	case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
 	case PIPE_CAP_TGSI_INSTANCEID:
 	case PIPE_CAP_COMPUTE:
@@ -330,7 +331,6 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	/* Unsupported features. */
 	case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
 	case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
-	case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
 	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_FAKE_SW_MSAA:
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 012d708..1f9b2b6 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2109,6 +2109,7 @@ static void si_llvm_emit_fs_epilogue(struct lp_build_tgsi_context * bld_base)
 	struct lp_build_context * base = &bld_base->base;
 	struct lp_build_context * uint = &bld_base->uint_bld;
 	struct tgsi_shader_info *info = &shader->selector->info;
+	LLVMBuilderRef builder = base->gallivm->builder;
 	LLVMValueRef args[9];
 	LLVMValueRef last_args[9] = { 0 };
 	int depth_index = -1, stencil_index = -1, samplemask_index = -1;
@@ -2135,6 +2136,16 @@ static void si_llvm_emit_fs_epilogue(struct lp_build_tgsi_context * bld_base)
 			target = V_008DFC_SQ_EXP_MRT + semantic_index;
 			alpha_ptr = si_shader_ctx->radeon_bld.soa.outputs[i][3];
 
+			if (si_shader_ctx->shader->key.ps.clamp_color) {
+				for (int j = 0; j < 4; j++) {
+					LLVMValueRef ptr = si_shader_ctx->radeon_bld.soa.outputs[i][j];
+					LLVMValueRef result = LLVMBuildLoad(builder, ptr, "");
+
+					result = radeon_llvm_saturate(bld_base, result);
+					LLVMBuildStore(builder, result, ptr);
+				}
+			}
+
 			if (si_shader_ctx->shader->key.ps.alpha_to_one)
 				LLVMBuildStore(base->gallivm->builder,
 					       base->one, alpha_ptr);
@@ -2145,6 +2156,7 @@ static void si_llvm_emit_fs_epilogue(struct lp_build_tgsi_context * bld_base)
 
 			if (si_shader_ctx->shader->key.ps.poly_line_smoothing)
 				si_scale_alpha_by_sample_mask(bld_base, alpha_ptr);
+
 			break;
 		default:
 			target = 0;
@@ -3999,6 +4011,7 @@ void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f)
 		fprintf(f, " alpha_func = %u\n", key->ps.alpha_func);
 		fprintf(f, " alpha_to_one = %u\n", key->ps.alpha_to_one);
 		fprintf(f, " poly_stipple = %u\n", key->ps.poly_stipple);
+		fprintf(f, " clamp_color = %u\n", key->ps.clamp_color);
 		break;
 
 	default:
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 460..fa5930a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -227,6 +227,7 @@ union si_shader_key {
 		unsigned	alpha_to_one:1;
 		unsigned	poly_stipple:1;
 		unsigned	poly_line_smoothing:1;
+		unsigned	clamp_color:1;
 	} ps;
 	struct {
 		unsigned	instance_divisors[SI_NUM_VERTEX_BUFFERS];
diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index 00d4bc1..3aafe8a 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -694,7 +694,7 @@ static void *si_create_rs_state(struct pipe_context *ctx,
 	rs->poly_smooth = state->poly_smooth;
 	rs->uses_poly_offset = state->offset_point || state->offset_line
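Editorial sketch of what "using the shader key" implies for the fragment color clamping patch: the clamp flag becomes part of the key that identifies a compiled variant, so toggling it selects a different variant instead of being tested at run time. The struct below is a hypothetical, reduced stand-in for the `ps` part of `union si_shader_key`, and comparing keys with `memcmp` is an assumption made for illustration.

```c
#include <string.h>

/* Hypothetical, reduced stand-in for the "ps" part of the shader key. */
struct ps_key {
	unsigned alpha_to_one:1;
	unsigned poly_stipple:1;
	unsigned poly_line_smoothing:1;
	unsigned clamp_color:1;
};

/* Zero the whole key first so that a bytewise comparison is meaningful. */
static void init_ps_key(struct ps_key *key, int clamp_color)
{
	memset(key, 0, sizeof(*key));
	key->clamp_color = clamp_color ? 1 : 0;
}

/* Two keys name the same variant only if every byte matches. */
static int ps_keys_equal(const struct ps_key *a, const struct ps_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

The trade-off against a run-time state bit is one extra compiled variant per key combination in exchange for no branch in the shader.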
Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization
On 10/10/15 16:54, Jason Ekstrand wrote:
> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro wrote:
>> vec4 port of fs_cmod_propagation.
>>
>> Shader-db results:
>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
>> instructions in affected programs: 498213 -> 481456 (-3.36%)
>> helped: 3082
>> HURT: 0
> Would you mind cherry-picking this back onto
> 4e0a8e0a50c9ac91cb7a70b92b8d9c6fcc02b7aa (the commit right before we
> made NIR non-optional) and get some GLSL IR vs. NIR vec4-only numbers
> with this patch? I'd like to know what it does to that delta as well.

FWIW, the previous shader-db numbers were done without grepping for
vec4. Matt mentioned that he preferred it that way. As asked, the numbers
in this email are vec4-only (so grepping for vec4).

So, the shader-db numbers of IR vs NIR at that reference commit are the
following:

total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369

And the IR vs NIR numbers with the optimization cherry-picked are the
following:

total instructions in shared programs: 1845902 -> 1631459 (-11.62%)
instructions in affected programs: 1663398 -> 1448955 (-12.89%)
helped: 14863
HURT: 1203

FWIW, we need to take into account that this commit also helps IR. The
shader-db numbers of IR at the reference commit vs IR with the
cherry-pick are the following:

total instructions in shared programs: 1848027 -> 1845902 (-0.11%)
instructions in affected programs: 195042 -> 192917 (-1.09%)
helped: 1033
HURT: 0

For that reason, it is probably worth comparing IR at the reference
commit against NIR with the cherry-pick, which gives the following:

total instructions in shared programs: 1848027 -> 1631459 (-11.72%)
instructions in affected programs: 1672237 -> 1455669 (-12.95%)
helped: 14955
HURT: 1194

>
> Thanks!
> --Jason

You are welcome. Thanks for reviewing the patch.

Best regards

>
>> ---
>>
>> The final outcome is really similar to fs_brw_cmod_propagation. In
>> fact the only difference is that on fs we have this:
>>    if (scan_inst->overwrites_reg(inst->src[0])) {
>>       if (scan_inst->is_partial_write() ||
>>           scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>>          break;
>>
>> And on vec4 (this commit) we have this:
>>    if (inst->src[0].in_range(scan_inst->dst,
>>                              scan_inst->regs_written)) {
>>
>>       if ((scan_inst->predicate && scan_inst->opcode !=
>> BRW_OPCODE_SEL) ||
>>           scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>>           (scan_inst->dst.writemask != WRITEMASK_X &&
>>            scan_inst->dst.writemask != WRITEMASK_XYZW))
>>          break;
>>
>>       if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>>           inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>>          break;
>>       }
>>
>> So at some point I thought about refactoring it and having one common,
>> like with opt_predicated_break, but that one was possible with just
>> backend_instructions, while here we would need to deal with
>> vec4_instructions and fs_inst, that could be somewhat messy, so
>> I'm leaving this as it is.
>>
>>  src/mesa/drivers/dri/i965/Makefile.sources         |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp             |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.h               |   1 +
>>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163
>> +
>>  4 files changed, 166 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 81ef628..c1836d6 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
>>         brw_util.c \
>>         brw_util.h \
>>         brw_vec4_builder.h \
>> +       brw_vec4_cmod_propagation.cpp \
>>         brw_vec4_copy_propagation.cpp \
>>         brw_vec4.cpp \
>>         brw_vec4_cse.cpp \
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index e966b96..55e381b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>>        OPT(dead_code_eliminate);
>>        OPT(dead_control_flow_eliminate, this);
>>        OPT(opt_copy_propagation);
>> +      OPT(opt_cmod_propagation);
>>
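Editorial sketch: the percentages in the shader-db summaries quoted in this thread are consistent with a plain relative change, (after - before) / before. The helper name below is hypothetical and this is not the shader-db report script, only the arithmetic behind the "total instructions" lines.

```c
/* Relative change in percent between two instruction counts. */
static double percent_change(long before, long after)
{
	return 100.0 * (double)(after - before) / (double)before;
}
```

For the vec4-only IR vs NIR totals at the reference commit, `percent_change(1848027, 1648216)` is about -10.81, matching the figure reported in the reply.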
[Mesa-dev] [PATCH shader-db 1/3] Makefile: avoid undefined reference build errors with LIBS
Signed-off-by: Rhys Kidd
---
 .gitignore |  1 +
 Makefile   | 14 +++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/.gitignore b/.gitignore
index f69750a..cffa19c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 bin
 run
+*.o
diff --git a/Makefile b/Makefile
index 1ae0776..a4bfb8f 100644
--- a/Makefile
+++ b/Makefile
@@ -21,9 +21,17 @@
 
 CFLAGS ?= -g -O2 -march=native -pipe
 CFLAGS += -std=gnu99 -fopenmp
-LDFLAGS = -lepoxy -lgbm
+LIBS = -lepoxy -lgbm
 
-run:
+OBJ = run.o
+
+all: run
+
+%.o: %.c
+	$(CC) -c -o $@ $< $(CFLAGS)
+
+run: $(OBJ)
+	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
 
 clean:
-	rm -f run
+	rm -f run $(OBJ)
-- 
2.1.4
[Mesa-dev] [PATCH shader-db 0/3] Makefile and documentation cleanup
This patchset adds Makefile and documentation improvements. I aimed to
write these as I would have found most helpful when seeking to understand
shader-db's operation as a new Mesa developer.

The first patch resolves the build errors [0] experienced on Ubuntu 15.04
and permits a simple 'make' to work if the dependencies are met.

The following two patches improve the documentation of those dependencies.

[0]
$ cc --version
cc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
...
$ make
cc -g -O2 -march=native -pipe -std=gnu99 -fopenmp -lepoxy -lgbm run.c -o run
/tmp/ccaZrtAC.o: In function `main._omp_fn.0':
/home/usera/Coding/shader-db/run.c:511: undefined reference to `epoxy_eglBindAPI'
/home/usera/Coding/shader-db/run.c:513: undefined reference to `epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:516: undefined reference to `epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:528: undefined reference to `epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:536: undefined reference to `epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:541: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:542: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:543: undefined reference to `epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:545: undefined reference to `epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:548: undefined reference to `epoxy_glDebugMessageCallback'
/home/usera/Coding/shader-db/run.c:642: undefined reference to `epoxy_eglDestroyContext'
/home/usera/Coding/shader-db/run.c:643: undefined reference to `epoxy_eglDestroyContext'
/home/usera/Coding/shader-db/run.c:644: undefined reference to `epoxy_eglReleaseThread'
/home/usera/Coding/shader-db/run.c:585: undefined reference to `epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:620: undefined reference to `epoxy_glGenProgramsARB'
/home/usera/Coding/shader-db/run.c:621: undefined reference to `epoxy_glBindProgramARB'
/home/usera/Coding/shader-db/run.c:622: undefined reference to `epoxy_glProgramStringARB'
/home/usera/Coding/shader-db/run.c:624: undefined reference to `epoxy_glDeleteProgramsARB'
/home/usera/Coding/shader-db/run.c:625: undefined reference to `epoxy_glGetError'
/home/usera/Coding/shader-db/run.c:594: undefined reference to `epoxy_glCreateProgram'
/home/usera/Coding/shader-db/run.c:611: undefined reference to `epoxy_glAttachShader'
/home/usera/Coding/shader-db/run.c:612: undefined reference to `epoxy_glDeleteShader'
/home/usera/Coding/shader-db/run.c:597: undefined reference to `epoxy_glCreateShader'
/home/usera/Coding/shader-db/run.c:598: undefined reference to `epoxy_glShaderSource'
/home/usera/Coding/shader-db/run.c:599: undefined reference to `epoxy_glCompileShader'
/home/usera/Coding/shader-db/run.c:602: undefined reference to `epoxy_glGetShaderiv'
/home/usera/Coding/shader-db/run.c:606: undefined reference to `epoxy_glGetShaderInfoLog'
/home/usera/Coding/shader-db/run.c:615: undefined reference to `epoxy_glLinkProgram'
/home/usera/Coding/shader-db/run.c:616: undefined reference to `epoxy_glDeleteProgram'
/home/usera/Coding/shader-db/run.c:517: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:518: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:519: undefined reference to `epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:521: undefined reference to `epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:525: undefined reference to `epoxy_glDebugMessageCallback'
/tmp/ccaZrtAC.o: In function `main':
/home/usera/Coding/shader-db/run.c:334: undefined reference to `epoxy_eglQueryString'
/home/usera/Coding/shader-db/run.c:354: undefined reference to `gbm_create_device'
/home/usera/Coding/shader-db/run.c:361: undefined reference to `epoxy_eglGetPlatformDisplayEXT'
/home/usera/Coding/shader-db/run.c:369: undefined reference to `epoxy_eglInitialize'
/home/usera/Coding/shader-db/run.c:379: undefined reference to `epoxy_eglQueryString'
/home/usera/Coding/shader-db/run.c:395: undefined reference to `epoxy_eglChooseConfig'
/home/usera/Coding/shader-db/run.c:659: undefined reference to `epoxy_eglTerminate'
/home/usera/Coding/shader-db/run.c:661: undefined reference to `gbm_device_destroy'
/home/usera/Coding/shader-db/run.c:401: undefined reference to `epoxy_eglBindAPI'
/home/usera/Coding/shader-db/run.c:412: undefined reference to `epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:415: undefined reference to `epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:462: undefined reference to `epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:470: undefined reference to `epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:475: undefined reference to `epoxy_glGetString'
/home/usera/Coding/shader-db/run.c:478: undefined reference to `epoxy_glGetString'
/home/usera/Coding/shader-db/run.c:417: undefined reference to
[Mesa-dev] [PATCH shader-db 2/3] docs: Improve dependencies documentation
Signed-off-by: Rhys Kidd
---
 README | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/README b/README
index e301d0e..6ed3244 100644
--- a/README
+++ b/README
@@ -47,7 +47,18 @@ ST_DEBUG=precompile R600_DEBUG=ps,vs,gs,precompile ./run shaders -1 2> new-run
 
 === Dependencies ===
 run requires some GNU C extensions, render nodes (/dev/dri/renderD128),
-libepoxy, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
+libepoxy, libgbm, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
+
+Install necessary dependencies on Ubuntu:
+```
+# Developers will probably have a local build of Mesa
+sudo apt-get install build-essentials libepoxy-dev libgbm-dev
+```
+
+Build with:
+```
+make
+```
 
 === jemalloc ===
 Since run compiles shaders in different threads, malloc/free locking overhead
-- 
2.1.4
[Mesa-dev] [PATCH shader-db 3/3] docs: Add symbolic link generation step
Signed-off-by: Rhys Kidd
---
 README | 5 +
 1 file changed, 5 insertions(+)

diff --git a/README b/README
index 6ed3244..03be4e7 100644
--- a/README
+++ b/README
@@ -60,6 +60,11 @@ Build with:
 make
 ```
 
+run.py relies on a symbolic link to a built piglit bin directory, as follows:
+```
+ln -s /bin "$PWD"/bin
+```
+
 === jemalloc ===
 Since run compiles shaders in different threads, malloc/free locking overhead
 from inside Mesa can be expensive. Preloading jemalloc can cut significant
-- 
2.1.4
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
On 10/10/2015 09:58 PM, Ilia Mirkin wrote:
> On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset wrote:
>> On 10/10/2015 09:42 PM, Ilia Mirkin wrote:
>>> On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset wrote:
>>>> This patch looks fine except that it should be a bit more normalized.
>>>> I mean, sometimes you break when PUSH_SPACE fails, sometimes not.
>>>> Same for PUSH_SPACE calls, sometimes you add it, sometimes not.
>>> Meh. We need to get our error checking situation straight, but this
>>> isn't the patch to do it in.
>> Yeah, but this needs to be clarified.
> What does?

I mean, we should either use PUSH_SPACE everywhere or not at all, and
always break (or not) when PUSH_SPACE fails. That's really a minor issue.
[Mesa-dev] [PATCH 1/3] gallium: add PIPE_CAP_SHAREABLE_SHADERS
From: Marek OlšákI'll let drivers figure out how to do it. --- src/gallium/docs/source/screen.rst | 2 ++ src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + 15 files changed, 16 insertions(+) diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index e08844b..72f7596 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -276,6 +276,8 @@ The integer capabilities: GL4 hardware will likely need to emulate it with a shader variant, or by selecting the interpolation weights with a conditional assignment in the shader. +* ``PIPE_CAP_SHAREABLE_SHADERS``: Whether shader CSOs can be used by any + pipe_context. 
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 0d01005..2e8bf47 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -236,6 +236,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_DEPTH_BOUNDS_TEST: case PIPE_CAP_TGSI_TXQS: case PIPE_CAP_FORCE_PERSAMPLE_INTERP: + case PIPE_CAP_SHAREABLE_SHADERS: return 0; case PIPE_CAP_MAX_VIEWPORTS: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 9d6b3d3..c91408d 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -249,6 +249,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) case PIPE_CAP_DEPTH_BOUNDS_TEST: case PIPE_CAP_TGSI_TXQS: case PIPE_CAP_FORCE_PERSAMPLE_INTERP: + case PIPE_CAP_SHAREABLE_SHADERS: return 0; case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index 76812a6..acf688f 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -471,6 +471,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_DEPTH_BOUNDS_TEST: case PIPE_CAP_TGSI_TXQS: case PIPE_CAP_FORCE_PERSAMPLE_INTERP: + case PIPE_CAP_SHAREABLE_SHADERS: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 50c3781..e2ed267 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -298,6 +298,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_DEPTH_BOUNDS_TEST: case PIPE_CAP_TGSI_TXQS: case PIPE_CAP_FORCE_PERSAMPLE_INTERP: + case PIPE_CAP_SHAREABLE_SHADERS: return 0; } /* should only get here on unhandled cases */ diff --git 
a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index 335c163..d4cf143 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -171,6 +171,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR: case PIPE_CAP_TGSI_TXQS: case PIPE_CAP_FORCE_PERSAMPLE_INTERP: + case PIPE_CAP_SHAREABLE_SHADERS: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 812b246..a4431f2 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -216,6 +216,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: case PIPE_CAP_FORCE_PERSAMPLE_INTERP: + case PIPE_CAP_SHAREABLE_SHADERS: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index afd91e6..57c9c6c 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -202,6 +202,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
[Mesa-dev] [PATCH 3/3] st/mesa: create shaders which have only one variant immediately
From: Marek Olšák

---
 src/mesa/state_tracker/st_cb_program.c |  5 +++--
 src/mesa/state_tracker/st_context.c    | 14 ++
 src/mesa/state_tracker/st_context.h    |  7 +++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_program.c b/src/mesa/state_tracker/st_cb_program.c
index 40f2af0..611aea7 100644
--- a/src/mesa/state_tracker/st_cb_program.c
+++ b/src/mesa/state_tracker/st_cb_program.c
@@ -222,6 +222,7 @@ st_program_string_notify( struct gl_context *ctx,
                                            struct gl_program *prog )
 {
    struct st_context *st = st_context(ctx);
+   gl_shader_stage stage = _mesa_program_enum_to_shader_stage(target);
 
    if (target == GL_FRAGMENT_PROGRAM_ARB) {
       struct st_fragment_program *stfp = (struct st_fragment_program *) prog;
@@ -276,10 +277,10 @@ st_program_string_notify( struct gl_context *ctx,
       st->dirty.st |= ST_NEW_TESSEVAL_PROGRAM;
    }
 
-   if (ST_DEBUG & DEBUG_PRECOMPILE)
+   if (ST_DEBUG & DEBUG_PRECOMPILE ||
+       st->shader_has_one_variant[stage])
       st_precompile_shader_variant(st, prog);
 
-   /* XXX check if program is legal, within limits */
    return GL_TRUE;
 }
 
diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
index 6256c0b..4f3f525 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -293,6 +293,20 @@ st_create_context_priv( struct gl_context *ctx, struct pipe_context *pipe,
       ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectSampler = true;
    }
 
+   /* Set which shader types can be compiled at link time. */
+   st->shader_has_one_variant[MESA_SHADER_VERTEX] =
+         st->has_shareable_shaders &&
+         !st->clamp_vert_color_in_shader;
+
+   st->shader_has_one_variant[MESA_SHADER_FRAGMENT] =
+         st->has_shareable_shaders &&
+         !st->clamp_frag_color_in_shader &&
+         st->can_force_persample_interp;
+
+   st->shader_has_one_variant[MESA_SHADER_TESS_CTRL] = st->has_shareable_shaders;
+   st->shader_has_one_variant[MESA_SHADER_TESS_EVAL] = st->has_shareable_shaders;
+   st->shader_has_one_variant[MESA_SHADER_GEOMETRY] = st->has_shareable_shaders;
+
    _mesa_compute_version(ctx);
 
    if (ctx->Version == 0) {
diff --git a/src/mesa/state_tracker/st_context.h b/src/mesa/state_tracker/st_context.h
index 446fe5d..d0aed7e 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -101,6 +101,13 @@ struct st_context
    boolean can_force_persample_interp;
    boolean has_shareable_shaders;
 
+   /**
+    * If a shader can be created when we get its source.
+    * This means it has only 1 variant, not counting glBitmap and
+    * glDrawPixels.
+    */
+   boolean shader_has_one_variant[MESA_SHADER_STAGES];
+
    boolean needs_texcoord_semantic;
    boolean apply_texture_swizzle_to_border_color;
 
-- 
2.1.4
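Editorial sketch of the decision made in the st_context.c hunk of the "create shaders which have only one variant" patch: a stage can be compiled as soon as its source arrives only if nothing the state tracker knows about would force a second variant. The struct, enum and function names are hypothetical stand-ins for the `st_context` fields; the boolean conditions are copied from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the gl_shader_stage values used in the patch. */
enum stage {
	STAGE_VERTEX,
	STAGE_TESS_CTRL,
	STAGE_TESS_EVAL,
	STAGE_GEOMETRY,
	STAGE_FRAGMENT,
	STAGE_COUNT
};

/* Hypothetical stand-in for the relevant st_context fields. */
struct st_caps {
	bool has_shareable_shaders;
	bool clamp_vert_color_in_shader;
	bool clamp_frag_color_in_shader;
	bool can_force_persample_interp;
};

/* Mirror of the conditions added to st_create_context_priv(). */
static void compute_one_variant(const struct st_caps *c, bool out[STAGE_COUNT])
{
	out[STAGE_VERTEX] = c->has_shareable_shaders &&
			    !c->clamp_vert_color_in_shader;
	out[STAGE_FRAGMENT] = c->has_shareable_shaders &&
			      !c->clamp_frag_color_in_shader &&
			      c->can_force_persample_interp;
	out[STAGE_TESS_CTRL] = c->has_shareable_shaders;
	out[STAGE_TESS_EVAL] = c->has_shareable_shaders;
	out[STAGE_GEOMETRY] = c->has_shareable_shaders;
}
```

A driver that clamps colors itself and can force per-sample interpolation gets every stage marked; if the state tracker has to clamp vertex colors in the shader, only the vertex stage drops out.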
[Mesa-dev] [PATCH 2/3] st/mesa: decouple shaders from contexts if they are shareable
From: Marek Olšák

---
 src/mesa/state_tracker/st_atom_shader.c   | 10 +-
 src/mesa/state_tracker/st_cb_bitmap.c     | 2 +-
 src/mesa/state_tracker/st_cb_drawpixels.c | 2 +-
 src/mesa/state_tracker/st_context.c       | 3 ++-
 src/mesa/state_tracker/st_context.h       | 1 +
 src/mesa/state_tracker/st_program.c       | 16 +++-
 6 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_shader.c b/src/mesa/state_tracker/st_atom_shader.c
index 1e880a1..3941454 100644
--- a/src/mesa/state_tracker/st_atom_shader.c
+++ b/src/mesa/state_tracker/st_atom_shader.c
@@ -64,7 +64,7 @@ update_fp( struct st_context *st )
    assert(stfp->Base.Base.Target == GL_FRAGMENT_PROGRAM_ARB);

    memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;

    /* _NEW_FRAG_CLAMP */
    key.clamp_color = st->clamp_frag_color_in_shader &&
@@ -119,7 +119,7 @@ update_vp( struct st_context *st )
    assert(stvp->Base.Base.Target == GL_VERTEX_PROGRAM_ARB);

    memset(&key, 0, sizeof key);
-   key.st = st; /* variants are per-context */
+   key.st = st->has_shareable_shaders ? NULL : st;

    /* When this is true, we will add an extra input to the vertex
     * shader translation (for edgeflags), an extra output with
@@ -174,7 +174,7 @@ update_gp( struct st_context *st )
    assert(stgp->Base.Base.Target == GL_GEOMETRY_PROGRAM_NV);

    memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;

    st->gp_variant = st_get_gp_variant(st, stgp, &key);

@@ -210,7 +210,7 @@ update_tcp( struct st_context *st )
    assert(sttcp->Base.Base.Target == GL_TESS_CONTROL_PROGRAM_NV);

    memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;

    st->tcp_variant = st_get_tcp_variant(st, sttcp, &key);

@@ -246,7 +246,7 @@ update_tep( struct st_context *st )
    assert(sttep->Base.Base.Target == GL_TESS_EVALUATION_PROGRAM_NV);

    memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;

    st->tep_variant = st_get_tep_variant(st, sttep, &key);

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c b/src/mesa/state_tracker/st_cb_bitmap.c
index bb6dfe8..cbc6845 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -269,7 +269,7 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, GLfloat z,
    struct pipe_resource *vbuf = NULL;

    memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
    key.bitmap = GL_TRUE;
    key.clamp_color = st->clamp_frag_color_in_shader &&
                      st->ctx->Color._ClampFragmentColor;
diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index 7e8633e..20cbfde 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -914,7 +914,7 @@ get_color_fp_variant(struct st_context *st)

    memset(&key, 0, sizeof(key));

-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
    key.drawpixels = 1;
    key.scaleAndBias = (ctx->Pixel.RedBias != 0.0 ||
                        ctx->Pixel.RedScale != 1.0 ||
diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
index bef7307..6256c0b 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -237,7 +237,8 @@ st_create_context_priv( struct gl_context *ctx, struct pipe_context *pipe,
                          PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER);
    st->can_force_persample_interp =
       screen->get_param(screen, PIPE_CAP_FORCE_PERSAMPLE_INTERP);
-
+   st->has_shareable_shaders = screen->get_param(screen,
+                                                 PIPE_CAP_SHAREABLE_SHADERS);
    st->needs_texcoord_semantic =
       screen->get_param(screen, PIPE_CAP_TGSI_TEXCOORD);
    st->apply_texture_swizzle_to_border_color =
diff --git a/src/mesa/state_tracker/st_context.h b/src/mesa/state_tracker/st_context.h
index f187d82..446fe5d 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -99,6 +99,7 @@ struct st_context
    boolean has_etc2;
    boolean prefer_blit_based_texture_transfer;
    boolean can_force_persample_interp;
+   boolean has_shareable_shaders;

    boolean needs_texcoord_semantic;
    boolean apply_texture_swizzle_to_border_color;
diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
index 6a69ba7..87571a8 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -1728,6 +1728,12 @@ destroy_program_variants_cb(GLuint key, void *data, void *userData)
 void
 st_destroy_program_variants(struct st_context *st)
 {
+   /* If shaders can be shared with other contexts, the last context will
+    * call DeleteProgram
[Mesa-dev] [PATCH 0/3] Creating gallium shaders at link time
Hi,

This is a continuation of the previous series. It allows drivers to have only one shader variant for every user shader in st/mesa, not counting the glDrawPixels and glBitmap variants. In that case, the shader variant is created in LinkShader or ProgramStringNotify.

Please review.

Marek
Re: [Mesa-dev] [PATCH] nir/glsl: Use shader_prog->Name for naming the NIR shader
On Friday, October 09, 2015 07:45:20 AM Jason Ekstrand wrote:
> This has the better name to use. Apparently, sh->Name is usually 0.
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 6e1dd84..3284bdc 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -150,7 +150,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
>        if (sh->Program->SamplersUsed & (1 << i))
>           num_textures = i;
>
> -   shader->info.name = ralloc_asprintf(shader, "GLSL%d", sh->Name);
> +   shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);
>     if (shader_prog->Label)
>        shader->info.label = ralloc_strdup(shader, shader_prog->Label);
>     shader->info.num_textures = num_textures;
>

Whoops. Right, this is more useful.

Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
On 10/10/2015 10:17 PM, Ilia Mirkin wrote:
On Sat, Oct 10, 2015 at 4:21 PM, Samuel Pitoiset wrote:
On 10/10/2015 09:58 PM, Ilia Mirkin wrote:
On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset wrote:
On 10/10/2015 09:42 PM, Ilia Mirkin wrote:
On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset wrote:

This patch looks fine except that it should be a bit more normalized. I mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this isn't the patch to do it in.

Yeah, but this needs to be clarified.

What does?

I mean, we should either use PUSH_SPACE everywhere or not at all, and always breaks (or not) when PUSH_SPACE fails. That's really a minor issue.

It's actually a major issue. Error-handling is practically non-existent. There are a couple of spots here and there, but it doesn't really scale up. I guess I (semi-)accidentally removed a couple of spots that error checked, but, again, meh. Doing this for real will require some careful thought.

Yeah, okay. So we really need to improve error-handling. :)

-ilia
Re: [Mesa-dev] [PATCH 08/10] radeonsi: re-enable unsafe-fp-math for LLVM 3.8
Hi Marek, I don't get the hang on Dota 2 Reborn with this patch and LLVM/Mesa git. Tested-by: Nick SarnieThanks! On Sat, Oct 10, 2015 at 10:12 PM, Connor Abbott wrote: > FWIW, this isn't quite correct with ARB_shader_precision or GL4.1 -- > it specifies that infinities should be correctly generated through > division by 0, which unsafe-fp-math doesn't guarantee. At least, > that's assuming this is similar to the "fast" per-instruction flag > (http://llvm.org/docs/LangRef.html#fast-math-flags) which says "This > flag implies all the others." > > On Sat, Oct 10, 2015 at 9:29 PM, Marek Olšák wrote: > > From: Marek Olšák > > > > Required for 1/sqrt ==> rsq. > > > > We should finally fix the hang instead of running away from the issue. > This > > assumes the bug is in LLVM and we have time to fix it before the release. > > Include compute shaders as well, which only affects TGSI and thus OpenGL. > > > > Totals: > > SGPRS: 344368 -> 345104 (0.21 %) > > VGPRS: 197552 -> 197420 (-0.07 %) > > Code Size: 7366304 -> 7324692 (-0.56 %) bytes > > LDS: 91 -> 91 (0.00 %) blocks > > Scratch: 1615872 -> 1524736 (-5.64 %) bytes per wave > > > > Totals from affected shaders: > > SGPRS: 146696 -> 147432 (0.50 %) > > VGPRS: 87212 -> 87080 (-0.15 %) > > Code Size: 3852664 -> 3811052 (-1.08 %) bytes > > LDS: 48 -> 48 (0.00 %) blocks > > Scratch: 1179648 -> 1088512 (-7.73 %) bytes per wave > > --- > > src/gallium/drivers/radeon/radeon_llvm_emit.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c > b/src/gallium/drivers/radeon/radeon_llvm_emit.c > > index 6b2ebde..4bda4a4 100644 > > --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c > > +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c > > @@ -84,6 +84,13 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned > type) > > sprintf(Str, "%1d", llvm_type); > > > > LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str); > > + > > +#if HAVE_LLVM >= 0x0308 > > + /* This only 
affects TGSI (OpenGL), so it's okay to set it for > > +* compute shaders too. > > +*/ > > + LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true"); > > +#endif > > } > > > > static void init_r600_target() > > -- > > 2.1.4 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space
We still have to push everything out, might as well kick earlier and flip pushbufs when we know we'll need it. This resolves some issues with the new policy of making sure that we always leave a bit of room at the end for fences.

Signed-off-by: Ilia Mirkin
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 9 ++---
 src/gallium/drivers/nouveau/nv50/nv50_transfer.c     | 16 +++-
 src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c     | 20 +---
 3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
                PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
             }
             while (words) {
-               unsigned nr;
-
-               if (!PUSH_SPACE(push, 16))
-                  break;
-               nr = PUSH_AVAIL(push);
-               assert(nr >= 16);
-               nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+               unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);

+               PUSH_SPACE(push, nr + 3);
                BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
                PUSH_DATA (push, (start << 8) | b);
                BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
    PUSH_DATA (push, 0);

    while (count) {
-      unsigned nr;
-
-      if (!PUSH_SPACE(push, 16))
-         break;
-      nr = PUSH_AVAIL(push);
-      assert(nr >= 16);
-      nr = MIN2(count, nr - 1);
-      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+      unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);

       BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);
       PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
    nouveau_pushbuf_validate(push);

    while (words) {
-      unsigned nr;
-
-      nr = PUSH_AVAIL(push);
-      nr = MIN2(nr - 7, words);
-      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+      unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);

+      PUSH_SPACE(push, nr + 7);
       BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
       PUSH_DATAh(push, bo->offset + base);
       PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
    nouveau_pushbuf_validate(push);

    while (count) {
-      unsigned nr;
+      unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);

-      if (!PUSH_SPACE(push, 16))
+      if (!PUSH_SPACE(push, nr + 9))
          break;
-      nr = PUSH_AVAIL(push);
-      assert(nr >= 16);
-      nr = MIN2(count, nr - 9);
-      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);

       BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
       PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
    nouveau_pushbuf_validate(push);

    while (count) {
-      unsigned nr;
+      unsigned nr = MIN2(count, (NV04_PFIFO_MAX_PACKET_LEN - 1));

-      if (!PUSH_SPACE(push, 16))
+      if (!PUSH_SPACE(push, nr + 10))
          break;
-      nr = PUSH_AVAIL(push);
-      assert(nr >= 16);
-      nr = MIN2(count, nr - 8);
-      nr = MIN2(nr, (NV04_PFIFO_MAX_PACKET_LEN - 1));

       BEGIN_NVC0(push, NVE4_P2MF(UPLOAD_DST_ADDRESS_HIGH), 2);
       PUSH_DATAh(push, dst->offset + offset);
@@ -571,9 +563,7 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
    PUSH_DATA (push, bo->offset + base);

    while (words) {
-      unsigned nr = PUSH_AVAIL(push);
-      nr = MIN2(nr, words);
-      nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+      unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN - 1);

       PUSH_SPACE(push, nr + 2);
       PUSH_REFN (push, bo, NOUVEAU_BO_WR | domain);
--
2.4.9
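The pattern the patch above moves to (size each packet from the remaining data only, then reserve room for exactly that packet plus its header) can be sketched in plain C as follows. This is a rough model, not nouveau code: sketch_push, sketch_push_space and the SKETCH_MAX_PACKET_LEN value are hypothetical stand-ins for the pushbuf, PUSH_SPACE and NV04_PFIFO_MAX_PACKET_LEN.

```c
/* Placeholder for NV04_PFIFO_MAX_PACKET_LEN; the value is an assumption. */
#define SKETCH_MAX_PACKET_LEN 2047u

/* Hypothetical stand-in for a push buffer; it only records what was asked. */
struct sketch_push {
   unsigned reserve_calls;   /* number of space reservations made */
   unsigned max_reserved;    /* largest single reservation requested */
   unsigned words_emitted;   /* payload plus header words written */
};

/* Stand-in for PUSH_SPACE(): a real one would kick and flip the pushbuf
 * when fewer than 'size' words are left. */
static void
sketch_push_space(struct sketch_push *push, unsigned size)
{
   push->reserve_calls++;
   if (size > push->max_reserved)
      push->max_reserved = size;
}

/* Upload 'words' payload words in packets of at most SKETCH_MAX_PACKET_LEN,
 * each preceded by 'header' words. The packet size no longer depends on
 * how much space happens to be left in the current buffer. */
static unsigned
sketch_upload(struct sketch_push *push, unsigned words, unsigned header)
{
   unsigned packets = 0;

   while (words) {
      unsigned nr = words < SKETCH_MAX_PACKET_LEN ? words
                                                  : SKETCH_MAX_PACKET_LEN;

      sketch_push_space(push, nr + header);
      push->words_emitted += nr + header;
      words -= nr;
      packets++;
   }
   return packets;
}
```

With this shape the number of packets is determined by the payload size alone, which is the behaviour the commit message argues for.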
[Mesa-dev] [PATCH v2] configure.ac: ensure RM is set
GNU make predefines RM to rm -f but this is not required by POSIX so ensure that RM is set. This fixes "make clean" on OpenBSD.

v2: use AC_CHECK_PROG

Signed-off-by: Jonathan Gray
CC: "10.6 11.0"
---
 configure.ac | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure.ac b/configure.ac
index 3feec19..f99545f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -107,6 +107,8 @@ AC_SYS_LARGEFILE
 LT_PREREQ([2.2])
 LT_INIT([disable-static])

+AC_CHECK_PROG(RM, rm, [rm -f])
+
 AX_PROG_BISON([],
               AS_IF([test ! -f "$srcdir/src/glsl/glcpp/glcpp-parse.c"],
                     [AC_MSG_ERROR([bison not found - unable to compile glcpp-parse.y])]))
--
2.5.3
[Mesa-dev] [PATCH shader-db] check_dependencies: refactor to a python script
Deliver consistency with all other shader-db scripts, which are Python scripts. No change in features or output strings. Passed pep8, except for two comment lines suggesting commands to add dependencies to the [require] section of *.shader_test files. Although not a performance critical feature, equivalent performance to Perl script other than process_directories() recursive directory traversal. os.scandir(item) would be significantly faster than os.walk(item), however its use would introduce a minimum dependency on Python 3.5 which is preferably avoided at this time. Signed-off-by: Rhys Kidd--- check_dependencies.pl | 107 -- check_dependencies.py | 82 ++ 2 files changed, 82 insertions(+), 107 deletions(-) delete mode 100755 check_dependencies.pl create mode 100755 check_dependencies.py diff --git a/check_dependencies.pl b/check_dependencies.pl deleted file mode 100755 index 3e49f7f..000 --- a/check_dependencies.pl +++ /dev/null @@ -1,107 +0,0 @@ -#!/usr/bin/perl -# -# Copyright © 2014 Intel Corporation -# -# Permission is hereby granted, free of charge, to any person obtaining a -# copy of this software and associated documentation files (the "Software"), -# to deal in the Software without restriction, including without limitation -# the rights to use, copy, modify, merge, publish, distribute, sublicense, -# and/or sell copies of the Software, and to permit persons to whom the -# Software is furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice (including the next -# paragraph) shall be included in all copies or substantial portions of the -# Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL -# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING -# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS -# IN THE SOFTWARE. - -# For checking that shader_test's dependencies are correct. -# -# Run with -# ./check_dependencies.pl shaders/ -# -# And then run a command like these to add dependencies to the [require] -# section: -# -# find shaders/ -name '*.shader_test' -exec grep -l '#version 120' {} + | xargs sed -i -e 's/GLSL >= 1.10/GLSL >= 1.20/' -# find shaders/ -name '*.shader_test' -exec grep -l '#extension GL_ARB_texture_rectangle : require' {} + | xargs sed -i -e 's/GLSL >= 1.20/GLSL >= 1.20\nGL_ARB_texture_rectangle/' - -use strict; -use File::Find; - -die("Not enough arguments: specify a directory\n") if ($#ARGV < 0); - -# The array_diff function is copied from the Array::Utils package and contains -# this copyright: -# -# This module is Copyright (c) 2007 Sergei A. Fedorov. -# All rights reserved. -# -# You may distribute under the terms of either the GNU General Public -# License or the Artistic License, as specified in the Perl README file. -sub array_diff(\@\@) { - my %e = map { $_ => undef } @{$_[1]}; - return @{[ ( grep { (exists $e{$_}) ? 
( delete $e{$_} ) : ( 1 ) } @{ $_[0] } ), keys %e ] }; -} - -my @shader_test; - -sub wanted { - push(@shader_test, $File::Find::name) if (/\.shader_test$/); -} - -finddepth(\, @ARGV); - -my $fail = 0; - -foreach my $shader_test (@shader_test) { - my $expected; - my $actual; - my @expected_ext; - my @actual_ext; - - open(my $fh, "<", $shader_test) - or die("cannot open < $shader_test: $!\n"); - - while (<$fh>) { - chomp; - - if (/^GLSL >= (\d)\.(\d\d)/) { - $expected = $1 * 100 + $2; - } - if (/^\s*#\s*version\s+(\d{3})/) { - $actual = $1 if $actual == undef; - $actual = $1 if $actual < $1; - } - - if (/^(GL_\S+)/) { - next if ($1 eq "GL_ARB_fragment_program" || -$1 eq "GL_ARB_vertex_program"); - push(@expected_ext, $1); - } - if (/^\s*#\s*extension\s+(GL_\S+)\s*:\s*require/) { - push(@actual_ext, $1); - } - } - - close($fh); - - if ($actual != undef && $expected != $actual) { - print "$shader_test requested $expected, but requires $actual\n"; - $fail = 1; - } - - my @extension = array_diff(@expected_ext, @actual_ext); - foreach my $extension (@extension) { - print "$shader_test extension $extension mismatch\n"; - $fail = 1; - } -} - -exit($fail); diff --git a/check_dependencies.py b/check_dependencies.py
[Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily
Right now we emit on every kick, but this is only necessary if something will ever be able to observe that the fence completed. If there are no refs, leave the fence alone and emit it another day.

This also happens to work around an issue for the kick handler -- a kick can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be due to lack of space in the pushbuf. We want the emit to happen in the current batch, so we want there to always be enough space. However an explicit kick could take the reserved space for the implicitly-triggered kick's fence emission if it happened right after. With the new mechanism, hopefully there's no way to cause two fences to be emitted into the same reserved space.

Signed-off-by: Ilia Mirkin
Cc: mesa-sta...@lists.freedesktop.org
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
---
 src/gallium/drivers/nouveau/nouveau_fence.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c b/src/gallium/drivers/nouveau/nouveau_fence.c
index ee4e08d..18b1592 100644
--- a/src/gallium/drivers/nouveau/nouveau_fence.c
+++ b/src/gallium/drivers/nouveau/nouveau_fence.c
@@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
    /* wtf, someone is waiting on a fence in flush_notify handler? */
    assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);

-   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)
+   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
+      PUSH_SPACE(screen->pushbuf, 8);
       nouveau_fence_emit(fence);
+   }

    if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)
       if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
@@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
 void
 nouveau_fence_next(struct nouveau_screen *screen)
 {
-   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
-      nouveau_fence_emit(screen->fence.current);
+   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
+      if (screen->fence.current->ref > 1)
+         nouveau_fence_emit(screen->fence.current);
+      else
+         return;
+   }

    nouveau_fence_ref(NULL, &screen->fence.current);

--
2.4.9
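The decision made in nouveau_fence_next() by the patch above can be modelled with a few lines of standalone C. This is only a sketch under assumptions: the state enum mirrors the ordering implied by the comparisons in the diff, and sketch_fence / sketch_fence_next are hypothetical names, with the state change standing in for nouveau_fence_emit().

```c
#include <stdbool.h>

/* Stand-in for the fence states; only their relative order matters here. */
enum sketch_fence_state {
   SKETCH_FENCE_AVAILABLE,
   SKETCH_FENCE_EMITTING,
   SKETCH_FENCE_EMITTED,
   SKETCH_FENCE_FLUSHED,
   SKETCH_FENCE_SIGNALLED
};

struct sketch_fence {
   enum sketch_fence_state state;
   int ref;   /* one reference is held by the screen itself */
};

/* Returns true when the caller should go on and replace the current fence,
 * false when the not-yet-emitted fence is kept for a later kick. */
static bool
sketch_fence_next(struct sketch_fence *current)
{
   if (current->state < SKETCH_FENCE_EMITTING) {
      if (current->ref > 1)
         current->state = SKETCH_FENCE_EMITTED;   /* models nouveau_fence_emit() */
      else
         return false;   /* nobody can observe it: emit it another day */
   }
   return true;
}
```

A fence whose only reference is the screen's stays in its initial state, so no space in the pushbuf is spent on it.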
[Mesa-dev] [PATCH 0/5] Implementation of vec4 equivalent to fs_cmod_propagation optimization
This series implements a vec4 equivalent of the fs_cmod_propagation optimization. The last two commits are not strictly needed for the optimization; they are nice-to-haves (imho) that I added while implementing it.

Alejandro Piñeiro (5):
  i965/vec4: nir_emit_if doesn't need to predicate based on all the channels
  i965/vec4: adding vec4_cmod_propagation optimization
  i965/vec4: Add unit tests for cmod propagation pass.
  i965/vec4: use a custom envvar to decide to print the assembly on test_vec4_cmod_propagation
  i965/vec4: print predicate control at brw_vec4 dump_instruction

 src/mesa/drivers/dri/i965/Makefile.am                        | 7 +
 src/mesa/drivers/dri/i965/Makefile.sources                   | 1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp                       | 17 +-
 src/mesa/drivers/dri/i965/brw_vec4.h                         | 1 +
 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp           | 163 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp                   | 4 +-
 .../dri/i965/test_vec4_cmod_propagation.cpp                  | 736 +
 7 files changed, 926 insertions(+), 3 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
 create mode 100644 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp

--
2.1.4
[Mesa-dev] [PATCH 3/5] i965/vec4: Add unit tests for cmod propagation pass.
This include the same tests coming from test_fs_cmod_propagation, (non vector glsl types included) plus some new with vec4 types, inspired on the regressions found while the optimization was a work in progress. Additionally, the check of number of instructions after the optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to avoid a crash on failing tests that expected no optimization, as after checking the number of instructions, there were some checks related to this last instruction opcode/conditional mod. --- src/mesa/drivers/dri/i965/Makefile.am | 7 + .../dri/i965/test_vec4_cmod_propagation.cpp| 736 + 2 files changed, 743 insertions(+) create mode 100644 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am index 2e24151..63228a5 100644 --- a/src/mesa/drivers/dri/i965/Makefile.am +++ b/src/mesa/drivers/dri/i965/Makefile.am @@ -58,6 +58,7 @@ TESTS = \ test_fs_saturate_propagation \ test_eu_compact \ test_vf_float_conversions \ + test_vec4_cmod_propagation \ test_vec4_copy_propagation \ test_vec4_register_coalesce @@ -93,6 +94,12 @@ test_vec4_copy_propagation_LDADD = \ $(top_builddir)/src/gtest/libgtest.la \ $(TEST_LIBS) +test_vec4_cmod_propagation_SOURCES = \ + test_vec4_cmod_propagation.cpp +test_vec4_cmod_propagation_LDADD = \ + $(top_builddir)/src/gtest/libgtest.la \ + $(TEST_LIBS) + test_eu_compact_SOURCES = \ test_eu_compact.c nodist_EXTRA_test_eu_compact_SOURCES = dummy.cpp diff --git a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp new file mode 100644 index 000..d2fba1b --- /dev/null +++ b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp @@ -0,0 +1,736 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in 
the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Alejandro Piñeiro Iglesias+ * + * Based on test_fs_cmod_propagation.cpp + */ + +#include +#include "brw_vec4.h" +#include "brw_vec4_builder.h" +#include "brw_cfg.h" +#include "program/program.h" + +using namespace brw; + +class cmod_propagation_test : public ::testing::Test { + virtual void SetUp(); + +public: + struct brw_compiler *compiler; + struct brw_device_info *devinfo; + struct gl_context *ctx; + struct gl_shader_program *shader_prog; + struct brw_vertex_program *vp; + vec4_visitor *v; +}; + +class cmod_propagation_vec4_visitor : public vec4_visitor +{ +public: + cmod_propagation_vec4_visitor(struct brw_compiler *compiler, + nir_shader *shader) + : vec4_visitor(compiler, NULL, NULL, NULL, shader, NULL, + false, -1) {} + +protected: + /* Dummy implementation for pure virtual methods */ + virtual dst_reg *make_reg_for_system_value(int location, + const glsl_type *type) + { + unreachable("Not reached"); + } + + virtual void setup_payload() + { + unreachable("Not reached"); + } + + virtual void 
emit_prolog() + { + unreachable("Not reached"); + } + + virtual void emit_program_code() + { + unreachable("Not reached"); + } + + virtual void emit_thread_end() + { + unreachable("Not reached"); + } + + virtual void emit_urb_write_header(int mrf) + { + unreachable("Not reached"); + } + + virtual vec4_instruction *emit_urb_write_opcode(bool complete) + { + unreachable("Not reached"); + } +}; + + +void cmod_propagation_test::SetUp() +{ + ctx = (struct gl_context *)calloc(1, sizeof(*ctx)); + compiler = (struct brw_compiler *)calloc(1, sizeof(*compiler)); + devinfo = (struct brw_device_info *)calloc(1, sizeof(*devinfo));
[Mesa-dev] [PATCH 4/5] i965/vec4: use a custom envvar to decide to print the assembly on test_vec4_cmod_propagation
The complete way to do this would be to parse INTEL_DEBUG and print the output if DEBUG_VS (or a new flag) is present (see intel_debug.c). But that seems like overkill for the unit tests, whose most common use case, after all, is being run by make check.
---

I just added the envvar because, while working on the optimization, I didn't want to recompile whenever I wanted to see the instructions.

 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
index d2fba1b..e840cb9 100644
--- a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
@@ -125,7 +125,7 @@ instruction(bblock_t *block, int num)
 static bool
 cmod_propagation(vec4_visitor *v)
 {
-   const bool print = false;
+   const bool print = getenv("TEST_DEBUG");

    if (print) {
       fprintf(stderr, "= Before =\n");
--
2.1.4
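One detail of the getenv() approach above is worth spelling out: the pointer-to-bool conversion only checks whether the variable is set, not what it contains. A minimal C sketch of that behaviour, with hypothetical helper names (sketch_flag_from_env, sketch_test_debug_enabled) that are not part of the patch:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Interpret the result of getenv() the way the patch does: any value,
 * even an empty string or "0", counts as enabled; only an unset
 * variable (NULL) disables the output. */
static bool
sketch_flag_from_env(const char *value)
{
   return value != NULL;
}

/* Convenience wrapper reading the variable name used by the patch. */
static bool
sketch_test_debug_enabled(void)
{
   return sketch_flag_from_env(getenv("TEST_DEBUG"));
}
```

So, under this reading, running the test with TEST_DEBUG=0 would still print the instructions; the variable has to be unset to silence it.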
[Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization
vec4 port of fs_cmod_propagation. Shader-db results: total instructions in shared programs: 6241226 -> 6224469 (-0.27%) instructions in affected programs: 498213 -> 481456 (-3.36%) helped:3082 HURT: 0 --- The final outcome is really similar to fs_brw_cmod_propagation. In fact the only difference is that on fs we have this: if (scan_inst->overwrites_reg(inst->src[0])) { if (scan_inst->is_partial_write() || scan_inst->dst.reg_offset != inst->src[0].reg_offset) break; And on vec4 (this commit) we have this: if (inst->src[0].in_range(scan_inst->dst, scan_inst->regs_written)) { if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) || scan_inst->dst.reg_offset != inst->src[0].reg_offset || (scan_inst->dst.writemask != WRITEMASK_X && scan_inst->dst.writemask != WRITEMASK_XYZW)) break; if (scan_inst->dst.writemask == WRITEMASK_XYZW && inst->src[0].swizzle != BRW_SWIZZLE_XYZW) { break; } So at some point I thought about refactoring it and having one common, like with opt_predicated_break, but that one was possible with just backend_instructions, while here we would need to deal with vec4_instructions and fs_inst, that could be somewhat messy, so I'm leaving this as it is. 
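To make the vec4-specific bail-out conditions quoted above easier to follow, here they are as a standalone C predicate. This is a sketch, not the pass itself: the structs and names are hypothetical, and the SKETCH_* constants are assumed stand-ins for WRITEMASK_X, WRITEMASK_XYZW and BRW_SWIZZLE_XYZW.

```c
#include <stdbool.h>

/* Assumed stand-ins for the i965 constants named in the notes above. */
#define SKETCH_WRITEMASK_X     0x1u
#define SKETCH_WRITEMASK_XYZW  0xfu
#define SKETCH_SWIZZLE_XYZW    0xe4u   /* x,y,z,w packed two bits per channel */

/* Hypothetical, reduced views of the scanned instruction and of the
 * source operand of the instruction being optimized. */
struct sketch_scan_inst {
   bool predicated;
   bool is_sel;
   unsigned dst_reg_offset;
   unsigned dst_writemask;
};

struct sketch_src {
   unsigned reg_offset;
   unsigned swizzle;
};

/* Returns true when the scan has to stop (the "break" cases above):
 * the writer is predicated (unless it is a SEL), writes a different
 * register offset, writes neither only X nor all of XYZW, or writes
 * XYZW while the reader does not use the identity swizzle. */
static bool
sketch_must_stop(const struct sketch_scan_inst *scan,
                 const struct sketch_src *src)
{
   if ((scan->predicated && !scan->is_sel) ||
       scan->dst_reg_offset != src->reg_offset ||
       (scan->dst_writemask != SKETCH_WRITEMASK_X &&
        scan->dst_writemask != SKETCH_WRITEMASK_XYZW))
      return true;

   if (scan->dst_writemask == SKETCH_WRITEMASK_XYZW &&
       src->swizzle != SKETCH_SWIZZLE_XYZW)
      return true;

   return false;
}
```

The extra writemask and swizzle checks are what distinguish this from the fs version, where is_partial_write() covers the equivalent case.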
src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 + 4 files changed, 166 insertions(+) create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 81ef628..c1836d6 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -56,6 +56,7 @@ i965_compiler_FILES = \ brw_util.c \ brw_util.h \ brw_vec4_builder.h \ + brw_vec4_cmod_propagation.cpp \ brw_vec4_copy_propagation.cpp \ brw_vec4.cpp \ brw_vec4_cse.cpp \ diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e966b96..55e381b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1867,6 +1867,7 @@ vec4_visitor::run() OPT(dead_code_eliminate); OPT(dead_control_flow_eliminate, this); OPT(opt_copy_propagation); + OPT(opt_cmod_propagation); OPT(opt_cse); OPT(opt_algebraic); OPT(opt_register_coalesce); diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 5e3500c..3c1711d 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -149,6 +149,7 @@ public: int var_range_start(unsigned v, unsigned n) const; int var_range_end(unsigned v, unsigned n) const; bool virtual_grf_interferes(int a, int b); + bool opt_cmod_propagation(); bool opt_copy_propagation(bool do_constant_prop = true); bool opt_cse_local(bblock_t *block); bool opt_cse(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp new file mode 100644 index 000..7e39d2b --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp @@ -0,0 +1,163 @@ +/* + * Copyright © 2015 Intel Corporation + * + * 
Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Alejandro Piñeiro Iglesias+ * + * Based on brw_fs_cmod_propagation.cpp + */ + +/** @file brw_vec4_cmod_propagation.cpp + * + * Really similar to
[Mesa-dev] [PATCH 5/5] i965/vec4: print predicate control at brw_vec4 dump_instruction
---

I found this useful while I was using INTEL_DEBUG=optimizer after changing how the ifs are emitted. And after all, that info is also included by brw_disasm.c.

I assumed that in the vec4_visitor we would not need to handle pred_ctrl_align1, but I'm not totally sure.

 src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 55e381b..eb81523 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1358,9 +1358,21 @@ vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
    vec4_instruction *inst = (vec4_instruction *)be_inst;

    if (inst->predicate) {
-      fprintf(file, "(%cf0.%d) ",
+      static const char *const pred_ctrl_align16[16] = {
+         "",
+         "",
+         ".x",
+         ".y",
+         ".z",
+         ".w",
+         ".any4h",
+         ".all4h",
+      };
+
+      fprintf(file, "(%cf0.%d%s) ",
               inst->predicate_inverse ? '-' : '+',
-              inst->flag_subreg);
+              inst->flag_subreg,
+              pred_ctrl_align16[inst->predicate]);
    }

    fprintf(file, "%s", brw_instruction_name(inst->opcode));
--
2.1.4
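As a side note on the table in the diff above: it is declared with 16 entries but only 8 initializers, so the remaining slots are NULL in C and C++. A defensive variant could bounds-check the index before printing. The following is only a sketch with hypothetical names (sketch_pred_name, sketch_format_predicate); whether predicate values above 7 can actually reach this code in align16 mode is an assumption I have not verified.

```c
#include <stddef.h>
#include <stdio.h>

/* Same suffixes as the table in the patch, sized by its initializers. */
static const char *const sketch_pred_ctrl_align16[] = {
   "", "", ".x", ".y", ".z", ".w", ".any4h", ".all4h",
};

/* Return the suffix for a predicate value, or "" for anything outside
 * the table, so that "%s" never receives a NULL pointer. */
static const char *
sketch_pred_name(unsigned predicate)
{
   const size_t n = sizeof(sketch_pred_ctrl_align16) /
                    sizeof(sketch_pred_ctrl_align16[0]);

   return predicate < n ? sketch_pred_ctrl_align16[predicate] : "";
}

/* Format the predicate prefix the same way as the fprintf() in the patch. */
static int
sketch_format_predicate(char *buf, size_t len, int inverse,
                        int flag_subreg, unsigned predicate)
{
   return snprintf(buf, len, "(%cf0.%d%s) ",
                   inverse ? '-' : '+', flag_subreg,
                   sketch_pred_name(predicate));
}
```

For example, a non-inverted predicate on flag subregister 0 with the third table entry would be formatted as "(+f0.0.x) ".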
[Mesa-dev] [PATCH 1/5] i965/vec4: nir_emit_if doesn't need to predicate based on all the channels
---

I already talked about this with Jason Ekstrand and Matt Turner privately,
but just in case somebody else jumps into the review:

When using BRW_PREDICATE_NORMAL, the if will use all the channels of the
flag register. But nir_if only reads from one channel, so that is not
needed.

Another hint showing that this is safe: the MOV that puts the condition on
f0 calls get_nir_src with just one component. That will always return a
source with swizzle BRW_SWIZZLE_, so that component is the only one used.

This commit does not fix anything by itself, but it is needed in order to
implement vec4_cmod_propagation with a good overall outcome.

 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 41bd80d..e05745f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -193,7 +193,9 @@ vec4_visitor::nir_emit_if(nir_if *if_stmt)
    vec4_instruction *inst = emit(MOV(dst_null_d(), condition));
    inst->conditional_mod = BRW_CONDITIONAL_NZ;
 
-   emit(IF(BRW_PREDICATE_NORMAL));
+   /* We can just predicate based on the X channel, as the condition only
+    * reads from one channel */
+   emit(IF(BRW_PREDICATE_ALIGN16_REPLICATE_X));
 
    nir_emit_cf_list(&if_stmt->then_list);
 
-- 
2.1.4
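
The argument in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the hardware or of Mesa: the flag register is modelled as one boolean per channel, mov_nz stands in for the MOV with BRW_CONDITIONAL_NZ, and the names Vec4, Flags, mov_nz, predicate_normal and predicate_replicate_x are all hypothetical. It assumes the condition source replicates a single component across the four channels, which is how the commit message reads.

```cpp
#include <array>

using Vec4 = std::array<int, 4>;
using Swizzle = std::array<int, 4>;   // source component read by each channel
using Flags = std::array<bool, 4>;    // toy flag register, one bit per channel

// Toy stand-in for "MOV.nz null, src": each channel sets its flag bit when
// the swizzled source component is non-zero.
static Flags mov_nz(const Vec4 &src, const Swizzle &swizzle)
{
   Flags f{};
   for (int ch = 0; ch < 4; ch++)
      f[ch] = src[swizzle[ch]] != 0;
   return f;
}

// Toy model of normal predication: every channel uses its own flag bit.
static Flags predicate_normal(const Flags &f)
{
   return f;
}

// Toy model of replicate-X predication: every channel uses the X flag bit.
static Flags predicate_replicate_x(const Flags &f)
{
   return Flags{{f[0], f[0], f[0], f[0]}};
}
```

In this model, a swizzle that reads the same component in every channel, such as {1, 1, 1, 1}, makes all four flag bits equal, so both predication modes select the same channels. With an identity swizzle {0, 1, 2, 3} and a source like {0, 5, 0, 0} the two modes differ, which is the case the commit message argues cannot arise from nir_emit_if.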
[Mesa-dev] [Bug 91643] mesa-demos-8.2.0 (latest released version) fails to build against mesa-10.6.4-2.mga6.tainted.src.rpm
https://bugs.freedesktop.org/show_bug.cgi?id=91643

Dennis Schridde changed:

           What    |Removed                     |Added
                 CC|                            |devuran...@gmx.net

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] New stable-branch 11.0 candidate pushed
On Thu, Oct 8, 2015 at 11:50 AM, Emil Velikov wrote:
> Hello list,
>
> The candidate for the Mesa 11.0.3 is now available. Currently we have:
>  - 46 queued
>  - 18 nominated (outstanding)
>  - and 7 rejected/obsolete patches
>
> This time around we have a bunch of EGL patches, mangledGL build fixes
> and a healthy amount of driver bugfixes - radeonsi, nouveau, i915 and i965.

Hi,

I've tested this branch on radeonsi. There are no regressions.

Marek