On 12/01/2015 11:28 AM, Ilia Mirkin wrote: > On Tue, Dec 1, 2015 at 2:13 PM, Ian Romanick <i...@freedesktop.org> wrote: >> On 11/30/2015 04:41 PM, Ilia Mirkin wrote: >>> On Mon, Nov 30, 2015 at 7:15 PM, Matt Turner <matts...@gmail.com> wrote: >>>> On Mon, Nov 30, 2015 at 3:51 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote: >>>>> On Mon, Nov 30, 2015 at 6:38 PM, Matt Turner <matts...@gmail.com> wrote: >>>>>> On Mon, Nov 30, 2015 at 3:34 PM, Ilia Mirkin <imir...@alum.mit.edu> >>>>>> wrote: >>>>>>> On Mon, Nov 30, 2015 at 6:32 PM, Matt Turner <matts...@gmail.com> wrote: >>>>>>>> --- >>>>>>>> src/glsl/builtin_functions.cpp | 15 ++++++++++++++- >>>>>>>> 1 file changed, 14 insertions(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/src/glsl/builtin_functions.cpp >>>>>>>> b/src/glsl/builtin_functions.cpp >>>>>>>> index 88f9a71..2f21ffc 100644 >>>>>>>> --- a/src/glsl/builtin_functions.cpp >>>>>>>> +++ b/src/glsl/builtin_functions.cpp >>>>>>>> @@ -538,6 +538,7 @@ private: >>>>>>>> ir_variable *in_var(const glsl_type *type, const char *name); >>>>>>>> ir_variable *out_var(const glsl_type *type, const char *name); >>>>>>>> ir_constant *imm(float f, unsigned vector_elements=1); >>>>>>>> + ir_constant *imm(bool b, unsigned vector_elements=1); >>>>>>>> ir_constant *imm(int i, unsigned vector_elements=1); >>>>>>>> ir_constant *imm(unsigned u, unsigned vector_elements=1); >>>>>>>> ir_constant *imm(double d, unsigned vector_elements=1); >>>>>>>> @@ -3000,6 +3001,12 @@ builtin_builder::out_var(const glsl_type *type, >>>>>>>> const char *name) >>>>>>>> } >>>>>>>> >>>>>>>> ir_constant * >>>>>>>> +builtin_builder::imm(bool b, unsigned vector_elements) >>>>>>>> +{ >>>>>>>> + return new(mem_ctx) ir_constant(b, vector_elements); >>>>>>>> +} >>>>>>>> + >>>>>>>> +ir_constant * >>>>>>>> builtin_builder::imm(float f, unsigned vector_elements) >>>>>>>> { >>>>>>>> return new(mem_ctx) ir_constant(f, vector_elements); >>>>>>>> @@ -4358,7 +4365,13 @@ >>>>>>>> builtin_builder::_notEqual(builtin_available_predicate avail, >>>>>>>> ir_function_signature * >>>>>>>> builtin_builder::_any(const glsl_type *type) >>>>>>>> { >>>>>>>> - return unop(always_available, ir_unop_any, glsl_type::bool_type, >>>>>>>> type); >>>>>>>> + ir_variable *v = in_var(type, "v"); >>>>>>>> + MAKE_SIG(glsl_type::bool_type, always_available, 1, v); >>>>>>>> + >>>>>>>> + const unsigned vec_elem = v->type->vector_elements; >>>>>>>> + body.emit(ret(expr(ir_binop_any_nequal, v, imm(false, vec_elem)))); >>>>>>> >>>>>>> This results in an extra comparison generated by the st_glsl_to_tgsi >>>>>>> code, which can be annoying to get rid of... Why do you need to do >>>>>>> this? >>>>>> >>>>>> Couldn't we fix the st_glsl_to_tgsi to handle that better? I wouldn't >>>>>> be surprised to find such a thing in the wild. >>>>> >>>>> I guess it could special-case the exact IR generated there? The thing >>>>> is that it just OR's everything together (when you have native integer >>>>> support... which everything does outside of r300/nv30), which you can >>>>> only do in a few select cases. If you'd like to implement that, I >>>>> wouldn't object. As-is, it's a pessimization though. >>>> >>>> I'm not familiar with any of this code and I don't know how to test or >>>> debug it -- sorry. >>>> >>>> It should be trivial to optimize if someone cares. I believe all you >>>> need to do is skip emitting the SNE and instead copy src[0] to temp >>>> under these circumstances: >>>> >>>> if (ir->operands[0]->type == glsl_type::bool_type && >>>> ir->operands[1]->as_constant() && >>>> ir->operands[1]->as_constant()->is_zero()) >>> >>> As long as you can build the gallium swrast (and enable debug), you >>> can add ST_DEBUG=tgsi to get the generated TGSI printed out. >> >> Does the generated TGSI really matter? We don't generally care if a >> change causes more NIR instructions to be generated, for example... >> unless it's a massive amount more. > > Agreed. > >> >> The real question is what actually gets executed on the GPUs. If we >> produce more intermediate instructions, but the backends still generate >> the same (or perhaps even better) code, it should be okay. Does any GPU >> actually emit worse code due to this change or the related changes later >> in the series? > > I full agree with your comments. If there's a sampler shader, I can > easily test what nouveau does with it... I was going based on how I > thought it would work though based on my familiarity with the various > optimizations. I could have gotten it wrong though. The problem is > that the boolean-ness of the values gets lost, and these optimizations > become less safe. Or less detected. > > I can guarantee that it'll generate an extra instruction nv30 since > that doesn't have a compiler, so it'll go from just "DP4" to "SNE; > DP4". But I don't know if I care *that* much. > > So this is the test I came up with: > > FRAG > DCL IN[0], GENERIC[0], PERSPECTIVE > DCL OUT[0], COLOR > DCL TEMP[0..4], LOCAL > IMM[0] UINT32 { 0, 0, 0, 0 } > 0: MOV TEMP[0], IN[0] > 1: USNE TEMP[0], TEMP[0], IMM[0] > 2: OR TEMP[0].x, TEMP[0].xxxx, TEMP[0].yyyy > 3: OR TEMP[0].x, TEMP[0].xxxx, TEMP[0].zzzz > 4: OR TEMP[0].x, TEMP[0].xxxx, TEMP[0].wwww > 5: MOV OUT[0], TEMP[0].xxxx > 9: END
I think the more common case, and certainly the case that Matt was targeting with the optimization in patch 6, is where the result of any(v) is used in a conditional: if (any(v)) { do_this(); } else { do_that(); } or result = any(v) ? a : b; Looking at my very out-of-date shader-db, 100% of the uses excluding one application fit one of these patterns. That app has a bunch of macros that do 'any(notEqual(v, vec#(0.0)))', so TGSI is already going to generate extra instructions. > So let's see what happens on nv50 (targeted at a G84 to avoid > potential G80-related idiocy... don't think there would be any here, > but who knows): > > NV50_PROG_DEBUG=1 ./src/gallium/drivers/nouveau/nouveau_compiler -a 84 > slt.tgsi > > which ends up in: > > EMIT: mov u32 $r4 0x00000000 (8) > EMIT: set u32 $r1 ne $r1 $r4 (8) > EMIT: set u32 $r2 ne $r2 $r4 (8) > EMIT: set u32 $r3 ne $r3 $r4 (8) > EMIT: set u32 $r0 ne $r0 $r4 (8) > EMIT: or u32 $r1 $r1 $r2 (8) > EMIT: or u32 $r1 $r1 $r3 (8) > EMIT: or u32 $r3 $r1 $r0 (8) It seems like the rotation of (x != 0) || (y != 0) || (z != 0) || (w != 0) to (x | y | z | w) != 0 would really help here... and should be safe? You'd end up with: EMIT: mov u32 $r4 0x00000000 (8) EMIT: or u32 $r1 $r1 $r2 (8) EMIT: or u32 $r1 $r1 $r3 (8) EMIT: or u32 $r3 $r1 $r0 (8) EMIT: set u32 $r3 ne $r3 $r4 (8) > Naturally, those set's all go away if the USNE does. With -a c0 (i.e. > fermi, which will be ~identical for everything later as well), we get: > > 6: set u8 $p0 ne u32 $r1 $r63 (8) > 7: set or u8 $p0 ne u32 $r2 $r63 $p0 (8) > 8: set or u8 $p0 ne u32 $r3 $r63 $p0 (8) > 9: set or u32 $r3 ne $r0 $r63 $p0 (8) > > [$r63 is always 0] > > And without the USNE it just has the or's: > > 6: or u32 $r1 $r1 $r2 (8) > 7: or u32 $r1 $r1 $r3 (8) > 8: or u32 $r3 $r1 $r0 (8) > > [I'm ignoring the unrelated bits, like moving stuff around, > interpolation, etc]. If nothing else, this is one less instruction. > > Basically the issue is that we don't have the information that the > input is a *clean* bool, like we would if there were type info, and so And for the record, I had always opposed NIR being typeless fir this very reason. In this case the type is, basically, a primitive form of range tracking. Sophisticated range tracking makes it obsolete, but that's also a lot of work. > we must "clean" it first. We could try to see if the input comes from > another set... and we sometimes do, but for a limited quantity of > scenarios, and I don't think this is one of them. > > -ilia _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev