On Thu, 2018-07-05 at 15:47 -0700, Caio Marcelo de Oliveira Filho wrote: > (I had to stop reading to go home last Tuesday, so here are the > remaining comments.) > > > On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote: > > NIR assumes that all booleans are 32-bit but Intel hardware > > produces > > booleans of the same size as the operands to the CMP instruction, > > so we > > can actually have 8-bit and 16-bit booleans. To work around this > > mismatch between NIR and the hardware, we emit boolean conversions > > to > > 32-bit right after emitting the CMP instruction during the NIR->FS > > pass, which makes interfacing with NIR a lot easier, but can leave > > unnecessary boolean conversions in the shader code. > > Question: have you explored handling this at the NIR->FS conversion? > I.e. instead of generate the cmp + mov and then look for the cmp + > mov > to fix up; when generating a cmp, perform those checks (at nir level) > and then pick the right bitsize.
It is not that easy. The problem is that NIR will continue to come at us with 32-bit boolean instructions after the CMP+MOV, so instead of prpagating forward for every conversion, now, for every bool we find in IR we'd need to go back in the FS program to check if it is a real 32- bit boolean or not to decide what to emit. I don't see any benefit, plus we would be coupling all this with the translation implementation, which I think is less nice than having it being a completely separate thing. Anyway, there is a major issue with the current patch that I have found this week thanks to some new CTS tests: when we propagate the bitsize of a logical instruction to its destination, that affects all its consumers even outside the current block, so we need to handle propagation across blocks, which adds a few more problems, so I still need to figure out how to deal with that properly and whether that is something we want to do (there is a reason why no other opt/lowering passes do cross-block changes...). Iago > > > +/** > > + * Propagates the bit-size of the destination of a boolean > > instruction to > > + * all its consumers. If propagate_from_source is True, then the > > producer > > + * is a conversion MOV from a low bit-size boolean to 32-bit, and > > in that > > + * case the propagation happens from the source of the instruction > > instead > > + * of its destination. > > + */ > > +static bool > > +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source) > > +{ > > + assert(!propagate_from_source || inst->opcode == > > BRW_OPCODE_MOV); > > + > > + bool progress = false; > > + > > + const unsigned bit_size = 8 * (propagate_from_source ? > > + type_sz(inst->src[0].type) : type_sz(inst->dst.type)); > > + > > + /* Look for any follow-up instructions that sources from the > > boolean > > + * result of the producer instruction and rewrite them to use > > the correct > > + * bit-size. > > + */ > > + foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst) > > { > > + if (!inst_supports_boolean(fixup_inst)) > > + continue; > > Should we care about other instruction clobbering the contents of > inst->dst, or at this point of the optimization we can count on it > not > being? > > > > + /* If it is a plain boolean conversion to 32-bit, then > > look for any > > + * follow-up instructions that source from the 32-bit > > boolean and > > + * rewrite them to source from the output of the CMP > > (which is the > > + * source of the conversion instruction) directly if > > possible. > > + */ > > + progress = propagate_bool_bit_size(conv_inst, true) || > > progress; > > + } > > +#if 0 > > + else if (inst_supports_boolean(inst) && inst->sources > 1) > > { > > If you end up enabling this section, I suggest move the > inst_supports_boolean() check to the beginning of the for-loop, as an > early return. Makes the condition for the cases we are handling > cleaner. > > > > > + /* For all logical instructions that can take more than > > one operand > > + * we need to ensure that all of them have matching bit- > > sizes. If they > > + * don't, it means that the original shader code is > > operating boolean > > + * expressions with different native bit-sizes and we > > need to choose > > + * a canonical boolean form for all the operands, which > > requires to > > + * inject conversions to temporaries. We choose the bit- > > size of the > > + * destination as the canonical form (which must be a 32- > > bit boolean > > + * since we only propagate smaller bit-sizes to the > > destination if we > > + * managed to convert all the operands to the same bit- > > size) because > > + * that way we don't need to worry about propagating the > > destination > > + * bit-size down the line. > > + */ > > To make this comment less heavy, I'd move the assumption about the > destination being 32-bit right above the assert, which is kind of an > explanation of the assertion. > > > > @@ -6240,6 +6471,7 @@ fs_visitor::optimize() > > > > OPT(opt_drop_redundant_mov_to_flags); > > OPT(remove_extra_rounding_modes); > > + OPT(opt_bool_bit_size); > > It could be useful to have a short comment here about the importance > of calling opt_bool_bit_size at this point. > > > > Thanks, > Caio > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev