This hasn't been reviewed yet; any feedback?

Iago
On Tue, 2018-05-15 at 13:05 +0200, Iago Toral Quiroga wrote:
> NIR assumes that all booleans are 32-bit, so drivers need to produce
> 32-bit booleans even if they can produce native booleans of a
> different bit-size, like Intel does. This means that if we have a
> 16-bit CMP instruction, we generate a 16-bit boolean that we
> immediately convert to 32-bit, since that is the bit-size expected
> by NIR for all consumers of the boolean.
>
> This backend optimization pass identifies these cases once we are
> done translating from NIR to FS IR and propagates the lower bit-size
> booleans to their consumers, allowing DCE to remove the 32-bit
> conversions. The pass should run right after translating from NIR,
> since it assumes that boolean conversions to 32-bit immediately
> follow the corresponding CMP instructions.
>
> This has been tested with existing and work-in-progress CTS tests as
> well as some ad-hoc VkRunner tests I wrote.
>
> For more context you can read this discussion:
> https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html
>
> One point raised by Jason during the discussion linked above was that
> we might need to canonicalize booleans of different native bit-sizes
> when they are combined in boolean expressions. However, as indicated
> in the commit log for the last patch in the series, my interpretation
> of the PRM is that the hardware can handle this situation without us
> having to do anything about it. The last patch contains
> canonicalization code under a disabled #if guard anyway, just in case
> reviewers think it is needed in the end and want to see what it could
> look like.
>
> As an alternative to what is being done here, we could also change
> the way we construct CMP instructions so that they always produce
> canonical 32-bit booleans, as NIR expects, taking advantage of the
> PRM statement that, since gen5, CMP instructions can mix and match
> *B, *W and *D types for their source and destination arguments.
> However, since all hardware gens still produce 16-bit booleans for
> half-float, we would still need to handle that case specially with a
> similar pass, so we would not gain much from that. Also, in that case
> we would always operate with 32-bit booleans, losing the possibility
> to emit native 16-bit boolean instructions where possible.
>
> Iago Toral Quiroga (3):
>   intel/compiler: make brw_reg_type_from_bit_size usable from other
>     places
>   intel/compiler: add a region_match() helper
>   intel/compiler: add an optimization pass for booleans
>
>  src/intel/compiler/brw_fs.cpp     | 291 ++++++++++++++++++++++++++++++++++++++
>  src/intel/compiler/brw_fs.h       |   5 +
>  src/intel/compiler/brw_fs_nir.cpp |  59 --------
>  src/intel/compiler/brw_ir_fs.h    |  13 ++
>  4 files changed, 309 insertions(+), 59 deletions(-)

_______________________________________________
mesa-dev mailing list
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
