On 21.04.2018 02:32, Bas Nieuwenhuizen wrote:
On Fri, Apr 20, 2018 at 5:16 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
On Fri, Apr 20, 2018 at 5:16 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:

On 20.04.2018 10:21, Iago Toral wrote:

Hi,

while developing support for Vulkan shaderInt16 on Anvil I came across
a feature of NIR that was a bit inconvenient: bools are always 32-bit
by design, but the Intel hardware produces 16-bit bool results for
16-bit comparisons, so that creates a problem that manifests like this:

vec1 32 ssa_21 = fge ssa_20, ssa_16
vec1 16 ssa_22 = b2f ssa_21


I was thinking about this a bit this morning and it gets even more sticky.
What happens if you have

bool e = (a < b) && (c < d);

where a and b are 16-bit and c and d are 32-bit?  In this case, one
comparison has a 32-bit value and one has a 16-bit value and you have to pick
one for the &&.
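
To make the mixed-width case concrete, here is a minimal C sketch. It assumes booleans are encoded as all-ones for true and zero for false at the bit size of the comparison; the function names are made up for illustration and are not NIR or driver API. The 16-bit result is widened to 32 bits before the AND so that both operands agree on a size.

```c
#include <stdint.h>

/* Assumption: a comparison yields all-ones (true) or zero (false),
 * at the bit size of its operands. */
uint16_t cmp_lt16(int16_t a, int16_t b) { return a < b ? 0xFFFFu : 0u; }
uint32_t cmp_lt32(int32_t c, int32_t d) { return c < d ? 0xFFFFFFFFu : 0u; }

/* Widen a 16-bit boolean to a 32-bit one by replicating its low bit
 * into all 32 bits (equivalent to a sign-extension of 0xFFFF / 0). */
uint32_t bool16_to_bool32(uint16_t b)
{
    return (uint32_t)0 - (uint32_t)(b & 1u);
}

/* (a < b) && (c < d) with 16-bit a, b and 32-bit c, d:
 * pick 32 bits as the common size for the AND. */
uint32_t and_mixed(int16_t a, int16_t b, int32_t c, int32_t d)
{
    return bool16_to_bool32(cmp_lt16(a, b)) & cmp_lt32(c, d);
}
```

Zero-extending the 16-bit value instead would give 0x0000FFFF, which is not a valid 32-bit "true" under the all-ones assumption, so the widening has to fill the upper bits.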


Our CMP instruction will produce a 16-bit boolean result for the first
NIR instruction (where NIR expects it to be 32-bit), so by the time we
emit the second instruction in the driver the bit-size for the operand
of b2f provided by NIR no longer matches the reality and we emit
incorrect code.

This seems to have been a conscious design choice in NIR, and while
discussing this with Jason he was unsure how much we wanted to change
this or how to do it, given how thoroughly 32-bit bools are baked into
NIR and the complexities that modifying this would also bring to our
bit-size validation code.

I have been considering alternatives that didn't involve changing NIR
to support multiple bit-sizes for booleans:

1) Drivers that need to emit smaller booleans could try to fix the
generated NIR by correcting the expected bit-sizes for CMP
instructions. This would be rather trivial to implement in drivers (and
maybe we could even make a generic pass for other drivers to use if
they need it) but this will make the validator complain because it
won't recognize comparisons with 16-bit bool outputs as valid NIR
opcodes. I also found instances where nir_search would complain about
mismatching bit-sizes. I haven't looked any further into it yet though,
so maybe we can reasonably work around these issues.

2) Drivers could handle this specially when they emit code from NIR.
Specifically, when they see a 32-bit boolean source in an instruction,
they would have to search for the instruction that produced that source
value and check whether it is a 16-bit or a 32-bit boolean to emit
proper code for the instruction.

3) Drivers can just convert the 16-bit bool result they generate for
16-bit cmp to the 32-bit bool that NIR expects, and then possibly run
an optimization pass to eliminate these extra conversions and fix up
the code accordingly.
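
As a rough sketch of option 3, the following C models a toy instruction list and a pass that appends a widening conversion after every comparison whose hardware result is narrower than 32 bits. Everything here (toy_instr, TOY_B2B32, toy_widen_bools) is hypothetical and only illustrates the idea; it is not the NIR or i965 API, and a real driver would also rewrite the uses to read the converted value.

```c
#include <stddef.h>

/* Hypothetical toy IR, for illustration only. */
enum toy_op { TOY_CMP, TOY_B2B32, TOY_OTHER };

struct toy_instr {
    enum toy_op op;
    unsigned src_bits;   /* bit size of the sources */
    unsigned dest_bits;  /* bit size of the result the hardware writes */
};

/* Copies `in` to `out`, appending a bool-widening conversion after every
 * compare whose result is narrower than 32 bits. Returns the number of
 * instructions written; `out` must have room for 2 * n entries. */
size_t toy_widen_bools(const struct toy_instr *in, size_t n,
                       struct toy_instr *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        out[w++] = in[i];
        if (in[i].op == TOY_CMP && in[i].dest_bits < 32) {
            struct toy_instr conv = { TOY_B2B32, in[i].dest_bits, 32 };
            out[w++] = conv;
        }
    }
    return w;
}
```

A later cleanup pass could then drop a conversion when every consumer can take the narrow boolean directly, which is the "eliminate these extra conversions" step mentioned above.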


radeonsi(NIR) and radv already use option 3, since GCN hardware really
wants to treat bools as 1-bit values, so that's what I'd suggest. The
optimizations that clean up the conversions happen in LLVM for us.


Is this a GCN thing or an LLVM thing?  It would be neat if your hardware had
1-bit registers. :-)  We sort-of do but they're special flag registers and
we have very few of them.

LLVM. For GCN HW we use a 64-bit register that is shared between
lanes (i.e. having 1 bit for each lane).
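
The "1 bit for each lane" layout can be sketched in plain C as a 64-bit mask, assuming 64 lanes. The helper names are invented for this example and do not correspond to any LLVM or Mesa function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set or clear the boolean of one lane in a 64-lane mask. */
uint64_t lane_mask_set(uint64_t mask, unsigned lane, bool value)
{
    uint64_t bit = UINT64_C(1) << (lane & 63u);
    return value ? (mask | bit) : (mask & ~bit);
}

/* Read the boolean of one lane. */
bool lane_mask_get(uint64_t mask, unsigned lane)
{
    return (mask >> (lane & 63u)) & 1u;
}

/* Per-lane "a < b": each lane contributes exactly one bit of the result. */
uint64_t lane_mask_cmp_lt(const int32_t a[64], const int32_t b[64])
{
    uint64_t mask = 0;
    for (unsigned lane = 0; lane < 64; lane++)
        mask = lane_mask_set(mask, lane, a[lane] < b[lane]);
    return mask;
}
```

Seen this way, a boolean costs one bit per lane regardless of whether the compared values were 16-bit or 32-bit.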

Which means, if you think about it, that using i1 for bool _is_ a GCN thing in the end ;)

But admittedly it's semantics.

Cheers,
Nicolai



--Jason

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



--
Learn how the world really is,
but never forget how it should be.