| Issue | 180621 |
|---|---|
| Summary | [HLSL][DirectX] |
| Labels | HLSL |
| Assignees | |
| Reporter | inbelic |
Consider the following:
```HLSL
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
  for (uint i = 0; i < 8; i++) {
    if (i == TID.x) {
      Out[TID.x] = WaveActiveMax(TID.x);
      break;
    }
  }
}
```
Clang currently optimizes the loop away through a sequence of passes (`LoopRotatePass` -> `IndVarSimplifyPass` -> `SimpleLoopUnswitchPass` -> ...), producing code equivalent to the following:
```HLSL
RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,1,1)]
void main(uint3 TID : SV_GroupThreadID) {
  // The loop is gone: the store happens exactly when some i in [0, 8)
  // equals TID.x, i.e. when TID.x < 8.
  if (TID.x < 8) {
    Out[TID.x] = WaveActiveMax(TID.x);
  }
}
```
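Viewed one thread at a time, this rewrite is sound, which is why the passes treat it as legal. The minimal C++ sketch below models only per-thread control flow, not the wave intrinsic. Its helper names (`reaches_store_loop`, `reaches_store_flat`, `check_forms_agree`) are hypothetical. It checks that the loop and the loop-free `TID.x < 8` form reach the store for exactly the same thread IDs:

```cpp
#include <cassert>

// Per-thread model of the original loop: returns true when the thread
// reaches the store, i.e. when some iteration i in [0, 8) has i == tid.
bool reaches_store_loop(unsigned tid) {
    for (unsigned i = 0; i < 8; i++) {
        if (i == tid) {
            return true;  // stands in for the store followed by `break`
        }
    }
    return false;
}

// Per-thread model of the loop-free form.
bool reaches_store_flat(unsigned tid) {
    return tid < 8;
}

// Asserts that both forms agree for thread IDs 0..63. IDs >= 8 are never
// launched by [numthreads(8,1,1)], but they still exercise the i < 8 bound.
void check_forms_agree() {
    for (unsigned tid = 0; tid < 64; tid++) {
        assert(reaches_store_loop(tid) == reaches_store_flat(tid));
    }
}
```

The sketch deliberately omits `WaveActiveMax`. A per-thread model cannot see which other lanes are active, and that is precisely the information the optimization discards.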
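The expected behaviour is for the threads to step through each iteration of the loop in lock step, because the loop body invokes a convergent operation. In iteration `i`, only the thread with `TID.x == i` enters the `if`, so `WaveActiveMax(TID.x)` sees a single active lane and each thread writes its own ID (`Out[i] = i`). After the transformation, all eight threads (assuming they share one wave) execute `WaveActiveMax` together, and every thread writes `7`. The optimization therefore changes observable results and must not remove the loop when it contains a convergent operation.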
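To make the difference concrete, here is a minimal C++ model of one wave executing both versions in lock step. It assumes that all eight threads of the group land in a single wave and that `WaveActiveMax` reduces only over the currently active lanes. The helpers `wave_active_max`, `run_loop`, and `run_flat` are hypothetical illustrations, not DXC or Clang APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kLanes = 8;  // assumption: all 8 threads share one wave
using Buffer = std::array<uint32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Model of WaveActiveMax: maximum of value[lane] over the active lanes.
uint32_t wave_active_max(const Mask& active, const Buffer& value) {
    uint32_t result = 0;
    bool any = false;
    for (unsigned lane = 0; lane < kLanes; lane++) {
        if (active[lane] && (!any || value[lane] > result)) {
            result = value[lane];
            any = true;
        }
    }
    assert(any && "WaveActiveMax needs at least one active lane");
    return result;
}

// Original loop in lock step: in iteration i only lane i enters the `if`,
// so the reduction sees a single active lane.
Buffer run_loop() {
    Buffer tid{}, out{};
    Mask in_loop{};
    for (unsigned lane = 0; lane < kLanes; lane++) {
        tid[lane] = lane;
        in_loop[lane] = true;
    }
    for (uint32_t i = 0; i < 8; i++) {
        Mask in_if{};
        bool any = false;
        for (unsigned lane = 0; lane < kLanes; lane++) {
            in_if[lane] = in_loop[lane] && tid[lane] == i;
            any = any || in_if[lane];
        }
        if (!any) continue;
        uint32_t wave_max = wave_active_max(in_if, tid);
        for (unsigned lane = 0; lane < kLanes; lane++) {
            if (in_if[lane]) {
                out[tid[lane]] = wave_max;
                in_loop[lane] = false;  // models `break`
            }
        }
    }
    return out;
}

// Loop-free form: every lane with TID.x < 8 enters the `if` together.
Buffer run_flat() {
    Buffer tid{}, out{};
    Mask in_if{};
    for (unsigned lane = 0; lane < kLanes; lane++) {
        tid[lane] = lane;
        in_if[lane] = tid[lane] < 8;
    }
    uint32_t wave_max = wave_active_max(in_if, tid);
    for (unsigned lane = 0; lane < kLanes; lane++) {
        if (in_if[lane]) out[tid[lane]] = wave_max;
    }
    return out;
}
```

Under these assumptions, `run_loop()` yields `{0, 1, ..., 7}`, while `run_flat()` writes `7` to every element. Per-thread control flow is identical in both versions, yet the results differ, because the set of lanes active at the convergent call has changed.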
The SPIR-V code generation path appears to account for this by attaching `convergencectrl` operand bundles (tied to the `llvm.experimental.convergence.*` intrinsics) to convergent operations. This is demonstrated at https://godbolt.org/z/s53W8coWv: there, `LoopRotatePass` does not modify the control flow, presumably because of the `convergencectrl` bundles.
The intuitive path forward seems to be to decorate convergent operations with the same `convergencectrl` operand bundles when targeting DirectX, not only SPIR-V. These would then be stripped in `dxil-op-lower`.
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs