Issue |
97299
|
Summary |
[TypeLegalizer] Wrong code for certain non-power-of-2 sized int->vec BITCAST
|
Labels |
new issue
|
Assignees |
|
Reporter |
uweigand
|
On a big-endian machine, the following code:
```
define i1 @test(iN %in) {
%vec = bitcast iN %in to <N x i1>
%bit = extractelement <N x i1> %vec, i32 0
ret i1 %bit
}
```
should, for any choice of `N`, extract the most significant bit of the input. And this does indeed work correctly in most cases, specifically when N is either >= 16 or else a power of two.
However, for non-power-of-two sizes less than 16, at least on s390x, I see wrong code generation. What distinguishes the cases is that for power of two sizes, the <N x i1> vector is either already legal or else handled via `SplitVecRes_BITCAST` (which looks to be correct). For non power of two sizes, the `WidenVecRes_BITCAST` routine is invoked instead, where we typically fall into this case:
```
// If the InOp is promoted to the same size, convert it. Otherwise,
// fall out of the switch and widen the promoted input.
SDValue NInOp = GetPromotedInteger(InOp);
EVT NInVT = NInOp.getValueType();
if (WidenVT.bitsEq(NInVT)) {
// For big endian targets we need to shift the input integer or the
// interesting bits will end up at the wrong place.
if (DAG.getDataLayout().isBigEndian()) {
unsigned ShiftAmt = NInVT.getSizeInBits() - InVT.getSizeInBits();
EVT ShiftAmtTy = TLI.getShiftAmountTy(NInVT, DAG.getDataLayout());
assert(ShiftAmt < WidenVT.getSizeInBits() && "Too large shift amount!");
NInOp = DAG.getNode(ISD::SHL, dl, NInVT, NInOp,
DAG.getConstant(ShiftAmt, dl, ShiftAmtTy));
}
return DAG.getNode(ISD::BITCAST, dl, WidenVT, NInOp);
}
InOp = NInOp;
InVT = NInVT;
break;
```
If N is larger than 16, then `WidenVT.bitsEq(NInVT)` check succeeds and we get a simple shift and bitcast, yielding the correct result.
However, for smaller values of N, the problem is that `WidenVT` is either `v8i1` or `v16i1`, but `NInVT` is always `i32` on s390x (since we don't support native arithmetic on smaller integer types, all integers are promoted to at least `i32`). In that case, the check fails and we fall through the code below, with `InOp` now replaced by the `i32` version of the input.
The next block is guarded by ` if (WidenSize % InScalarSize == 0`, which is also false since WidenSize is actually *smaller* than `InScalarSize` - the latter has been replaced with the widened size of `i32`. Therefore we fall through to the end and simply do:
```
return CreateStackStoreLoad(InOp, WidenVT);
```
This doesn't appear to account for the possibility that `InOp` has a different (larger!) in-memory store size than `WidenVT` at all. It simply stores the widened input and loads the result from its first bytes, which on a big-endian system are typically just the padding bits.
I'm now wondering how to fix this. Ideally, this should result is just a shift (and mask) for those inputs as well. But it if has to fall through to the store/load method, I guess that should handle size differences somehow ...
Any suggestions welcome!
@topperc @arsenm @mikaelholmen @efriedma-quic
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs