Issue 97299
Summary [TypeLegalizer] Wrong code for certain non-power-of-2 sized int->vec BITCAST
Labels new issue
Assignees
Reporter uweigand
    On a big-endian machine, the following code:
```
define i1 @test(iN %in) {
  %vec = bitcast iN %in to <N x i1>
  %bit = extractelement <N x i1> %vec, i32 0
  ret i1 %bit
}
```

should, for any choice of `N`, extract the most significant bit of the input.   And this does indeed work correctly in most cases, specifically when N is either >= 16 or else a power of two.

However, for non-power-of-two sizes less than 16, at least on s390x, I see wrong code generation.   What distinguishes the cases is that for power of two sizes, the <N x i1> vector is either already legal or else handled via `SplitVecRes_BITCAST` (which looks to be correct).  For non power of two sizes, the `WidenVecRes_BITCAST` routine is invoked instead, where we typically fall into this case:
```
    // If the InOp is promoted to the same size, convert it.  Otherwise,
    // fall out of the switch and widen the promoted input.
    SDValue NInOp = GetPromotedInteger(InOp);
    EVT NInVT = NInOp.getValueType();
    if (WidenVT.bitsEq(NInVT)) {
      // For big endian targets we need to shift the input integer or the
      // interesting bits will end up at the wrong place.
      if (DAG.getDataLayout().isBigEndian()) {
 unsigned ShiftAmt = NInVT.getSizeInBits() - InVT.getSizeInBits();
 EVT ShiftAmtTy = TLI.getShiftAmountTy(NInVT, DAG.getDataLayout());
 assert(ShiftAmt < WidenVT.getSizeInBits() && "Too large shift amount!");
        NInOp = DAG.getNode(ISD::SHL, dl, NInVT, NInOp,
 DAG.getConstant(ShiftAmt, dl, ShiftAmtTy));
 }
      return DAG.getNode(ISD::BITCAST, dl, WidenVT, NInOp);
    }
 InOp = NInOp;
    InVT = NInVT;
    break;
```

If N is larger than 16, then `WidenVT.bitsEq(NInVT)` check succeeds and we get a simple shift and bitcast, yielding the correct result.

However, for smaller values of N, the problem is that `WidenVT` is either `v8i1` or `v16i1`, but `NInVT` is always `i32` on s390x (since we don't support native arithmetic on smaller integer types, all integers are promoted to at least `i32`).   In that case, the check fails and we fall through the code below, with `InOp` now replaced by the `i32` version of the input.

The next block is guarded by `  if (WidenSize % InScalarSize == 0`, which is also false since WidenSize is actually *smaller* than `InScalarSize` - the latter has been replaced with the widened size of `i32`.   Therefore we fall through to the end and simply do:

```
  return CreateStackStoreLoad(InOp, WidenVT);
```

This doesn't appear to account for the possibility that `InOp` has a different (larger!) in-memory store size than `WidenVT` at all.   It simply stores the widened input and loads the result from its first bytes, which on a big-endian system are typically just the padding bits.

I'm now wondering how to fix this.  Ideally, this should result is just a shift (and mask) for those inputs as well.  But it if has to fall through to the store/load method, I guess that should handle size differences somehow ...

Any suggestions welcome!

@topperc @arsenm @mikaelholmen @efriedma-quic 

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to