| Issue |
86647
|
| Summary |
Undefined behavior in `CastInfo::castFailed` with `From=<MLIR interface>`
|
| Labels |
mlir
|
| Assignees |
|
| Reporter |
ingomueller-net
|
I observed a `TypeSwitch` taking the "wrong" case in the snippet below:
```cpp
llvm::TypeSwitch<ExpressionOpInterface, int>(op)
.Case<FieldReferenceOp, LiteralOp>([&](auto op) {
llvm::errs() << __PRETTY_FUNCTION__ << "\n";
op.dump();
return 42;
})
```
which produced the following output during [CI on Github](https://github.com/iree-org/iree-llvm-sandbox/actions/runs/8374812356/job/22930857368):
```cpp
int (anonymous namespace)::exportOperation(mlir::substrait::FieldReferenceOp)
%2 = substrait.literal -1 : si1
```
The line with `__PRETTY_FUNCTION__` shows that we are in the `FieldReferenceOp` case but the op printed is actually a `literal`/`LiteralOp`, so the `TypeSwitch` entered the wrong case.
Curiously, this works fine on my local machine (using Clang 14, 15, and 17) but fails consistently on CI (using Clang 14, 15, and 17).
After a lot of printf debugging, I think that I found what the problem is: Eventually, `CastInfo::castFailed` is called when we try a case that doesn't correspond to the runtime type of the argument.
```cpp
static inline CastReturnType castFailed() {
return CastReturnType(nullptr);
}
```
The problem is that `CastReturnType` at this point is `const ExpressionOpInterface &`. Notice the `&`! So we seem to create a *reference* to `nullptr` and then return that. Indeed, if I compile with ASan and UBsan, I get this in CI:
```
==21930==ERROR: AddressSanitizer: stack-use-after-return on address 0x7fef0de7fb20 at pc 0x7fef2d7c165c bp 0x7fffb4ebd190 sp 0x7fffb4ebd188
READ of size 8 at 0x7fef0de7fb20 thread T0
#0 0x7fef2d7c165b in llvm::TypeSwitch<mlir::substrait::ExpressionOpInterface, mlir::FailureOr<std::unique_ptr<substrait::proto::_expression_, std::default_delete<substrait::proto::_expression_>>>>& llvm::TypeSwitch<mlir::substrait::ExpressionOpInterface, mlir::FailureOr<std::unique_ptr<substrait::proto::_expression_, std::default_delete<substrait::proto::_expression_>>>>::Case<mlir::substrait::FieldReferenceOp, (anonymous namespace)::exportOperation(mlir::substrait::ExpressionOpInterface)::$_0&>((anonymous namespace)::exportOperation(mlir::substrait::ExpressionOpInterface)::$_0&) /home/runner/work/iree-llvm-sandbox/iree-llvm-sandbox/sandbox/third_party/llvm-project/llvm/include/llvm/ADT/TypeSwitch.h:151:[26](https://github.com/iree-org/iree-llvm-sandbox/actions/runs/8391967477/job/22983497415#step:13:27)
...
```
Curiously, again, the sanitizers don't complain on my local machine.
I am not 100% sure whether my understanding of the code above returning a local reference is actually correct, but in the following, less templated code, I do get a warning:
```cpp
using CastReturnType = const ExpressionOpInterface &;
auto castFailed = []() -> CastReturnType {
return CastReturnType(nullptr);
// ^~~~~~~
// warning: returning reference to local temporary object [-Wreturn-stack-address]
};
```
As a simple work-around, I changed the original code to use `TypeSwitch<Operation *, int>` instead of `TypeSwitch<ExpressionOpInterface, int>`, for which `CastReturnType` is a pointer type that is value constructed from `nullptr`, which works fine.
However, I find the current behavior extremely dangerous. If using `TypeSwitch<ExpressionOpInterface, int>` isn't expected to work, I think there should be a mechanism that prevents compilation with those arguments.
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs