Author: adams381
Date: 2026-06-11T21:54:19Z
New Revision: 0d490215a541cf91b8e24e697570bb1cc938078d

URL: https://github.com/llvm/llvm-project/commit/0d490215a541cf91b8e24e697570bb1cc938078d
DIFF: https://github.com/llvm/llvm-project/commit/0d490215a541cf91b8e24e697570bb1cc938078d.diff

LOG:
[CIR] Lower string literals with high-bit bytes (#203384)

A string literal containing a byte >= 0x80 crashes CIR-to-LLVM lowering. `convertStringAttrToDenseElementsAttr` builds each element's `APInt` from a signed `char`, so a high-bit byte sign-extends to a 64-bit value that no longer fits the 8-bit element width and trips the `APInt` constructor assertion (`isUIntN(BitWidth, val) && "Value is not an N-bit unsigned value"`).

Interpreting each string byte as `unsigned char` fixes it, mirroring what #197269 did for scalar character literals. The string-literal array path was the remaining site with the same defect, and the lowered LLVM is byte-identical to classic CodeGen.

Repro: `char s[] = "\x80";` compiled with `-fclangir -emit-llvm`.

This also clears a cluster of SingleSource gcc-torture globals that embed high-byte string data. These globals compiled until #198427 removed the trailing-zeros fast path in the same lowering. String literals always carry a null terminator (trailing zeros), so they previously took the insertvalue path and never reached `convertStringAttrToDenseElementsAttr`; #198427 routes them through it and exposed this latent sign-extension bug.
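A minimal standalone C++ sketch of the defect and the fix described above, assuming a host where plain `char` is signed (for example x86-64 Linux). It does not use LLVM or MLIR: `fitsInUnsignedBits` is a hypothetical stand-in for `llvm::isUIntN`, `FixedWidthInt` is a hypothetical stand-in for `mlir::APInt` that keeps only the constructor's unsigned-fit assertion, and `widenSignedChar`, `widenAsByte`, and `toByteElements` are hypothetical helpers. `signed char` is used explicitly so the old behavior reproduces even on hosts where plain `char` is unsigned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for llvm::isUIntN(N, X): does X fit in N unsigned bits?
constexpr bool fitsInUnsignedBits(unsigned n, uint64_t x) {
  return n >= 64 || x < (uint64_t{1} << n);
}

// Hypothetical stand-in for mlir::APInt: it stores the value and enforces the
// same precondition as the APInt constructor assertion quoted in the log.
struct FixedWidthInt {
  unsigned bitWidth;
  uint64_t value;
  FixedWidthInt(unsigned width, uint64_t val) : bitWidth(width), value(val) {
    assert(fitsInUnsignedBits(width, val) &&
           "Value is not an N-bit unsigned value");
  }
};

// Old behavior: a negative char converts to uint64_t modulo 2^64, so the
// byte 0x80 (-128 as a signed char) becomes 0xFFFFFFFFFFFFFF80.
uint64_t widenSignedChar(signed char c) { return static_cast<uint64_t>(c); }

// Fixed behavior: converting through unsigned char keeps the raw byte value.
uint64_t widenAsByte(signed char c) { return static_cast<unsigned char>(c); }

// Mirrors the fixed loop: one 8-bit element per string byte, then
// `trailingZeros` zero elements (e.g. a string literal's null terminator).
std::vector<FixedWidthInt> toByteElements(std::string_view bytes,
                                          std::size_t trailingZeros) {
  const unsigned bitWidth = 8;
  std::vector<FixedWidthInt> values;
  values.reserve(bytes.size() + trailingZeros);
  for (const char element : bytes)
    values.emplace_back(bitWidth, static_cast<unsigned char>(element));
  values.insert(values.end(), trailingZeros, FixedWidthInt(bitWidth, 0));
  return values;
}
```

For example, `toByteElements("\x80\xff\x7f", 1)` yields four 8-bit elements 0x80, 0xFF, 0x7F, 0x00, the same bytes as the `[4 x i8] c"\80\FF\7F\00"` initializer the new test expects. Passing `widenSignedChar(-128)` to `FixedWidthInt(8, ...)` instead would fail the assertion, which is the crash the commit fixes.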
Added:
    clang/test/CIR/CodeGen/string-literal-high-bytes.c

Modified:
    clang/lib/CIR/Lowering/LoweringHelpers.cpp

Removed:


################################################################################
diff --git a/clang/lib/CIR/Lowering/LoweringHelpers.cpp b/clang/lib/CIR/Lowering/LoweringHelpers.cpp
index 17cc583a37a1e..b903560ada18f 100644
--- a/clang/lib/CIR/Lowering/LoweringHelpers.cpp
+++ b/clang/lib/CIR/Lowering/LoweringHelpers.cpp
@@ -38,8 +38,11 @@ convertStringAttrToDenseElementsAttr(cir::ConstArrayAttr attr,
   llvm::SmallVector<mlir::APInt> values;
   values.reserve(totalSize);
 
+  // String bytes are raw values; interpret each as an unsigned byte so a
+  // high-bit char (>= 0x80) does not sign-extend to a value that overflows
+  // the element bit width when constructing the APInt.
   for (const char element : stringAttr)
-    values.emplace_back(bitWidth, element);
+    values.emplace_back(bitWidth, static_cast<unsigned char>(element));
 
   values.insert(values.end(), trailingZeros, mlir::APInt::getZero(bitWidth));
 

diff --git a/clang/test/CIR/CodeGen/string-literal-high-bytes.c b/clang/test/CIR/CodeGen/string-literal-high-bytes.c
new file mode 100644
index 0000000000000..935a4abe42ada
--- /dev/null
+++ b/clang/test/CIR/CodeGen/string-literal-high-bytes.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s -o %t.cir
+// RUN: FileCheck --check-prefix=CIR --input-file=%t.cir %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t-cir.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t-cir.ll %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o %t.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s
+
+char high_bytes[] = "\x80\xff\x7f";
+
+// CIR: cir.global external @high_bytes = #cir.const_array<"\80\FF\7F" : !cir.array<!s8i x 3>, trailing_zeros> : !cir.array<!s8i x 4>
+// LLVM: @high_bytes = global [4 x i8] c"\80\FF\7F\00"
+
+unsigned char ubytes[4] = {0x80, 0xff, 0x01, 0x7f};
+
+// CIR: cir.global external @ubytes = #cir.const_array<[#cir.int<128> : !u8i, #cir.int<255> : !u8i, #cir.int<1> : !u8i, #cir.int<127> : !u8i]> : !cir.array<!u8i x 4>
+// LLVM: @ubytes = global [4 x i8] c"\80\FF\01\7F"



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
