Author: adams381
Date: 2026-06-11T21:54:19Z
New Revision: 0d490215a541cf91b8e24e697570bb1cc938078d

URL: 
https://github.com/llvm/llvm-project/commit/0d490215a541cf91b8e24e697570bb1cc938078d
DIFF: 
https://github.com/llvm/llvm-project/commit/0d490215a541cf91b8e24e697570bb1cc938078d.diff

LOG: [CIR] Lower string literals with high-bit bytes (#203384)

A string literal containing a byte >= 0x80 crashes CIR-to-LLVM lowering.
`convertStringAttrToDenseElementsAttr` builds each element's `APInt`
from a signed `char`, so a high-bit byte sign-extends to a 64-bit value
that no longer fits the 8-bit element width and trips the `APInt`
constructor assertion (`isUIntN(BitWidth, val) && "Value is not an N-bit
unsigned value"`).

Interpreting each string byte as `unsigned char` fixes it, mirroring
what #197269 did for scalar character literals. The string-literal array
path was the remaining site with the same defect, and the lowered LLVM
IR is byte-identical to that of classic CodeGen.
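
The sign-extension described above can be reproduced without LLVM. The
following is a minimal sketch, not the code in LoweringHelpers.cpp:
`fitsUnsigned`, `widenAsSigned`, `widenAsUnsigned` and `highByte` are
hypothetical names, and `fitsUnsigned` only models the `isUIntN` check
quoted from the `APInt` assertion. It uses `signed char` explicitly
because the signedness of plain `char` is implementation-defined.

```cpp
#include <cstdint>

// Hypothetical stand-in for the range check in the quoted assertion:
// true when `value` is representable as a `bits`-wide unsigned integer.
constexpr bool fitsUnsigned(std::uint64_t value, unsigned bits) {
  return bits >= 64 || (value >> bits) == 0;
}

// Widening a signed byte directly sign-extends it: -128 (bit pattern 0x80)
// becomes 0xFFFFFFFFFFFFFF80.
constexpr std::uint64_t widenAsSigned(signed char byte) {
  return static_cast<std::uint64_t>(byte);
}

// Going through unsigned char first keeps only the low 8 bits: 0x80.
constexpr std::uint64_t widenAsUnsigned(signed char byte) {
  return static_cast<std::uint64_t>(static_cast<unsigned char>(byte));
}

// A byte with the high bit set, as stored on a target where char is signed.
constexpr signed char highByte = static_cast<signed char>(-128);

static_assert(widenAsSigned(highByte) == 0xFFFFFFFFFFFFFF80ULL,
              "signed widening sign-extends");
static_assert(!fitsUnsigned(widenAsSigned(highByte), 8),
              "sign-extended value does not fit in 8 bits");
static_assert(widenAsUnsigned(highByte) == 0x80ULL,
              "unsigned widening preserves the raw byte");
static_assert(fitsUnsigned(widenAsUnsigned(highByte), 8),
              "raw byte fits in 8 bits");
```

Bytes below 0x80 widen to the same value either way, which is why only
high-bit string data hit the assertion.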

Repro: `char s[] = "\x80";` compiled with `-fclangir -emit-llvm`. This
also clears a cluster of SingleSource gcc-torture globals that embed
high-byte string data.

These globals compiled until #198427 removed the trailing-zeros
fast-path in the same lowering. String literals always carry a null
terminator (trailing zeros), so they previously took the insertvalue
path and never reached `convertStringAttrToDenseElementsAttr`; #198427
routes them through it and exposed this latent sign-extension bug.

Added: 
    clang/test/CIR/CodeGen/string-literal-high-bytes.c

Modified: 
    clang/lib/CIR/Lowering/LoweringHelpers.cpp

Removed: 
    


################################################################################
diff --git a/clang/lib/CIR/Lowering/LoweringHelpers.cpp b/clang/lib/CIR/Lowering/LoweringHelpers.cpp
index 17cc583a37a1e..b903560ada18f 100644
--- a/clang/lib/CIR/Lowering/LoweringHelpers.cpp
+++ b/clang/lib/CIR/Lowering/LoweringHelpers.cpp
@@ -38,8 +38,11 @@ convertStringAttrToDenseElementsAttr(cir::ConstArrayAttr attr,
   llvm::SmallVector<mlir::APInt> values;
   values.reserve(totalSize);
 
+  // String bytes are raw values; interpret each as an unsigned byte so a
+  // high-bit char (>= 0x80) does not sign-extend to a value that overflows
+  // the element bit width when constructing the APInt.
   for (const char element : stringAttr)
-    values.emplace_back(bitWidth, element);
+    values.emplace_back(bitWidth, static_cast<unsigned char>(element));
 
   values.insert(values.end(), trailingZeros, mlir::APInt::getZero(bitWidth));
 

diff --git a/clang/test/CIR/CodeGen/string-literal-high-bytes.c b/clang/test/CIR/CodeGen/string-literal-high-bytes.c
new file mode 100644
index 0000000000000..935a4abe42ada
--- /dev/null
+++ b/clang/test/CIR/CodeGen/string-literal-high-bytes.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s -o %t.cir
+// RUN: FileCheck --check-prefix=CIR --input-file=%t.cir %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t-cir.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t-cir.ll %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o %t.ll
+// RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s
+
+char high_bytes[] = "\x80\xff\x7f";
+
+// CIR: cir.global external @high_bytes = #cir.const_array<"\80\FF\7F" : !cir.array<!s8i x 3>, trailing_zeros> : !cir.array<!s8i x 4>
+// LLVM: @high_bytes = global [4 x i8] c"\80\FF\7F\00"
+
+unsigned char ubytes[4] = {0x80, 0xff, 0x01, 0x7f};
+
+// CIR: cir.global external @ubytes = #cir.const_array<[#cir.int<128> : !u8i, #cir.int<255> : !u8i, #cir.int<1> : !u8i, #cir.int<127> : !u8i]> : !cir.array<!u8i x 4>
+// LLVM: @ubytes = global [4 x i8] c"\80\FF\01\7F"


        
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
