[PATCH] D125059: [Lex] Don't assert when decoding invalid UCNs.

Sam McCall via Phabricator via cfe-commits Thu, 05 May 2022 16:45:07 -0700

sammccall created this revision.
sammccall added a reviewer: hokein.
Herald added a project: All.
sammccall requested review of this revision.
Herald added projects: clang, clang-tools-extra.
Herald added a subscriber: cfe-commits.


Currently if a lexically-valid UCN encodes an invalid codepoint, then we
diagnose that, and then hit an assertion while trying to decode it.

Since there isn't anything preventing us reaching this state, remove the
assertion. expandUCNs("X\UAAAAAAAAY") will produce "XY".


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D125059

Files:
  clang-tools-extra/pseudo/test/crash/bad-ucn.c
  clang/lib/Lex/LiteralSupport.cpp
  clang/test/Lexer/unicode.c


Index: clang/test/Lexer/unicode.c
===================================================================
--- clang/test/Lexer/unicode.c
+++ clang/test/Lexer/unicode.c
@@ -28,6 +28,9 @@
 
         int _;
 
+extern int X\UAAAAAAAA; // expected-error {{not allowed in an identifier}}
+int Y = '\UAAAAAAAA'; // expected-error {{invalid universal character}}
+
 #ifdef __cplusplus
 
 extern int ༀ;
Index: clang/lib/Lex/LiteralSupport.cpp
===================================================================
--- clang/lib/Lex/LiteralSupport.cpp
+++ clang/lib/Lex/LiteralSupport.cpp
@@ -320,10 +320,8 @@
                             llvm::SmallVectorImpl<char> &Str) {
   char ResultBuf[4];
   char *ResultPtr = ResultBuf;
-  bool Res = llvm::ConvertCodePointToUTF8(Codepoint, ResultPtr);
-  (void)Res;
-  assert(Res && "Unexpected conversion failure");
-  Str.append(ResultBuf, ResultPtr);
+  if (llvm::ConvertCodePointToUTF8(Codepoint, ResultPtr))
+    Str.append(ResultBuf, ResultPtr);
 }
 
 void clang::expandUCNs(SmallVectorImpl<char> &Buf, StringRef Input) {
Index: clang-tools-extra/pseudo/test/crash/bad-ucn.c
===================================================================
--- /dev/null
+++ clang-tools-extra/pseudo/test/crash/bad-ucn.c
@@ -0,0 +1,4 @@
+// This UCN doesn't encode a valid codepoint.
+// We used to assert while trying to expand UCNs in the token.
+// RUN: clang-pseudo -source=%s
+A\UAAAAAAAA

Index: clang/test/Lexer/unicode.c
===================================================================
--- clang/test/Lexer/unicode.c
+++ clang/test/Lexer/unicode.c
@@ -28,6 +28,9 @@
 
         int _;
 
+extern int X\UAAAAAAAA; // expected-error {{not allowed in an identifier}}
+int Y = '\UAAAAAAAA'; // expected-error {{invalid universal character}}
+
 #ifdef __cplusplus
 
 extern int à¼;
Index: clang/lib/Lex/LiteralSupport.cpp
===================================================================
--- clang/lib/Lex/LiteralSupport.cpp
+++ clang/lib/Lex/LiteralSupport.cpp
@@ -320,10 +320,8 @@
                             llvm::SmallVectorImpl<char> &Str) {
   char ResultBuf[4];
   char *ResultPtr = ResultBuf;
-  bool Res = llvm::ConvertCodePointToUTF8(Codepoint, ResultPtr);
-  (void)Res;
-  assert(Res && "Unexpected conversion failure");
-  Str.append(ResultBuf, ResultPtr);
+  if (llvm::ConvertCodePointToUTF8(Codepoint, ResultPtr))
+    Str.append(ResultBuf, ResultPtr);
 }
 
 void clang::expandUCNs(SmallVectorImpl<char> &Buf, StringRef Input) {
Index: clang-tools-extra/pseudo/test/crash/bad-ucn.c
===================================================================
--- /dev/null
+++ clang-tools-extra/pseudo/test/crash/bad-ucn.c
@@ -0,0 +1,4 @@
+// This UCN doesn't encode a valid codepoint.
+// We used to assert while trying to expand UCNs in the token.
+// RUN: clang-pseudo -source=%s
+A\UAAAAAAAA

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D125059: [Lex] Don't assert when decoding invalid UCNs.

Reply via email to