On 10/27/22 13:16, Ben Boeckel wrote:
Unicode does not support such values because they are unrepresentable in
UTF-16.

Signed-off-by: Ben Boeckel <ben.boec...@kitware.com>
---
  libcpp/ChangeLog  | 6 ++++++
  libcpp/charset.cc | 4 ++--
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 18d5bcceaf0..4d707277531 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  <ben.boec...@kitware.com>
+
+       * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
+       UTF-16 does not support such codepoints and therefore all Unicode
+       rejects such values.
+
  2022-10-19  Lewis Hyatt  <lhy...@gmail.com>
* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..e9da6674b5f 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t 
*inbytesleftp,
    if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
/* Make sure the character is valid. */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;

Please also adjust the comment before the function that talks about the 0x7FFFFFFF maximum.

*cp = c;
    *inbufp = inbuf;
@@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, 
size_t *inbytesleftp,
    s += inbuf[bigend ? 2 : 1] << 8;
    s += inbuf[bigend ? 3 : 0];
- if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
      return EILSEQ;
rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);

Reply via email to