https://llvm.org/bugs/show_bug.cgi?id=24342
Bug ID: 24342
Summary: std::char_traits<char16_t>::eof() returns valid code unit
Product: libc++
Version: 3.6
Hardware: Macintosh
OS: All
Status: NEW
Severity: normal
Priority: P
Component: All Bugs
Assignee: unassignedclangb...@nondot.org
Reporter: david_w...@me.com
CC: llvmbugs@cs.uiuc.edu, mclow.li...@gmail.com
Classification: Unclassified

[char.traits.specializations.char16_t] §21.2.3.2/3 says, "The member eof() shall return an implementation-defined constant that cannot appear as a valid UTF-16 code unit."

In libc++ it returns 0xDFFF, which is a valid trailing (low) surrogate, i.e. the second half of a surrogate pair. Surrogate pairs are needed only for code points outside the Basic Multilingual Plane, so 0xDFFF is rarely seen, but code points such as U+123FF are valid and are encoded as the pair 0xD808 0xDFFF.

On the other hand, U+FFFF is a "noncharacter," "intended for process-internal uses," like the preceding code point U+FFFE (the byte-swapped form of the byte order mark, U+FEFF). (http://unicode.org/charts/PDF/UFFF0.pdf) U+FFFF is used as the eof() value by most other environments, it is the value under libstdc++, and it coincides with WEOF when wchar_t is UTF-16.
_______________________________________________
LLVMbugs mailing list
LLVMbugs@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs