https://llvm.org/bugs/show_bug.cgi?id=24342

            Bug ID: 24342
           Summary: std::char_traits<char16_t>::eof() returns valid code
                    unit
           Product: libc++
           Version: 3.6
          Hardware: Macintosh
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: All Bugs
          Assignee: unassignedclangb...@nondot.org
          Reporter: david_w...@me.com
                CC: llvmbugs@cs.uiuc.edu, mclow.li...@gmail.com
    Classification: Unclassified

[char.traits.specializations.char16_t] §21.2.3.2/3 says,

"The member eof() shall return an implementation-defined constant that cannot
appear as a valid UTF-16 code unit."

In libc++ it returns 0xDFFF, which is a valid low (trailing) surrogate, i.e. the
second half of a surrogate pair. Surrogate pairs are only needed for code points
outside the Basic Multilingual Plane, so it won't often be seen, but characters
like U+123FF are valid, and U+123FF is encoded as the pair 0xD808 0xDFFF.

On the other hand, U+FFFF is a "noncharacter," "intended for process-internal
uses," like the preceding code point U+FFFE, which is what a byte-swapped byte
order mark (U+FEFF) reads as. (http://unicode.org/charts/PDF/UFFF0.pdf) U+FFFF
is the value used by most other environments, including libstdc++, and it
coincides with WEOF when wchar_t is UTF-16.

_______________________________________________
LLVMbugs mailing list
LLVMbugs@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs
