On Saturday, 7 November 2020 at 16:12:06 UTC, Per Nordlöw wrote:

CtoLexer_parser.d 665 57 error invalid UTF character \U0000d800
CtoLexer_parser.d 665 67 error invalid UTF character \U0000dbff
CtoLexer_parser.d 666 28 error invalid UTF character \U0000d800
CtoLexer_parser.d 666 38 error invalid UTF character \U0000dbff
CtoLexer_parser.d 666 53 error invalid UTF character \U0000dc00
CtoLexer_parser.d 666 63 error invalid UTF character \U0000dfff

Doesn't DMD support these Unicode code points yet?

They're not valid characters. U+D800 to U+DFFF are the surrogate code points:

"The Unicode standard permanently reserves these code point values for UTF-16 encoding of the high and low surrogates, and they will never be assigned a character, so there should be no reason to encode them. The official Unicode standard says that no UTF forms, including UTF-16, can encode these code points" [1].

"... the standard states that such arrangements should be treated as encoding errors" [1].

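In UTF-16 they only have meaning in pairs: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF) encodes a single code point above U+FFFF. On their own they don't form a valid character.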
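As a rough illustration of how UTF-16 surrogates pair up, here is a minimal sketch in D. `combineSurrogates` is a hypothetical helper written for this example, not something from Phobos; `isValidDchar` is the existing function from `std.utf`. The surrogate values are produced with casts because writing them as character escapes is exactly what the compiler rejects.

```d
import std.utf : isValidDchar;

// Hypothetical helper (not part of Phobos): combine a UTF-16
// surrogate pair into the single code point it encodes.
dchar combineSurrogates(wchar high, wchar low) pure nothrow @safe @nogc
{
    assert(high >= 0xD800 && high <= 0xDBFF, "not a high surrogate");
    assert(low >= 0xDC00 && low <= 0xDFFF, "not a low surrogate");
    // Each surrogate carries 10 bits of the code point, offset by 0x10000.
    return cast(dchar)(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00));
}

void main()
{
    // Lone surrogates are not valid characters.
    assert(!isValidDchar(cast(dchar) 0xD800));
    assert(!isValidDchar(cast(dchar) 0xDFFF));

    // A high/low pair encodes one code point above U+FFFF.
    dchar c = combineSurrogates(cast(wchar) 0xD83D, cast(wchar) 0xDE00);
    assert(c == 0x1F600);
    assert(isValidDchar(c));
}
```

In D source you would normally write such a character as the code point itself, e.g. "\U0001F600", and let the compiler produce the surrogate pair when the string is UTF-16.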

[1] https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF

--
/Jacob Carlborg

