Re: First Impressions!

Patrick Schluter via Digitalmars-d Thu, 30 Nov 2017 10:20:51 -0800

On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:

> English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both.

To give just one example of what can go wrong with UTF-16: reading a file in UTF-16 and converting it to something else, like UTF-8 or UTF-32. Suppose you read it block by block and a codepoint from the SMP (Supplementary Multilingual Plane) lands exactly on the buffer limit, with the high surrogate at the end of the first buffer and the low surrogate at the start of the next. If you don't handle that case, you get 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP, and they are more and more frequent).
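A minimal sketch of the usual fix, assuming little-endian UTF-16 input and valid data: when a block ends on a high surrogate (0xD800 to 0xDBFF), or on an odd byte, hold those bytes back and prepend them to the next block. The function names `decode_naive`, `decode_utf16le_chunks` and `chunked` are hypothetical, for illustration only; this is not code from the thread or from Phobos.

```python
import struct


def decode_naive(chunks):
    # Decodes each block independently: a surrogate pair split across two
    # blocks turns into replacement characters instead of one codepoint.
    return "".join(c.decode("utf-16-le", errors="replace") for c in chunks)


def decode_utf16le_chunks(chunks):
    # Hypothetical helper: decode UTF-16LE byte blocks, carrying a dangling
    # high surrogate (and any odd trailing byte) over to the next block.
    out = []
    pending = b""  # bytes held back from the previous block
    for chunk in chunks:
        data = pending + chunk
        # A code unit is 2 bytes; keep an odd trailing byte for later.
        cut = len(data) - (len(data) % 2)
        # If the last complete code unit is a high surrogate, its low
        # surrogate is in the next block, so hold the high one back too.
        if cut >= 2:
            last = struct.unpack_from("<H", data, cut - 2)[0]
            if 0xD800 <= last <= 0xDBFF:
                cut -= 2
        out.append(data[:cut].decode("utf-16-le"))
        pending = data[cut:]
    if pending:
        # Leftover bytes at end of input really are malformed.
        out.append(pending.decode("utf-16-le", errors="replace"))
    return "".join(out)


def chunked(data, n):
    # Split a byte string into blocks of n bytes (the last may be shorter).
    return [data[i:i + n] for i in range(0, len(data), n)]


if __name__ == "__main__":
    text = "nice poop 💩 emoji"
    raw = text.encode("utf-16-le")
    # Everything before the emoji is in the BMP (2 bytes per char), so this
    # offset falls between the high and the low surrogate.
    split = text.index("💩") * 2 + 2
    blocks = [raw[:split], raw[split:]]
    print(repr(decode_naive(blocks)))
    print(repr(decode_utf16le_chunks(blocks)))
```

With the split placed between the two surrogates, the naive version prints replacement characters where the emoji should be, while the carrying version reconstructs the original string. In real Python code, `codecs.getincrementaldecoder("utf-16-le")()` already does this buffering for you; the manual version just makes the boundary handling visible.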
