On Friday, 9 November 2018 at 11:24:42 UTC, Jonathan M Davis wrote:
decode and decodeFront are for converting a UTF code unit to a Unicode code point. So, you're taking a UTF-8 code unit (char), a UTF-16 code unit (wchar), or a UTF-32 code unit (dchar) and decoding it. In the case of UTF-32, that's a no-op, since UTF-32 code units are already code points, but for UTF-8 and UTF-16, they're not the same at all.

I would advise against doing much with decode or decodeFront without having a decent understanding of the basics of Unicode.


I think I understand enough of the basics of Unicode, at least for my application; my unfamiliarity is with the D language and standard library, to which I am very new.

There are applications where one needs to decode a stream of bytes into Unicode text. Perhaps distinguishing between "a ubyte" and "a UTF-8 code unit" is just semantic quibbling, since they are the same at the level of bits and bytes (as I understand it; please tell me if you think otherwise). If I open a file using mode "rb", I get a sequence of bytes, which may contain structured binary data, parts of which are to be interpreted as text encoded in UTF-8. Is there something in the D standard library that enables incremental decoding of such (parts of) a byte stream? Or does one have to resort to the `map!(x => cast(char) x)` method for this?
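To make the question concrete, here is a minimal sketch of the kind of incremental decoding I have in mind. It is not an authoritative answer. It assumes that reinterpreting a slice of ubyte as const(char)[] with a cast is acceptable (no copy, no validation) and that std.utf.stride and std.utf.decode can then be used on the result. The helper name decodeAvailable is hypothetical, made up for this example.

```d
import std.stdio : writeln;
import std.utf : decode, stride;

/// Hypothetical helper: decodes every complete code point in `chunk`,
/// appends it to `sink`, and returns the number of bytes consumed.
/// Any unconsumed tail is an incomplete sequence that the caller should
/// prepend to the next chunk. Malformed input throws UTFException.
size_t decodeAvailable(const(ubyte)[] chunk, ref dchar[] sink)
{
    // Reinterpret the bytes as UTF-8 code units (same bits, different type).
    auto units = cast(const(char)[]) chunk;
    size_t i = 0;
    while (i < units.length)
    {
        // stride reports how many code units the sequence at i occupies,
        // judging from its first code unit.
        immutable need = stride(units, i);
        if (i + need > units.length)
            break; // sequence continues in the next chunk
        // decode reads one code point and advances i past it.
        sink ~= decode(units, i);
    }
    return i;
}

void main()
{
    // "hél" in UTF-8, split in the middle of the two-byte 'é' (C3 A9).
    const(ubyte)[] chunk1 = [0x68, 0xC3];
    const(ubyte)[] chunk2 = [0xA9, 0x6C];

    dchar[] sink;
    immutable used = decodeAvailable(chunk1, sink);
    const(ubyte)[] pending = chunk1[used .. $];

    // Carry the incomplete tail over to the next chunk.
    decodeAvailable(pending ~ chunk2, sink);
    writeln(sink);
}
```

The caller keeps the bytes that were not consumed and prepends them to the next chunk, so a multi-byte sequence split across a read boundary is still decoded as one code point.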
