Re: Using decodeFront with a generalised input range

Vinay Sajip via Digitalmars-d-learn Fri, 09 Nov 2018 04:25:37 -0800

On Friday, 9 November 2018 at 11:24:42 UTC, Jonathan M Davis wrote:

decode and decodeFront are for converting a UTF code unit to a Unicode code point. So, you're taking a UTF-8 code unit (char), a UTF-16 code unit (wchar), or a UTF-32 code unit (dchar) and decoding it. In the case of UTF-32, that's a no-op, since UTF-32 code units are already code points, but for UTF-8 and UTF-16, they're not the same at all.

I would advise against doing much with decode or decodeFront without having a decent understanding of the basics of Unicode.

I think I understand enough of the basics of Unicode, at least for my application; my unfamiliarity is with the D language and standard library, to which I am very new.

There are applications where one needs to decode a stream of bytes into Unicode text. Perhaps distinguishing between "a ubyte" and "a UTF-8 code unit" is just semantic quibbling, as they're the same at the level of bits and bytes (as I understand it; please tell me if you think otherwise). If I open a file using mode "rb", I get a sequence of bytes, which may contain structured binary data, parts of which are to be interpreted as text encoded in UTF-8. Is there something in the D standard library which enables incremental decoding of such (parts of) a byte stream? Or does one have to resort to the `map!(x => cast(char) x)` method for this?
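For concreteness, here is a minimal sketch of the `map!(x => cast(char) x)` approach I mean. The byte array is made-up sample data standing in for a text field read from a binary file, and I am assuming that `decodeFront` accepts the mapped range because its element type is `char`:

```d
import std.algorithm.iteration : map;
import std.utf : decodeFront;

void main()
{
    // Hypothetical sample data: the UTF-8 encoding of "héllo" as raw bytes,
    // standing in for a text field read from a file opened with "rb".
    immutable(ubyte)[] bytes = [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F];

    // Lazily view each ubyte as a UTF-8 code unit (char); nothing is copied.
    auto units = bytes.map!(x => cast(char) x);

    dchar[] decoded;
    while (!units.empty)
    {
        // decodeFront takes the range by ref and pops the code units
        // that make up one code point.
        decoded ~= units.decodeFront;
    }

    // Six bytes decode to five code points.
    assert(decoded == "h\u00E9llo"d);
}
```

As far as I can tell, `decodeFront` throws a `UTFException` on malformed input, so a real reader would need to decide how to handle bytes that are not valid UTF-8.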
