On Monday, 15 October 2018 at 18:57:19 UTC, Vinay Sajip wrote:
On Monday, 15 October 2018 at 17:55:34 UTC, Dukc wrote:
This is done automatically for character arrays, which
include strings. wchar arrays will iterate by UTF-16, and
dchar arrays by UTF-32. If you have a byte/ubyte array you
know to be Unicode-encoded, convert it to char[] to iterate by
code points.
Thanks for the response. I was looking for something where I
don't have to manage buffers myself (e.g. when handling
buffered file or socket I/O). It's really easy to find this
functionality in e.g. Python, C#, Go, Kotlin, Java etc. but I'm
surprised there doesn't seem to be a ready-to-go equivalent in
D. For example, I can find D examples of opening files and
reading a line at a time, but no examples of opening a file and
reading Unicode chars one at a time. Perhaps I've just missed
them?
import std.file : readText;
import std.uni : byCodePoint, byGrapheme;
// or import std.utf : byCodeUnit, byChar /*utf8*/, byWchar /*utf16*/,
//     byDchar /*utf32*/, byUTF /*generic, e.g. byUTF!dchar*/;
void main()
{
    string a = readText("foo");

    // Declaring cp as dchar ensures foreach yields decoded code points.
    foreach (dchar cp; a.byCodePoint)
    {
        // do stuff with code point 'cp'
    }

    foreach (g; a.byGrapheme)
    {
        // do stuff with grapheme 'g'
    }
}
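
If you would rather not read the whole file into memory, here is a minimal sketch of a buffered variant, assuming the file is valid UTF-8. The name countCodePoints is a hypothetical helper, not something from Phobos; it leans on File.byLine for the buffering and on a dchar loop variable for the decoding. A UTF-8 sequence never contains a newline byte, so decoding line by line does not split a code point.

```d
import std.stdio : File, KeepTerminator;

/// Hypothetical helper: counts the code points in a UTF-8 file,
/// decoding one buffered line at a time.
size_t countCodePoints(string path)
{
    size_t n = 0;
    // byLine manages (and reuses) its own buffer;
    // KeepTerminator.yes keeps the newline in each line.
    foreach (line; File(path).byLine(KeepTerminator.yes))
    {
        // A dchar loop variable makes foreach decode the UTF-8 code units.
        foreach (dchar cp; line)
        {
            ++n; // do stuff with code point 'cp' here
        }
    }
    return n;
}
```

Calling countCodePoints("foo") walks the file one code point at a time. Invalid UTF-8 in the input should surface as a UTFException from the inner loop.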