I totally agree with all of that.

It's one of those cases where correct by default is far too slow (that would have to be graphemes) but fast by default is far too broken. Better to force an explicit choice.

There is no magic bullet for unicode in a systems language such as D. The programmer must be aware of it and make choices about how to treat it.

I see didn't know about difference between byCodeUnit and
byGrapheme, because I speak Russian and it is close to English,
because it doesn't have diacritics. As far as I remember German,
that I learned at school have diacritics. So you opened my eyes
in this question. My position as usual programmer is that I
speaking language which graphemes coded by 2 bytes and I alwas
need to do decoding otherwise my programme will be broken. Other
possibility is to use wstring or dstring, but it is less memory
efficient. Also UTF-8 is more commonly used in the Internet so I
don't want to do some conversions to UTF-32, for example.

Where I could read about byGrapheme? Isn't this approach
overcomplicated? I don't want to write Dostoevskiy's book "War
and Peace" in order to write some parser for simple DSL.

Reply via email to