On Sunday, 24 August 2014 at 18:43:36 UTC, Dmitry Olshansky wrote:
> 24-Aug-2014 22:19, Andrew Godfrey wrote:
>> The OP and the question of auto-decoding share the same root problem:
>> even though D does a lot better with UTF than other languages I've
>> used, it still confuses characters with code points somewhat.
>> "Element type is some character" is an example from the OP. So
>> clarify for me: if a programmer makes an array of either 'char' or
>> 'wchar', does that always, unambiguously, mean a UTF-8 or UTF-16
>> code point?

> Yes, pedantically - UTF-8 and UTF-16 code _units_. dchar is a code point.
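
Right - and to spell that distinction out, here's a minimal sketch
using nothing beyond Phobos's std.range:

import std.range : walkLength;

void main()
{
    string  s = "é";           // char[]:  UTF-8 code units
    wstring w = "é"w;          // wchar[]: UTF-16 code units
    assert(s.length == 2);     // 'é' is two UTF-8 code units
    assert(w.length == 1);     // ...but a single UTF-16 code unit
    assert(s.walkLength == 1); // and exactly one code point (dchar)
}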

>> E.g. if interoperating with C code, they will never make the mistake
>> of using these types for a non-string byte/word array?


> char != byte: the compiler rejects pointer and array assignments
> (byte* to char*, ubyte[] to char[], etc.). Individual values are
> convertible, though, so those work with implicit conversion.
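
A quick sketch of where that boundary sits - the commented-out lines
are the ones the compiler rejects:

void main()
{
    ubyte[] raw = [72, 105];
    // char[] text = raw;     // error: cannot convert ubyte[] to char[]
    // char*  p    = raw.ptr; // error: cannot convert ubyte* to char*

    ubyte b = 65;
    char  c = b;              // fine: single values convert implicitly
    assert(c == 'A');
}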

>> If and only if this is true, then D has done well and I'm unafraid
>> of duck-typing here.

Both your answers are at the level of the compiler/language spec.
Relevant, yes, but not complete. E.g. how often will people manually
converting a .h file translate C's "const char *" correctly to either
something char-based or something ubyte-based, depending on whether it
actually carries UTF-8 code units? How often will they even know?
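
To make that concrete - a hypothetical prototype (parse, buf and len
are made up for illustration):

// C header says: void parse(const char *buf, size_t len);
// If buf is really a raw byte buffer, the honest D translation is:
extern (C) void parse(const(ubyte)* buf, size_t len);

// The mechanical type-for-type translation compiles just as happily,
// but promises UTF-8 to every caller where none is guaranteed:
// extern (C) void parse(const(char)* buf, size_t len);
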
With wchar it's probably even worse, because of APIs that use one
type but where, depending on other parameters, the string elements
can be UTF-16 code units or glyph indices.
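
Win32 text output is one real instance: ExtTextOutW's string parameter
is LPCWSTR (i.e. const(wchar)*), but with the ETO_GLYPH_INDEX flag set
its elements are glyph indices, not UTF-16 code units. A sketch,
assuming the core.sys.windows bindings expose that API:

version (Windows)
{
    import core.sys.windows.windows;

    void draw(HDC dc, const(wchar)[] text, const(wchar)[] glyphs)
    {
        // Same wchar-based parameter, two incompatible meanings:
        ExtTextOutW(dc, 0, 0, 0, null, text.ptr,
                    cast(uint) text.length, null);    // UTF-16 code units
        ExtTextOutW(dc, 0, 0, ETO_GLYPH_INDEX, null, glyphs.ptr,
                    cast(uint) glyphs.length, null);  // glyph indices
    }
}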
