I think you are a little confused about what Unicode actually is. Unicode has nothing to do with code pages, and nobody uses code pages any more except for compatibility with legacy applications (with good reason!).

Unicode is:
1) A standardised numbering of a large number of characters
2) A set of standardised algorithms for operating on these characters
3) A set of standardised encodings for efficiently encoding sequences of these characters
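
To make the three parts concrete, here is a small sketch in Python (chosen only for brevity; nothing here is D or Phobos code). It takes one character and shows its code point, one standard algorithm applied to it (NFD normalisation), and three different encodings of it. The variable names are just for illustration.

```python
import unicodedata

ch = "é"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# 1) Numbering: the character has a code point, independent of any encoding.
code_point = ord(ch)  # 0xE9

# 2) Algorithms: NFD normalisation splits it into 'e' + combining acute accent.
decomposed = unicodedata.normalize("NFD", ch)
decomposed_points = [hex(ord(c)) for c in decomposed]  # ['0x65', '0x301']

# 3) Encodings: the same code point becomes different byte sequences.
utf8 = ch.encode("utf-8")       # b'\xc3\xa9'
utf16 = ch.encode("utf-16-le")  # b'\xe9\x00'
utf32 = ch.encode("utf-32-le")  # b'\xe9\x00\x00\x00'
```

Note that no code page appears anywhere: the code point is fixed, and only the byte representation varies with the encoding.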

You said that Phobos converts UTF-8 strings to UTF-32 before operating on them, but that's not true. When it iterates over a UTF-8 string it decodes one code point at a time, yielding dchars rather than chars, without building a UTF-32 copy of the string. That's not in any way inefficient, so I don't really see the problem.
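
As a rough illustration of what "iterating over dchars" means, here is a sketch of on-the-fly UTF-8 decoding. It is written in Python, not D, and it is not Phobos's implementation; `iter_code_points` is a hypothetical name, and the sketch assumes the input is valid UTF-8 and does no error checking.

```python
def iter_code_points(data: bytes):
    """Yield code points from UTF-8 bytes one at a time (assumes valid input)."""
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:    # 0xxxxxxx: 1-byte sequence
            length, cp = 1, first
        elif first < 0xE0:  # 110xxxxx: 2-byte sequence
            length, cp = 2, first & 0x1F
        elif first < 0xF0:  # 1110xxxx: 3-byte sequence
            length, cp = 3, first & 0x0F
        else:               # 11110xxx: 4-byte sequence
            length, cp = 4, first & 0x07
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += length


code_points = list(iter_code_points("aé€😀".encode("utf-8")))
```

The generator only ever holds one decoded code point at a time; the underlying bytes stay in UTF-8 and no UTF-32 buffer is allocated.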

Also, your complaint that UTF-8 reserves the short encodings for the English alphabet is not really relevant - the characters with longer encodings tend to be rarer (such as special symbols) or carry more information (such as Chinese characters, where the same sentence takes only about 1/3 the number of characters).
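
For reference, the encoded lengths are easy to check. The sketch below (Python again, with hand-picked sample characters) prints nothing; it just computes how many UTF-8 bytes a few characters take, and compares one greeting in English and Chinese. The greeting pair is a single example chosen for illustration, not a measurement of either language.

```python
samples = ["A", "é", "€", "中", "😀"]

# Number of UTF-8 bytes needed for each sample character.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
# {'A': 1, 'é': 2, '€': 3, '中': 3, '😀': 4}

# (character count, UTF-8 byte count) for one short phrase in each language.
english, chinese = "Hello", "你好"
counts = {
    text: (len(text), len(text.encode("utf-8")))
    for text in (english, chinese)
}
# {'Hello': (5, 5), '你好': (2, 6)}
```

In this one case the Chinese text uses three bytes per character but far fewer characters, so the total byte counts end up close to each other.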
