Re: Why UTF-8/16 character encodings?

Diggory Sat, 25 May 2013 00:50:37 -0700

I think you are a little confused about what Unicode actually is... Unicode has nothing to do with code pages, and nobody uses code pages any more except for compatibility with legacy applications (with good reason!).


Unicode is:
1) A standardised numbering of a large number of characters
2) A set of standardised algorithms for operating on these characters
3) A set of standardised encodings for efficiently encoding sequences of these characters
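
To make those three parts concrete, here is a small Python sketch (Python only because it is easy to run; nothing here is specific to D or Phobos). The choice of "é" and of NFD normalisation as the example algorithm is mine, purely for illustration.

```python
import unicodedata

ch = "é"

# 1) Numbering: every character has a code point, independent of any encoding.
code_point = ord(ch)  # 0xE9, i.e. U+00E9

# 2) Algorithms: e.g. normalisation. NFD splits "é" into 'e' + U+0301.
decomposed = unicodedata.normalize("NFD", ch)

# 3) Encodings: the same code point serialised in three different ways.
encodings = {name: ch.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

print(hex(code_point))                     # 0xe9
print([hex(ord(c)) for c in decomposed])   # ['0x65', '0x301']
for name, raw in encodings.items():
    print(name, raw.hex())                 # c3a9, e900, e9000000
```

The code point stays the same throughout; only the byte representation changes with the encoding.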

You said that Phobos converts UTF-8 strings to UTF-32 before operating on them, but that's not true. As it iterates over UTF-8 strings it iterates over dchars rather than chars, decoding each one on the fly, but that's not in any way inefficient, so I don't really see the problem.
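
To show what "iterating over dchars" without a conversion step can look like, here is a rough sketch in Python. It is not Phobos's code; `iter_code_points` is a hypothetical helper of mine, and it skips some checks a real decoder needs (overlong encodings, surrogates). The point is only that each code point is decoded as it is reached, and no UTF-32 copy of the string is ever built.

```python
def iter_code_points(data: bytes):
    """Lazily yield code points from UTF-8 bytes, one at a time."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte tells us how many bytes this code point occupies.
        if b0 < 0x80:
            length, cp = 1, b0
        elif b0 >> 5 == 0b110:
            length, cp = 2, b0 & 0x1F
        elif b0 >> 4 == 0b1110:
            length, cp = 3, b0 & 0x0F
        elif b0 >> 3 == 0b11110:
            length, cp = 4, b0 & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError("truncated sequence")
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1:i + length]:
            if b >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += length


raw = "aé€😀".encode("utf-8")
print([hex(cp) for cp in iter_code_points(raw)])
# ['0x61', '0xe9', '0x20ac', '0x1f600']
```

The work per code point is a few shifts and masks, and ASCII takes the first branch straight away.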

Also, your complaint that UTF-8 reserves the shortest encodings for the English alphabet is not really relevant - the characters with longer encodings tend to be rarer (such as special symbols) or carry more information (such as Chinese characters, where the same sentence takes only about 1/3 the number of characters).
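
If you want to check the sizes yourself, a quick Python sketch follows. The sample characters and the "Hello, world" / "你好，世界" pair are just illustrative picks of mine, not a measurement of typical text, so treat the numbers as one data point only.

```python
def utf8_stats(text: str):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Bytes per character for one sample from each UTF-8 length class.
for ch in ("A", "é", "中", "😀"):
    print(ch, len(ch.encode("utf-8")))   # 1, 2, 3, 4

en = "Hello, world"
zh = "你好，世界"
print(utf8_stats(en))   # (12, 12)
print(utf8_stats(zh))   # (5, 15)
```

In this pair the Chinese text uses far fewer characters, which offsets most of the 3-bytes-per-character cost.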
