On 5/24/2013 1:37 PM, Joakim wrote:
> This leads to Phobos converting every UTF-8 string to UTF-32, so that
> it can easily run its algorithms on a constant-width 32-bit character set, and
> the resulting performance penalties.

This is more a problem with the algorithms taking the easy way than a problem with UTF-8. You can do all the string algorithms, including regex, by working with the UTF-8 directly rather than converting to UTF-32. Then the algorithms work at full speed.
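
A minimal sketch of that idea, in Python rather than D and not a description of what Phobos actually does: in UTF-8, lead bytes and continuation bytes occupy disjoint ranges, so a valid UTF-8 needle can only match at a character boundary. A plain byte search therefore finds substrings without decoding anything. The helper names `find_utf8` and `count_code_points` are made up for illustration.

```python
def find_utf8(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of needle in haystack, or -1.

    Assumes both arguments are valid UTF-8. No decoding is performed.
    """
    return haystack.find(needle)


def count_code_points(data: bytes) -> int:
    """Count code points by skipping continuation bytes (bit pattern 10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


text = "total 5€ 日本語".encode("utf-8")
needle = "日本語".encode("utf-8")

offset = find_utf8(text, needle)      # byte offset, found without decoding
n_chars = count_code_points(text)     # code points, counted on the raw bytes
```

The offset returned is a byte offset, which is what you want for slicing the original buffer; converting it to a code point index is only needed if the caller asks for one.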


> Yes, it wouldn't be strictly backwards-compatible with ASCII, but it would be so much easier to internationalize.

That was the go-to solution in the 1980s; they were called "code pages". A disaster.


> with the few exceptional languages with more than 256 characters encoded in two bytes.

Like those rare languages Japanese, Korean, Chinese, etc. This too was done in the 80's with "Shift-JIS" for Japanese, and some other wacky scheme for Korean, and a third nutburger one for Chinese.

I've had the misfortune of supporting all that in the old Zortech C++ compiler. It's AWFUL. If you think it's simpler, all I can say is you've never tried to write internationalized code with it.
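
One concrete hazard of such double-byte schemes, offered as an illustration rather than something taken from the Zortech code: in Shift-JIS the second byte of a two-byte character can fall in the ASCII range, so a byte-level search for an ASCII character can land in the middle of a Japanese character. A small Python sketch, assuming Python's built-in `shift_jis` codec:

```python
ch = "ソ"  # KATAKANA LETTER SO, U+30BD

sjis = ch.encode("shift_jis")  # two bytes; the trail byte is 0x5C, ASCII backslash
utf8 = ch.encode("utf-8")      # three bytes, every one of them >= 0x80

# Searching the raw bytes for a backslash (say, as a path separator):
sjis_false_hit = b"\\" in sjis   # matches inside the character
utf8_false_hit = b"\\" in utf8   # cannot match: ASCII bytes never appear inside a multi-byte sequence
```

With Shift-JIS, every routine that scans for an ASCII byte has to track lead-byte state; with UTF-8 the same scan is safe as written.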

UTF-8 is heavenly in comparison. Your code is automatically internationalized. It's awesome.
