Re: Why UTF-8/16 character encodings?

Walter Bright Fri, 24 May 2013 18:50:26 -0700

On 5/24/2013 1:37 PM, Joakim wrote:

> This leads to Phobos converting every UTF-8 string to UTF-32, so that it can easily run its algorithms on a constant-width 32-bit character set, and the resulting performance penalties.

This is more a problem with the algorithms taking the easy way than a problem with UTF-8. You can do all the string algorithms, including regex, by working with the UTF-8 directly rather than converting to UTF-32. Then the algorithms work at full speed.
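A minimal sketch of what working on UTF-8 directly can look like. It is written in Python rather than D purely for illustration, and the function names `utf8_length` and `utf8_find` are hypothetical, not Phobos APIs. It relies on two properties of UTF-8: continuation bytes always have the bit pattern 10xxxxxx, and the encoding is self-synchronizing, so a plain byte search for a valid UTF-8 needle can only match on character boundaries.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in valid UTF-8 without decoding.

    Every code point has exactly one byte that is not a continuation
    byte (continuation bytes match the bit pattern 10xxxxxx).
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_find(haystack: bytes, needle: bytes) -> int:
    """Search valid UTF-8 byte-for-byte; returns a byte offset or -1.

    Because UTF-8 is self-synchronizing, a match of a valid UTF-8 needle
    always begins and ends on a character boundary, so no conversion to
    32-bit code points is needed.
    """
    return haystack.find(needle)


text = "naïve café 日本語".encode("utf-8")
pos = utf8_find(text, "日本".encode("utf-8"))

print(utf8_length(text))        # 14 code points (22 bytes)
print(pos)                      # byte offset 13
print(utf8_length(text[:pos]))  # character index 11
```

Both functions assume the input is already well-formed UTF-8; a real library would validate it once at the boundary and then run every algorithm on the bytes.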

> Yes, it wouldn't be strictly backwards-compatible with ASCII, but it would be so much easier to internationalize.

That was the go-to solution in the 1980s; they were called "code pages". A disaster.
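To make the code-page problem concrete, here is a small Python illustration (not part of the original post): the single byte 0xE9 decodes to three different characters under three common code pages, and nothing in the byte stream itself says which one was meant.

```python
raw = bytes([0xE9])

# The same byte, read under three different code pages.
readings = {cp: raw.decode(cp) for cp in ("cp437", "cp1252", "cp1251")}
for cp, ch in readings.items():
    print(cp, ch)
# cp437 Θ   (original IBM PC code page)
# cp1252 é  (Windows Western European)
# cp1251 й  (Windows Cyrillic)

# In UTF-8 each character has exactly one encoding, the same everywhere.
print("é".encode("utf-8"), "й".encode("utf-8"))  # b'\xc3\xa9' b'\xd0\xb9'
```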

> with the few exceptional languages with more than 256 characters encoded in two bytes.

Like those rare languages Japanese, Korean, Chinese, etc. This too was done in the '80s with "Shift-JIS" for Japanese, some other wacky scheme for Korean, and a third nutburger one for Chinese.

I've had the misfortune of supporting all that in the old Zortech C++ compiler. It's AWFUL. If you think it's simpler, all I can say is you've never tried to write internationalized code with it.
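One concrete reason such multibyte schemes were painful, sketched in Python (the path is a made-up example): in Shift-JIS the trail byte of a two-byte character can be an ASCII value, so byte-oriented code that scans for ASCII delimiters breaks. The kanji 表 encodes as 0x95 0x5C, and 0x5C is the backslash. UTF-8 cannot have this problem, because every byte of a multibyte sequence is 0x80 or above.

```python
kanji = "表"  # U+8868

sjis = kanji.encode("shift_jis")  # b'\x95\\'  (trail byte 0x5C is the ASCII backslash)
utf8 = kanji.encode("utf-8")      # b'\xe8\xa1\xa8'

# A hypothetical DOS-style path: C:\表\file.txt
sjis_path = b"C:\\" + sjis + b"\\file.txt"
utf8_path = b"C:\\" + utf8 + b"\\file.txt"

# Naive byte-level splitting on the ASCII backslash.
print(sjis_path.split(b"\\"))  # [b'C:', b'\x95', b'', b'file.txt']  (kanji torn in half)
print(utf8_path.split(b"\\"))  # [b'C:', b'\xe8\xa1\xa8', b'file.txt']
```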

UTF-8 is heavenly in comparison. Your code is automatically internationalized. It's awesome.
