On 5/25/2013 5:43 AM, Andrei Alexandrescu wrote:
> On 5/25/13 3:33 AM, Joakim wrote:
>> On Saturday, 25 May 2013 at 01:58:41 UTC, Walter Bright wrote:
>>> This is more a problem with the algorithms taking the easy way than a
>>> problem with UTF-8. You can do all the string algorithms, including
>>> regex, by working with the UTF-8 directly rather than converting to
>>> UTF-32. Then the algorithms work at full speed.
>> I call BS on this. There's no way working on a variable-width encoding
>> can be as "full speed" as a constant-width encoding. Perhaps you mean
>> that the slowdown is minimal, but I doubt that also.
> You mentioned this a couple of times, and I wonder what makes you so sure. On
> contemporary architectures small is fast and large is slow; betting on replacing
> larger data with more computation is quite often a win.
On the other hand, Joakim even admits his single-byte encoding is variable-length,
since otherwise it would simply dismiss the rarely used (!) Chinese, Japanese,
and Korean languages, as well as any text that contains words from more than one
language.
I suspect he's trolling us, and quite successfully.