Mike Gran <spk...@yahoo.com> writes:

> We do, in a manner of speaking, have a single string representation:
> UTF-32. The 'narrow' encoding is UTF-32 with the initial 3 bytes of
> zero removed.
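The quoted claim can be made concrete with a minimal C sketch. It assumes two hypothetical layouts, `narrow_str` and `wide_str`, which are illustrations only and not Guile's actual structures. Both forms yield the same code points, because reading a narrow character zero-extends one byte to 32 bits. However, the element width differs (1 byte versus 4), so the load instruction and the stride through memory differ as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string layouts, for illustration only (not Guile's
   actual structs).  A narrow string stores code points U+0000..U+00FF
   one byte each; a wide string stores each code point as a 32-bit
   UTF-32 unit. */
typedef struct { const uint8_t  *chars; size_t len; } narrow_str;
typedef struct { const uint32_t *chars; size_t len; } wide_str;

/* Reading a narrow character zero-extends one byte to 32 bits, so it
   yields exactly the code point the wide form would hold. */
static uint32_t narrow_ref(narrow_str s, size_t i)
{
    assert(i < s.len);
    return s.chars[i];          /* 1-byte load, stride of 1 byte */
}

static uint32_t wide_ref(wide_str s, size_t i)
{
    assert(i < s.len);
    return s.chars[i];          /* 4-byte load, stride of 4 bytes */
}

/* Widen a narrow string into a caller-supplied UTF-32 buffer of at
   least s.len elements. */
static void narrow_to_wide(narrow_str s, uint32_t *out)
{
    for (size_t i = 0; i < s.len; i++)
        out[i] = s.chars[i];
}
```

Widening never changes a code point. The difference lies entirely in how each element is fetched, and that is the point at which one compiled loop cannot serve both forms.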
Despite the similarity of these two representations, they are
sufficiently different that they cannot be handled by the same machine
code. That means you must either implement multiple inner loops, one
for each combination of string parameter representations, or else you
must dispatch on the string representation within the inner loop. On
modern architectures, mispredicted conditional branches are very
expensive, because each misprediction forces the processor to discard
the work it has already started along the wrong path.

> I actually at one point had a nearly complete version of Guile 1.8
> that used UTF-8 and another that used UTF-32. There are some
> other reasons why UTF-8 is bad, which I could bore you with
> ad nauseam.

Can you please tell me why UTF-8 is bad, or point me to something that
explains it? Everything I have found suggests that UTF-8 is very good.

Thanks,
  Mark
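The two options described above can be sketched in C for a string-equality test. The first option is one inner loop per combination of representations. The second is a single loop that dispatches on the representation inside the loop. The tagged type `str` and all function names are hypothetical and do not come from Guile. A macro stamps out the four specialized loops (narrow/narrow, narrow/wide, wide/narrow, wide/wide), and the choice among them is made once, before any loop runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged string, for illustration only (not Guile's
   actual representation). */
typedef struct {
    bool wide;                  /* false: 1 byte per char, true: UTF-32 */
    size_t len;
    const void *chars;
} str;

/* Option 1: a single loop that tests each argument's representation
   on every iteration. */
static uint32_t char_at(const str *s, size_t i)
{
    return s->wide ? ((const uint32_t *) s->chars)[i]
                   : ((const uint8_t *) s->chars)[i];
}

static bool equal_dispatching(const str *a, const str *b)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len)
        return false;
    for (size_t i = 0; i < a->len; i++)
        if (char_at(a, i) != char_at(b, i))   /* two representation tests per iteration */
            return false;
    return true;
}

/* Option 2: one specialized loop per combination of representations.
   The macro generates a branch-free inner loop for each type pair. */
#define DEFINE_EQUAL_LOOP(name, TA, TB)                          \
    static bool name(const TA *a, const TB *b, size_t len)       \
    {                                                            \
        for (size_t i = 0; i < len; i++)                         \
            if ((uint32_t) a[i] != (uint32_t) b[i])              \
                return false;                                    \
        return true;                                             \
    }

DEFINE_EQUAL_LOOP(equal_nn, uint8_t,  uint8_t)
DEFINE_EQUAL_LOOP(equal_nw, uint8_t,  uint32_t)
DEFINE_EQUAL_LOOP(equal_wn, uint32_t, uint8_t)
DEFINE_EQUAL_LOOP(equal_ww, uint32_t, uint32_t)

/* Dispatch once, outside the loop, to the matching specialization. */
static bool equal_specialized(const str *a, const str *b)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len)
        return false;
    if (!a->wide && !b->wide) return equal_nn(a->chars, b->chars, a->len);
    if (!a->wide &&  b->wide) return equal_nw(a->chars, b->chars, a->len);
    if ( a->wide && !b->wide) return equal_wn(a->chars, b->chars, a->len);
    return equal_ww(a->chars, b->chars, a->len);
}
```

Both functions compute the same result. With two representations and two string arguments there are already four loops, and the count grows with every additional argument or representation. How much the per-iteration tests in the first version actually cost depends on the compiler, since some compilers can hoist loop-invariant tests out of the loop, and on the hardware.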