Re: Using libunistring for string comparisons et al

Mark H Weaver Tue, 15 Mar 2011 10:21:22 -0700

Mike Gran <spk...@yahoo.com> writes:
> We do, in a manner of speaking, have a single string representation:
> UTF-32.  The 'narrow' encoding is UTF-32 with the initial 3 bytes of
> zero removed.


Despite the similarity of these two representations, they are
sufficiently different that they cannot be handled by the same machine
code.  That means you must either implement multiple inner loops, one
for each combination of string parameter representations, or else you
must dispatch on the string representation within the inner loop.  On
modern architectures, wrongly predicted conditional branches are very
expensive.
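
To make the two options concrete, here is a minimal sketch in C of an
equality test written both ways.  The str_t type, its field names, and
all function names are hypothetical and chosen for illustration only;
this is not Guile's actual string layout or API.  It assumes the
"narrow" form stores one byte per character and the "wide" form stores
one 32-bit code point per character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string type (an assumption, not Guile's real layout). */
typedef struct {
    int is_wide;   /* 0: one byte per character, 1: four bytes per character */
    size_t len;    /* length in characters */
    union {
        const uint8_t *narrow;
        const uint32_t *wide;
    } chars;
} str_t;

/* Option 1: dispatch on the representation inside the inner loop.
   Every character fetch goes through a conditional. */
static inline uint32_t str_ref(const str_t *s, size_t i)
{
    return s->is_wide ? s->chars.wide[i] : (uint32_t)s->chars.narrow[i];
}

int str_equal_dispatch(const str_t *a, const str_t *b)
{
    if (a->len != b->len)
        return 0;
    for (size_t i = 0; i < a->len; i++)
        if (str_ref(a, i) != str_ref(b, i))
            return 0;
    return 1;
}

/* Option 2: one specialized inner loop per combination of
   representations, with the dispatch done once, outside the loop. */
#define DEFINE_EQ(name, TA, TB)                                 \
    static int name(const TA *a, const TB *b, size_t n)         \
    {                                                           \
        for (size_t i = 0; i < n; i++)                          \
            if ((uint32_t)a[i] != (uint32_t)b[i])               \
                return 0;                                       \
        return 1;                                               \
    }

DEFINE_EQ(eq_nn, uint8_t, uint8_t)
DEFINE_EQ(eq_nw, uint8_t, uint32_t)
DEFINE_EQ(eq_wn, uint32_t, uint8_t)
DEFINE_EQ(eq_ww, uint32_t, uint32_t)

int str_equal_specialized(const str_t *a, const str_t *b)
{
    if (a->len != b->len)
        return 0;
    size_t n = a->len;
    if (a->is_wide)
        return b->is_wide ? eq_ww(a->chars.wide, b->chars.wide, n)
                          : eq_wn(a->chars.wide, b->chars.narrow, n);
    return b->is_wide ? eq_nw(a->chars.narrow, b->chars.wide, n)
                      : eq_nn(a->chars.narrow, b->chars.narrow, n);
}
```

With two representations, a two-argument operation already needs four
loops in the second style (three if the mixed cases are folded together
by swapping arguments), and that count is repeated for every string
primitive.  How much the per-character conditional in the first style
costs in practice depends on the compiler and the CPU, so the sketch
shows only the shape of the trade-off, not a measurement.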

> I actually at one point had a nearly complete version of Guile 1.8
> that used UTF-8 and another that used UTF-32.  There are some
> other reasons why UTF-8 is bad, which I could bore you with
> ad nauseam.

Can you please tell me why UTF-8 is bad, or point me to something that
explains it?  Everything I have found suggests that UTF-8 is very good.

   Thanks,
     Mark
