Tobia Conforto wrote:
> Let's see... ASCII is valid UTF-8, so all ASCII external
> representations wouldn't need any encoding or decoding work.
True. However, pure ASCII is less common than people believe, as
indicated by the 59K Google hits for "8-bit ASCII".
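(For the record, the ASCII-subset property is easy to check mechanically.
A quick illustration in Python rather than Chicken Scheme, just to keep
the byte values visible:

```python
# Every ASCII string's bytes are already valid UTF-8, byte for byte,
# because UTF-8 encodes code points U+0000..U+007F as themselves.
s = "hello, world"
assert s.encode("ascii") == s.encode("utf-8")

# All 128 ASCII code points round-trip to the identical single byte.
assert all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))
```

So a pure-ASCII external representation really does need zero transcoding
work under a UTF-8 internal representation.)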
> Most recent formats and protocols require or strongly recommend UTF-8
> (see XML etc.) so those wouldn't need any encoding/decoding either.
Well, there's an awful lot of content on the Internet and on local hard
disks that is neither true ASCII nor UTF-8. In particular, UTF-16 is
the usual representation of Unicode on Windows, and various non-Unicode
character sets are the usual representation of text on Windows, and
consequently on the Web too. UTF-8 is something of an oddity there.
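(To make the mismatch concrete: the same five-character string comes out
as three different byte sequences under UTF-8, UTF-16, and a typical
legacy Windows code page. Again a Python sketch, purely illustrative:

```python
text = "naïve"

utf8   = text.encode("utf-8")     # ï is U+00EF -> 2 bytes: 6 bytes total
utf16  = text.encode("utf-16-le") # every char here is 2 bytes: 10 total
cp1252 = text.encode("cp1252")    # legacy Windows code page: 5 bytes

assert len(utf8) == 6 and len(utf16) == 10 and len(cp1252) == 5
assert utf8 != utf16 and utf16 != cp1252 and utf8 != cp1252
```

So any of that Windows-originated content still needs a real transcoding
step on the way in and out.)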
> As far as internal representations covering all Unicode go, UTF-8
> looks like the one incurring the least overhead in the general case.
> Not to mention the least work on the developer side, as we already
> have the utf8 egg!
I'm fine with using UTF-8 as our internal representation.
> Unicode/UTF8-aware string operations will perform a correct
> replacement and insert the two extra bytes, if the source string
> really is plain ASCII. If the source string (or just the part near
> the change) is not correct UTF-8 or ASCII to begin with, they will
> raise an error.
You're right.
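(Both behaviors are easy to demonstrate. A Python sketch, not the utf8
egg's actual API; the "two extra bytes" case below assumes replacing a
one-byte ASCII character with a three-byte character such as '€':

```python
buf = bytearray(b"price: E")  # plain ASCII, 8 bytes

# A UTF-8-aware replace decodes, substitutes, and re-encodes;
# '€' (U+20AC) needs 3 bytes in UTF-8 where 'E' needed 1.
out = buf.decode("utf-8").replace("E", "\u20ac").encode("utf-8")
assert len(out) == len(buf) + 2  # the two extra bytes were inserted

# Bytes that are neither ASCII nor well-formed UTF-8 raise an error
# as soon as the operation tries to decode them.
try:
    b"\xff\xfe ascii tail".decode("utf-8")
except UnicodeDecodeError:
    pass  # expected
else:
    raise AssertionError("invalid UTF-8 should not decode silently")
```

The error-on-invalid-input behavior is the important safety property: the
string operations never silently corrupt bytes they can't interpret.)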
--
Overhead, without any fuss, the stars were going out.
--Arthur C. Clarke, "The Nine Billion Names of God"
John Cowan <[EMAIL PROTECTED]>
_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users