Tobia Conforto wrote:
> Let's see... ASCII is valid UTF-8, so all ASCII external
> representations wouldn't need any encoding or decoding work.
True. However, pure ASCII is less common than people believe, as
indicated by the 59K Google hits for "8-bit ASCII".
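(For the record, the ASCII-subset property is easy to check mechanically.
A quick illustration in Python rather than Chicken Scheme, just to keep
the byte values visible:

```python
# Every ASCII string's bytes are already valid UTF-8, byte for byte,
# because UTF-8 encodes code points U+0000..U+007F as themselves.
s = "hello, world"
assert s.encode("ascii") == s.encode("utf-8")

# All 128 ASCII code points round-trip to the identical single byte.
assert all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))
```

So a pure-ASCII external representation really does need zero transcoding
work under a UTF-8 internal representation.)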
> Most recent formats and protocols require or strongly recommend UTF-8
> (see XML etc.) so those wouldn't need any encoding/decoding either.
Well, there's an awful lot of content on the Internet and on local hard
disks that is neither true ASCII nor UTF-8. In particular, UTF-16 is
the usual representation of Unicode on Windows, and various non-Unicode
character sets are the usual representation of text on Windows, and
consequently on the Web too. UTF-8 is something of an oddity there.
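(To make the mismatch concrete: the same five-character string comes out
as three different byte sequences under UTF-8, UTF-16, and a typical
legacy Windows code page. Again a Python sketch, purely illustrative:

```python
text = "naïve"

utf8   = text.encode("utf-8")     # ï is U+00EF -> 2 bytes: 6 bytes total
utf16  = text.encode("utf-16-le") # every char here is 2 bytes: 10 total
cp1252 = text.encode("cp1252")    # legacy Windows code page: 5 bytes

assert len(utf8) == 6 and len(utf16) == 10 and len(cp1252) == 5
assert utf8 != utf16 and utf16 != cp1252 and utf8 != cp1252
```

So any of that Windows-originated content still needs a real transcoding
step on the way in and out.)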
> As far as internal representations covering all Unicode go, UTF-8
> looks like the one incurring the least overhead in the general case.
> Not to mention the least work on the developer side, as we already
> have the utf8 egg!
I'm fine with using UTF-8 as our internal representation.
> Unicode/UTF8-aware string operations will perform a correct
> replacement and insert the two extra bytes, if the source string
> really is plain ASCII. If the source string (or just the part near
> the change) is not correct UTF-8 or ASCII to begin with, they will
> raise an error.
You're right.
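(Both behaviors are easy to demonstrate. A Python sketch, not the utf8
egg's actual API; the "two extra bytes" case below assumes replacing a
one-byte ASCII character with a three-byte character such as '€':

```python
buf = bytearray(b"price: E")  # plain ASCII, 8 bytes

# A UTF-8-aware replace decodes, substitutes, and re-encodes;
# '€' (U+20AC) needs 3 bytes in UTF-8 where 'E' needed 1.
out = buf.decode("utf-8").replace("E", "\u20ac").encode("utf-8")
assert len(out) == len(buf) + 2  # the two extra bytes were inserted

# Bytes that are neither ASCII nor well-formed UTF-8 raise an error
# as soon as the operation tries to decode them.
try:
    b"\xff\xfe ascii tail".decode("utf-8")
except UnicodeDecodeError:
    pass  # expected
else:
    raise AssertionError("invalid UTF-8 should not decode silently")
```

The error-on-invalid-input behavior is the important safety property: the
string operations never silently corrupt bytes they can't interpret.)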
--
Overhead, without any fuss, the stars were going out.
--Arthur C. Clarke, "The Nine Billion Names of God"
John Cowan <[EMAIL PROTECTED]>
_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users