On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]> wrote:
> Summary: I want a byte-string API. I want string integrations. I want
>  global UTF8 strings.

The Factor language borrowed from Larceny a clever mechanism for
representing Unicode strings efficiently. Perhaps such a system is
feasible for Chicken, and might eliminate some of these issues (at the
cost of distancing our string type a bit more from C char arrays):

http://factor-language.blogspot.com/2008_01_01_archive.html

"The new representation is quite clever, and comes from Larceny
Scheme. The idea is that strings are ASCII strings, but have an extra
slot pointing to an 'auxiliary vector'. If no auxiliary vector is set,
the nth character of the string is just the nth byte. If an auxiliary
vector is set, then the nth character has the nth byte as the least
significant 8 bits, and the most significant 13 bits come from the nth
double-byte in the auxiliary vector. Storing a non-ASCII character
into the string creates an auxiliary vector if necessary. This reduces
space usage for ASCII strings, it can represent every Unicode code
point, and for strings with high code points in them, it still uses
less space than the other alternative, UTF-32."

So, a byte string would simply be a string with a null auxiliary vector.
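
To make the quoted description concrete, here is a rough sketch of the representation, written in Python for brevity. It is not Larceny's, Factor's, or Chicken's actual code. The names AuxString, get, set, and is_byte_string are made up for illustration. One assumption to flag: the quote says the auxiliary vector is created on storing a "non-ASCII" character, while this sketch only allocates it once a code point no longer fits in 8 bits, so that arbitrary bytes 0-255 stay in the plain byte body.

```python
from array import array


class AuxString:
    """Hypothetical sketch: a byte body plus an optional auxiliary vector."""

    def __init__(self, length):
        self.low = bytearray(length)  # least significant 8 bits of each character
        self.aux = None               # upper bits of each character, allocated lazily

    def __len__(self):
        return len(self.low)

    def get(self, n):
        # Without an auxiliary vector, the nth character is just the nth byte.
        code = self.low[n]
        if self.aux is not None:
            # Otherwise the nth auxiliary entry supplies the upper bits.
            code |= self.aux[n] << 8
        return code

    def set(self, n, code):
        if not 0 <= code <= 0x10FFFF:
            raise ValueError("not a Unicode code point")
        high = code >> 8  # at most 0x10FF, which fits in 13 bits
        if high and self.aux is None:
            # Assumption: allocate only when the value exceeds one byte.
            # array("H") stores unsigned shorts (at least 16 bits each).
            self.aux = array("H", [0] * len(self.low))
        self.low[n] = code & 0xFF
        if self.aux is not None:
            self.aux[n] = high

    def is_byte_string(self):
        # A byte string is simply one whose auxiliary vector was never created.
        return self.aux is None


s = AuxString(3)
s.set(0, ord("a"))      # fits in a byte, no auxiliary vector needed
s.set(1, 0x1F600)       # high code point, triggers allocation
```

After the second store the string is no longer a byte string, but get(0) still returns 97 and get(1) returns 0x1F600. Each character costs one byte until the first wide store and three bytes afterwards, compared with four bytes per character for UTF-32.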

Graham


_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users
