On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]> wrote:
> Summary: I want a byte-string API. I want string integrations. I want
> global UTF8 strings.

The Factor language borrowed a clever mechanism from Larceny for representing Unicode strings efficiently. Perhaps such a system is feasible for Chicken, and might eliminate some of these issues (at the cost of distancing our string type a bit more from C char arrays):

http://factor-language.blogspot.com/2008_01_01_archive.html

"The new representation is quite clever, and comes from Larceny Scheme. The idea is that strings are ASCII strings, but have an extra slot pointing to an 'auxiliary vector'. If no auxiliary vector is set, the nth character of the string is just the nth byte. If an auxiliary vector is set, then the nth character has the nth byte as the least significant 8 bits, and the most significant 13 bits come from the nth double-byte in the auxiliary vector. Storing a non-ASCII character into the string creates an auxiliary vector if necessary. This reduces space usage for ASCII strings, it can represent every Unicode code point, and for strings with high code points in them, it still uses less space than the other alternative, UTF-32."

So, a byte string would simply be a string with a null auxiliary vector.

Graham
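
To make the quoted scheme concrete, here is a minimal sketch in Python (not Chicken, Factor, or Larceny code). The class name `AuxString` and its methods are hypothetical, invented only for illustration. One assumption to flag: the sketch allocates the auxiliary vector only when a code point does not fit in 8 bits, because that is when the high 13 bits become nonzero. The quote says "non-ASCII", so a real implementation may well draw the line at 0x7F instead.

```python
from array import array


class AuxString:
    """Hypothetical fixed-length string: one byte per character, plus an
    optional auxiliary vector of 16-bit entries holding the high bits."""

    def __init__(self, length):
        self.low = bytearray(length)  # least significant 8 bits of each code point
        self.aux = None               # most significant 13 bits, allocated lazily

    def __len__(self):
        return len(self.low)

    def get(self, n):
        """Return the nth code point."""
        code_point = self.low[n]
        if self.aux is not None:
            code_point |= self.aux[n] << 8
        return code_point

    def set(self, n, code_point):
        """Store a code point, creating the auxiliary vector if necessary."""
        if not 0 <= code_point <= 0x10FFFF:
            raise ValueError("not a Unicode code point")
        high = code_point >> 8  # at most 0x10FF, so it fits in 13 bits
        # Assumption: allocate only when the high bits are nonzero.
        if high and self.aux is None:
            self.aux = array("H", [0] * len(self.low))
        self.low[n] = code_point & 0xFF
        if self.aux is not None:
            self.aux[n] = high

    def is_byte_string(self):
        """A byte string is simply a string with no auxiliary vector."""
        return self.aux is None


s = AuxString(3)
s.set(0, ord("a"))
ascii_only = s.is_byte_string()   # still a plain byte string here
s.set(1, 0x1F600)                 # a code point outside the BMP forces the aux vector
```

Under this layout a string with no auxiliary vector costs one byte per character, and one with an auxiliary vector costs three (one byte plus one double-byte), which is where the quoted comparison against UTF-32's four bytes comes from.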
