Re: [Chicken-users] ditching syntax-case modules for the utf8 egg

Tobia Conforto Tue, 18 Mar 2008 10:57:25 -0700

John Cowan wrote:

The difference between restricted and unrestricted strings may not be as large as the distinction between pairs and fixnums, but it's the same *kind* of difference.


I beg to differ.

A pair is no fixnum, and vice-versa.  They're two disjoint domains.

On the other hand, a UTF-8 string is at the same time both a sequence of Unicode objects and a sequence of bytes, and in many circumstances it must be treated as both during its life-span (for example, using Unicode-aware operations to compose it and then byte operations to split it into network packets, or to compute an MD5 digest from it, etc.)
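To make the two views concrete, here is a minimal sketch in Python rather than Scheme (chosen only for illustration; it is not code from this thread or from the utf8 egg). The helper name split_packets and the 8-byte packet size are made up for the example. Note that Python keeps str and bytes as separate types, so encode() produces a copy.

```python
import hashlib

def split_packets(data: bytes, size: int) -> list:
    """Split a byte string into chunks of at most `size` bytes (hypothetical helper)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Compose the string with character-aware operations...
text = "naïve " + "café".upper()       # Unicode-level composition
payload = text.encode("utf-8")         # byte view of the same content

# ...then treat the result as plain bytes.
packets = split_packets(payload, 8)    # a cut may fall inside a multi-byte character
digest = hashlib.md5(payload).hexdigest()

print(len(text), len(payload), len(packets), digest)
```

The character count and the byte count differ as soon as a non-ASCII character appears, which is why both kinds of operation are needed on the same data.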

This discussion has convinced me that from a *practical* point of view, it makes a lot of sense to use the same underlying object for both kinds of operation, instead of copying over the contents every time you want to switch between the two views (as I suppose happens, for example, in Java, with strings and byte arrays.)

Having the string API operate on UTF-8 characters and having a new API to operate on bytes, *both on the same underlying string objects*, will let us have our cake and eat it too, at the expense of changing the meaning of the string API for all existing applications.

The dynamic nature of Scheme suggests that it will all work seamlessly, until someone tries to call a (now Unicode-aware) string-length on a string whose UTF-8 structure has been corrupted by byte-level operations. At which point a runtime error will kindly signal the situation ;-)
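A rough sketch of that failure mode, again in Python as a stand-in for Scheme. The function char_length is a hypothetical analogue of a Unicode-aware string-length working on a UTF-8 byte buffer; it is not the utf8 egg's API, and strict decoding is just one assumed way of detecting the corruption.

```python
def char_length(buf: bytes) -> int:
    """Character-aware length of a UTF-8 buffer (hypothetical analogue of string-length).

    Strict decoding raises UnicodeDecodeError when the bytes are not valid UTF-8.
    """
    return len(buf.decode("utf-8"))

s = "héllo".encode("utf-8")    # 5 characters, 6 bytes
broken = s[:2]                 # byte-level cut in the middle of "é"

print(len(s), char_length(s))  # byte length vs. character length

try:
    char_length(broken)
    corrupted = False
except UnicodeDecodeError:
    corrupted = True           # the runtime error that signals the situation

print(corrupted)
```

The byte-level slice is perfectly legal on its own; the error only appears once a character-level operation is applied to the result.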



Tobia


