John Cowan wrote:
> The difference between restricted and unrestricted strings may not
> be as large as the distinction between pairs and fixnums, but it's
> the same *kind* of difference.
I beg to differ.
A pair is no fixnum, and vice-versa. They're two disjoint domains.
On the other hand, a UTF-8 string is at the same time both a sequence
of Unicode objects and a sequence of bytes, and in many circumstances
it must be treated as both during its life-span (for example using
Unicode-aware operations to compose it and then byte-operations to
split it into network packets, or to compute an MD5 digest from it,
etc.)
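
To make the two views concrete, here is a minimal sketch. It is written in
Python rather than Scheme purely for illustration, and the helper names
compose_message and split_into_packets are hypothetical. It composes a text
with character-level operations, then treats its UTF-8 encoding as plain
bytes to split it into fixed-size packets and compute an MD5 digest.

```python
import hashlib


def compose_message(name: str) -> str:
    # Character-level (Unicode-aware) composition: upper() works on
    # characters, not on bytes.
    return "Ciao, " + name.upper() + "!"


def split_into_packets(data: bytes, size: int) -> list:
    # Byte-level split: a packet boundary may fall anywhere, including
    # inside a multi-byte character.
    return [data[i:i + size] for i in range(0, len(data), size)]


text = compose_message("però")       # character view
data = text.encode("utf-8")          # byte view (Python copies here)
packets = split_into_packets(data, 4)
digest = hashlib.md5(data).hexdigest()

print(len(text), len(data), len(packets), digest)
```

Note that Python, like Java, produces the byte view by copying on encode,
which is exactly the cost that a shared underlying object would avoid.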
This discussion has convinced me that from a *practical* point of
view, it makes a lot of sense to use the same underlying object for
both kinds of operation, instead of copying over the contents every
time you want to switch between the two views (as I suppose happens,
for example, in Java with strings and byte arrays).
Having the string API operate on UTF-8 characters and having a new API
to operate on bytes, *both on the same underlying string objects*,
will let us have our cake and eat it too, at the expense of changing
the meaning of the string API for all existing applications.
The dynamic nature of Scheme suggests that it will all work
seamlessly, until someone tries to call a (now Unicode-aware)
string-length on a string whose UTF-8 structure has been corrupted by
byte-level operations. At which point a runtime error will kindly signal
the situation ;-)
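
A rough sketch of that failure mode, again in Python for illustration only:
DualString and its methods are hypothetical names, not an existing Chicken
or Python API. One buffer backs both a character-level and a byte-level
interface, and a byte-level write that breaks a multi-byte sequence only
surfaces as an error at the next character-level call.

```python
class DualString:
    """Hypothetical string object with character and byte views
    over a single underlying buffer."""

    def __init__(self, text: str):
        self._buf = bytearray(text.encode("utf-8"))

    def char_length(self) -> int:
        # Unicode-aware: decoding validates the UTF-8 structure and
        # raises UnicodeDecodeError if it is broken.
        return len(self._buf.decode("utf-8"))

    def byte_length(self) -> int:
        return len(self._buf)

    def byte_set(self, index: int, value: int) -> None:
        # Byte-level mutation: nothing here prevents breaking a
        # multi-byte sequence.
        self._buf[index] = value


s = DualString("caffè")
chars_before = s.char_length()
bytes_before = s.byte_length()

# Overwrite the lead byte of the two-byte "è" with ASCII "A",
# leaving a stray continuation byte behind.
s.byte_set(4, 0x41)

try:
    s.char_length()
    corrupted = False
except UnicodeDecodeError:
    corrupted = True

print(chars_before, bytes_before, corrupted)
```

In this sketch validation happens lazily, on each character-level call; a
real implementation could instead validate on every byte-level write, at the
cost of slower byte operations.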
Tobia