On Mon, Mar 17, 2008 at 10:29 PM, Alex Shinn <[EMAIL PROTECTED]> wrote:
> >>>>> "Graham" == Graham Fawcett <[EMAIL PROTECTED]> writes:
>
>     Graham> On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]> wrote:
>
>     Graham> The Factor language borrowed from Larceny a
>     Graham> clever mechanism for representing Unicode
>     Graham> strings efficiently. Perhaps such a system is
>     Graham> feasible for Chicken, and might eliminate some
>     Graham> of these issues (at the cost of distancing our
>     Graham> string type a bit more from C char arrays):
[snip]
> This only adds new issues, and solves none of the old ones.
> The representation itself is interesting, though it may in
> fact be a pessimisation in many cases (utf8 is about the
> fastest approach for parsing and regex matching, which are
> the string operations where speed is the biggest issue to
> begin with).
Fair enough. Here's another thought.

It seems to me that if we represented strings as composite values, e.g. a two-slot record whose first slot is an encoding (the symbol 'utf8, or #f for 'byte' encoding) and whose second slot holds the string data, then the various string functions could dispatch on the encoding, and there would be no need to monkey-patch core string functions to get the desired semantics. A proper protocol for handling string encodings could be designed, with utf8 being one of those encodings.

I don't imagine the dispatch overhead would be significant in any but the tightest inner loops, and there one could resort to fully-specified functions (e.g. byte-string-length or utf8-string-length).

Graham

_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users
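
The two-slot record proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python rather than Scheme, the names EncodedString, string_length and the dispatch table are hypothetical, and only byte_string_length and utf8_string_length echo names from the message. It is not how Chicken or the utf8 egg implement strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical two-slot string record: an encoding tag plus raw data."""
    encoding: Optional[str]  # "utf8", or None for plain byte encoding
    data: bytes


def byte_string_length(s: EncodedString) -> int:
    # Fully specified variant: one unit per byte, no dispatch.
    return len(s.data)


def utf8_string_length(s: EncodedString) -> int:
    # Fully specified variant: count code points by skipping
    # UTF-8 continuation bytes (those of the form 10xxxxxx).
    return sum(1 for b in s.data if (b & 0xC0) != 0x80)


# Dispatch table keyed on the encoding slot (an assumption of this sketch);
# supporting a new encoding means adding one entry here.
_LENGTH_BY_ENCODING = {
    None: byte_string_length,
    "utf8": utf8_string_length,
}


def string_length(s: EncodedString) -> int:
    # Generic entry point: look up the handler for this string's encoding.
    return _LENGTH_BY_ENCODING[s.encoding](s)


raw = "na\u00efve".encode("utf-8")  # 6 bytes, 5 code points
as_bytes = EncodedString(None, raw)
as_utf8 = EncodedString("utf8", raw)
print(string_length(as_bytes), string_length(as_utf8))  # prints "6 5"
```

The generic string_length pays for one table lookup per call, which is the dispatch overhead in question; code in a tight inner loop can call byte_string_length or utf8_string_length directly and skip it.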
