Re: [Chicken-users] ditching syntax-case modules for the utf8 egg

Graham Fawcett Mon, 17 Mar 2008 20:02:21 -0700

On Mon, Mar 17, 2008 at 10:29 PM, Alex Shinn <[EMAIL PROTECTED]> wrote:
> >>>>> "Graham" == Graham Fawcett <[EMAIL PROTECTED]> writes:
>
>     Graham> On Mon, Mar 17, 2008 at 11:22 AM, Kon Lovett <[EMAIL PROTECTED]> wrote:
>
>     Graham> The Factor language borrowed from Larceny a
>     Graham> clever mechanism for representing Unicode
>     Graham> strings efficiently. Perhaps such a system is
>     Graham> feasible for Chicken, and might eliminate some
>     Graham> of these issues (at the cost of distancing our
>     Graham> string type a bit more from C char arrays):
[snip]
>  This only adds new issues, and solves none of the old ones.
>  The representation itself is interesting, though it may in
>  fact be a pessimisation in many cases (utf8 is about the
>  fastest approach for parsing and regex matching, which are
>  the string operations where speed is the biggest issue to
>  begin with).


Fair enough.

Here's another thought. Suppose we represented strings as composite
values, e.g. a two-slot record whose first slot is an encoding (the
symbol 'utf8, or #f for plain byte encoding) and whose second slot
holds the string data. The various string functions could then
dispatch on the encoding slot, and there would be no need to
monkey-patch core string functions to get the desired semantics. A
proper protocol for handling string encodings could be designed, with
utf8 being one of those encodings.
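
A minimal sketch of the two-slot idea, written in Python rather than
Scheme purely for illustration. The names EncodedString and
string_length and the dispatch table are hypothetical, not an existing
Chicken or utf8-egg API. None stands in for #f (byte encoding) and the
string "utf8" for the symbol 'utf8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical two-slot string record: an encoding tag plus raw data."""
    encoding: Optional[str]  # "utf8", or None for plain byte encoding
    data: bytes


def _byte_length(data: bytes) -> int:
    # Byte encoding: one character per byte.
    return len(data)


def _utf8_length(data: bytes) -> int:
    # UTF-8: count every byte that is not a continuation byte (10xxxxxx).
    # Assumes the data is well-formed UTF-8.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Dispatch table keyed on the encoding slot. Supporting another encoding
# means adding an entry here, not patching string_length itself.
_LENGTH_BY_ENCODING = {None: _byte_length, "utf8": _utf8_length}


def string_length(s: EncodedString) -> int:
    """Generic length that dispatches on the record's encoding slot."""
    try:
        impl = _LENGTH_BY_ENCODING[s.encoding]
    except KeyError:
        raise ValueError(f"unsupported encoding: {s.encoding!r}") from None
    return impl(s.data)
```

With the same underlying bytes, the result depends only on the
encoding slot: the UTF-8 encoding of "naïve" has length 6 when tagged
as byte-encoded and length 5 when tagged as utf8.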

I don't imagine the dispatch overhead would be significant in any but
the tightest inner loops, in which case one could resort to
fully-specified functions (e.g. byte-string-length or
utf8-string-length).
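
As a rough illustration of that escape hatch, here is a sketch in
Python (not Scheme) of fully-specified functions that skip dispatch
when the encoding is known in advance. byte_string_length and
utf8_string_length mirror the names suggested above; total_code_points
is a hypothetical caller added only to show the usage.

```python
def byte_string_length(data: bytes) -> int:
    # Fully specified: the caller asserts the data is byte-encoded.
    return len(data)


def utf8_string_length(data: bytes) -> int:
    # Fully specified: the caller asserts the data is well-formed UTF-8.
    # Counts the bytes that are not continuation bytes (10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def total_code_points(lines) -> int:
    # Hypothetical tight loop: the encoding is known up front, so the
    # utf8-specific function is called directly, with no per-call dispatch.
    total = 0
    for line in lines:
        total += utf8_string_length(line)
    return total
```

Whether skipping the dispatch is actually measurable would depend on
the implementation; the sketch only shows the shape of the API.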

Graham


