Re: Wide strings

Ludovic Courtès Sun, 25 Jan 2009 14:34:14 -0800

Hello!

Mike Gran <spk...@yahoo.com> writes:


> Hi.  I know there has been a lot of talk about wide characters and
> Unicode over the years.  I'd like to see it happen because how they are
> implemented will determine the future of a couple of my side-projects.
> I could pitch in, if you needed some help.

Indeed, it looks like you have some experience with GuCu!  ;-)

I agree it would be really nice to have Unicode support, but I'm not
aware of any "plan", so please go ahead!  :-)

A few considerations regarding the inevitable debate about the internal
string representation:

  1. IMO it'd be nice to have ASCII strings special-cased so that they
     are always encoded in ASCII.  This would allow for memory savings
     since, e.g., most symbols are expected to contain only ASCII
     characters.  It might also simplify interaction with C in certain
     cases; for instance, it would make it easy to have statically
     initialized ASCII Scheme strings [0].

  2. O(1) `string-{ref,set!}' is effectively required by parts of SRFI-13,
     whose API is index-based.  For instance, `substring' takes indices as
     parameters, `string-index' returns an index, etc. (John Cowan once
     argued that an abstract type to represent the position would remove
     this limitation [1], but the fact is that we have to live with
     SRFI-13).

  3. GLib et al. like UTF-8, and it'd be nice to minimize the overhead
     when interfacing with these libs (e.g., by avoiding translations
     from one string representation to another).

  4. It might be nice to be friendly to `wchar_t' and friends.
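
One way points 1 and 2 could fit together is a string object that records
whether its contents are narrow (one byte per character, ASCII only) or
wide (a fixed four bytes per character).  What follows is only a minimal
sketch in C, not Guile's actual layout; the `sketch_string' type and
`sketch_string_ref' are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string object (an assumption, not Guile's real layout).  */
typedef struct
{
  size_t length;              /* number of characters */
  int is_wide;                /* 0: one byte per char, 1: four bytes per char */
  union
  {
    const uint8_t *narrow;    /* ASCII-only contents */
    const uint32_t *wide;     /* one code point per element */
  } chars;
} sketch_string;

/* Character access is O(1) in both cases because each representation
   uses a fixed width per character.  */
static uint32_t
sketch_string_ref (const sketch_string *s, size_t i)
{
  assert (i < s->length);
  return s->is_wide ? s->chars.wide[i] : s->chars.narrow[i];
}

/* A statically initialized ASCII string: no run-time conversion needed.  */
static const sketch_string hello =
  { 5, 0, { .narrow = (const uint8_t *) "hello" } };

/* A wide string holding U+03BB (Greek small letter lambda) and 'x'.  */
static const uint32_t lambda_chars[] = { 0x03BB, 'x' };
static const sketch_string lambda = { 2, 1, { .wide = lambda_chars } };
```

The narrow case costs one byte per character, which is where the memory
savings for mostly-ASCII symbols would come from; the price is a branch on
every access and a conversion step whenever a non-ASCII character is stored
into a narrow string.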

Interestingly, some of these things are contradictory.
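
To make one such tension concrete: if strings were stored as UTF-8
internally (point 3), finding the N-th character would mean scanning from
the start of the string, which is at odds with O(1) indexing (point 2).
A minimal sketch in C, assuming valid NUL-terminated UTF-8;
`utf8_char_offset' is a hypothetical helper, not a Guile or GLib function.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset of the character at index IDX in the UTF-8
   string S, or the byte length of S if IDX is past the end.
   Hypothetical helper; assumes S is valid, NUL-terminated UTF-8.  */
static size_t
utf8_char_offset (const char *s, size_t idx)
{
  size_t off = 0;

  assert (s != NULL);
  while (s[off] != '\0' && idx > 0)
    {
      /* Step over the lead byte of the current character...  */
      off++;
      /* ...and over any continuation bytes (bit pattern 10xxxxxx).  */
      while (((unsigned char) s[off] & 0xC0) == 0x80)
        off++;
      idx--;
    }
  return off;
}
```

The loop visits every character before the requested one, so the cost
grows linearly with the index, whereas a fixed-width representation can
compute the offset with a single multiplication.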

Will Clinger has a good summary of a range of possible implementations:

  https://trac.ccs.neu.edu/trac/larceny/wiki/StringRepresentations

Thanks,
Ludo'.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/7998
[1] http://lists.r6rs.org/pipermail/r6rs-discuss/2007-April/002252.html
