Re: Wide strings

Ludovic Courtès Mon, 26 Jan 2009 14:03:08 -0800

Hello!

Neil Jerram <neiljer...@googlemail.com> writes:


> But what about the other possible debate, about the API?  Are you
> thinking that we should accept R6RS's choice?

No, I think we have SRFI-13 and SRFI-14 to start with, both of which are
well defined in the context of Unicode.

> (I really haven't read up on all this enough - however when reading
> Tom Lord's analysis just now, I was thinking "why not just specify
> that things like char-upcase don't work in the difficult cases", and
> it seems to me that this is what R6RS chose to do.  So at first glance
> the R6RS API looks OK to me.

Regarding `ß' (German eszett), which is one of the "difficult cases"
mentioned by Tom Lord, SRFI-13 reads:

  Some characters case-map to more than one character.  For example, the
  Latin-1 German eszet character upper-cases to "SS."

    * This means that the R5RS function char-upcase is not well-defined,
      since it is defined to produce a (single) character result.

    * It means that an in-place string-upcase! procedure cannot be
      reliably defined, since the original string may not be long enough
      to contain the result -- an N-character string might upcase to a
      2N-character result.

    * It means that case-insensitive string-matching or searching is
      quite tricky. For example, an n-character string s might match a
      2N-character string s'.

And then:

  SRFI 13 makes no attempt to deal with these issues; it uses a simple
  1-1 locale- and context-independent case-mapping

I think it's reasonable to stick to this approach at first, at least.
Locale-dependent case folding is part of `(ice-9 i18n)' anyway.
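
To make the difference concrete, here is a minimal sketch in Python (chosen
only for illustration; it says nothing about how Guile implements this).
Python's `str.upper()' applies the Unicode full case mapping, so `ß' grows
to "SS".  The helper `upcase_1to1' is hypothetical: it only approximates a
1-1 mapping by leaving unchanged any character whose upper-case form is not
exactly one character, which is not the same as reading the simple mappings
from UnicodeData.txt.

```python
# Illustrative sketch only; upcase_1to1 is a hypothetical helper, not an
# API of SRFI-13, Guile, or Python.

def upcase_1to1(s: str) -> str:
    """Approximate a 1-1 case mapping.

    Each character is upcased on its own; if the result is not exactly
    one character (as with the German eszett), the original is kept.
    """
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


word = "straße"

# Full case mapping: the result is longer than the input, which is why
# an in-place string-upcase! cannot be reliably defined.
full = word.upper()

# 1-1 mapping: the length is preserved, at the cost of leaving the
# eszett untouched.
simple = upcase_1to1(word)

# Case-insensitive matching across different lengths, via case folding.
same = word.casefold() == "strasse".casefold()

print(full, len(word), len(full))
print(simple, len(simple))
print(same)
```

With the 1-1 approach every string keeps its length, so `char-upcase' and
`string-upcase!' stay well-defined; the price is that "straße" and
"STRASSE" no longer compare equal under a simple case-insensitive match.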

Thanks,
Ludo'.
