Re: Using libunistring for string comparisons et al

Ludovic Courtès Sun, 13 Mar 2011 14:31:11 -0700

Hi Mark,

Mark H Weaver <m...@netris.org> writes:

> Unfortunately, the alternatives are not pleasant.  We have a bunch of
> bugs in our string handling functions.  Currently, our case-insensitive
> string comparisons and case conversions are not correct for several
> languages including German, according to the R6RS among other things.
>
> We could easily fix these problems by using libunistring, which provides
> the operations we need, but only if we use a single string
> representation, and one that is supported by libunistring (UTF-8,
> UTF-16, or UTF-32).

I don’t think so.  For instance, you could “upgrade” narrow strings to
UTF-32 and then use libunistring on that.  That would fix case-folding
for “Straße”, I guess.
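Here is a minimal sketch of the "upgrade to UTF-32" idea, written in Python for brevity. It is not Guile's code and rests on a few assumptions:

* A narrow string is modelled as Latin-1 `bytes`.
* A wide string is modelled as a list of code points.
* Python's `str.upper()` stands in for libunistring's `u32_toupper()`, since both apply Unicode's full case mappings.

The sketch shows two things. First, widening is trivial. Second, upcasing can lengthen a string ("Straße" becomes "STRASSE") and can produce code points that no longer fit in a narrow string (ÿ becomes U+0178).

```python
from typing import List

# Hypothetical model: a "narrow" string is Latin-1 bytes (one byte per code
# point, U+0000..U+00FF); a "wide" string is a list of code points standing
# in for a uint32_t UTF-32 buffer.


def widen(narrow: bytes) -> List[int]:
    """Upgrade a narrow Latin-1 string to UTF-32 code points.

    Latin-1 byte values coincide with Unicode code points, so this is a
    plain zero-extension; no decoding tables are needed.
    """
    return list(narrow)


def upcase_wide(wide: List[int]) -> List[int]:
    """Full, locale-independent uppercase mapping on code points.

    str.upper() is a stand-in for libunistring's u32_toupper(); both use
    Unicode's full case mappings, so the output may be longer than the input.
    """
    return [ord(c) for c in "".join(map(chr, wide)).upper()]


def fits_narrow(wide: List[int]) -> bool:
    """True if every code point could be stored back in a narrow string."""
    return all(cp <= 0xFF for cp in wide)


if __name__ == "__main__":
    s = "Straße".encode("latin-1")            # 6 bytes; ß is the byte 0xDF
    up = upcase_wide(widen(s))
    print("".join(map(chr, up)), len(s), "->", len(up))  # STRASSE 6 -> 7
    y = upcase_wide(widen("ÿ".encode("latin-1")))
    print(hex(y[0]), fits_narrow(y))         # 0x178 False
```

So a case conversion applied to a narrow string may have to return a wide one.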

> So, our options appear to be:
>
>   * Use only wide strings internally.
>
>   * Reimplement several complex functions from libunistring within guile
>     (string comparisons and case conversions).
>
>   * Convert strings to a libunistring-supported representation, and
>     possibly back again, on each operation.  For example, this will be
>     needed when comparing two narrow strings, when comparing a narrow
>     string to a wide string, or when applying a case conversion to a
>     narrow string.
>
> Our use of two different internal string representations is another
> problem.  Right now, our string comparisons are painfully inefficient.

Inefficient in the (unlikely) case that you’re comparing a narrow and a
wide string of different lengths.
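For plain, case-sensitive ordering, a narrow string and a wide string can be compared code point by code point without converting either one. The sketch below is hypothetical. It again models narrow strings as Latin-1 `bytes` and wide strings as lists of code points. `compare_codepoints` is an illustrative name, not a Guile function.

```python
from typing import Sequence


def compare_codepoints(a: Sequence[int], b: Sequence[int]) -> int:
    """Lexicographic comparison by code point, returning -1, 0 or 1.

    Either argument may be a narrow string (Latin-1 bytes) or a wide string
    (list of code points). Iterating over either yields int code points, so
    a mixed-width comparison needs no conversion and no allocation.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # The common prefix is equal, so the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


if __name__ == "__main__":
    print(compare_codepoints(b"abc", [97, 98, 99]))      # 0
    print(compare_codepoints(b"ab", [97, 98, 0x3B1]))    # -1
```

Case-insensitive comparison is a different matter. Case folding can change a string's length (ß folds to "ss"), and that is where a full implementation such as libunistring's `u32_casecmp` comes in.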

So yes, the current implementation has bugs, but I think most if not all
can be fixed with minimal changes.  Would you like to look into it
for 2.0.x?

Using UTF-8 internally has problems of its own, as Mike explained, which
is why it was rejected in the first place.

Thanks,
Ludo’.
