Hi Mark,

Mark H Weaver <m...@netris.org> writes:

> Unfortunately, the alternatives are not pleasant. We have a bunch of
> bugs in our string handling functions. Currently, our case-insensitive
> string comparisons and case conversions are not correct for several
> languages including German, according to the R6RS among other things.
>
> We could easily fix these problems by using libunistring, which provides
> the operations we need, but only if we use a single string
> representation, and one that is supported by libunistring (UTF-8,
> UTF-16, or UTF-32).

I don’t think so. For instance, you could “upgrade” narrow strings to
UTF-32 and then use libunistring on that. That would fix case-folding
for “Straße”, I guess.

> So, our options appear to be:
>
> * Use only wide strings internally.
>
> * Reimplement several complex functions from libunistring within guile
>   (string comparisons and case conversions).
>
> * Convert strings to a libunistring-supported representation, and
>   possibly back again, on each operation. For example, this will be
>   needed when comparing two narrow strings, when comparing a narrow
>   string to a wide string, or when applying a case conversion to a
>   narrow string.
>
> Our use of two different internal string representations is another
> problem. Right now, our string comparisons are painfully inefficient.

Inefficient in the (unlikely) case that you’re comparing a narrow and a
wide string of different lengths.

So yes, the current implementation has bugs, but I think most if not all
can be fixed with minimal changes. Would you like to look into it for
2.0.x?

Using UTF-8 internally has problems of its own, as Mike explained, which
is why it was rejected in the first place.

Thanks,
Ludo’.