On Mon, 25 Feb 2002 16:16:14 +0100 (CET) Bruno Haible <[EMAIL PROTECTED]> wrote:
> Michael B Allen writes:
> > Do the str* functions handle strings differently if the locale is
> > different?
>
> It depends on the functions.
>
>   strcpy strncpy strcat strncat strcmp strncmp strdup strchr strrchr
>   strcspn strspn strpbrk strstr strtok: NO
>
>   strcoll strxfrm: YES
>
>   strcasecmp: YES but doesn't work in multibyte locales.
>
> > For example, does strcmp work on UTF-8 strings?
>
> Not well. Better use strcoll.

What's the ultimate goal here? Are any of these functions *supposed* to
work on multi-byte characters, or will there be mbs* functions? I want my
code to work in any locale. How am I supposed to manipulate multi-byte
strings? For example, shouldn't there at least be a function like this:

    #define _XOPEN_SOURCE 600   /* for wcwidth */
    #include <limits.h>
    #include <string.h>
    #include <wchar.h>

    /*
     * Returns a pointer to the character at off within the multi-byte
     * string src, not examining more than sn bytes. off is counted in
     * display columns as reported by wcwidth. Returns NULL on an invalid
     * multi-byte sequence.
     */
    char *
    mbsnoff(char *src, int off, size_t sn)
    {
        wchar_t ucs;    /* mbrtowc stores a wchar_t, not an unsigned long */
        int w;
        size_t n;
        mbstate_t ps;

        ucs = 1;        /* non-zero so the loop is entered */
        memset(&ps, 0, sizeof(ps));

        if (sn > INT_MAX) {
            sn = INT_MAX;
        }
        if (off < 0) {  /* negative offset: walk to the end of the string */
            off = INT_MAX;
        }

        /* Stop at the terminating null or at an incomplete sequence. */
        while (ucs && (n = mbrtowc(&ucs, src, sn, &ps)) != (size_t)-2) {
            if (n == (size_t)-1) {
                return NULL;
            }
            if ((w = wcwidth(ucs)) > 0) {
                if (w > off) {
                    break;
                }
                off -= w;
            }
            sn -= n;
            src += n;
        }

        return src;
    }

DISCLAIMER: I just wrote this last night, it hasn't been tested AT ALL.

BTW: The RH 7.2 strchr manpage still reads 'character'.

Mike

-- 
May The Source be with you.
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
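For reference, the strcmp/strcoll distinction quoted above can be
sketched as follows. This is only a minimal illustration, not a
recommendation from the thread: cmp_bytes and cmp_collate are
hypothetical helper names, and the result of cmp_collate depends on
whatever locale the program has selected with setlocale.

```c
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or 1 so results can be compared. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Byte-wise comparison: strcmp ignores the locale entirely. */
int cmp_bytes(const char *a, const char *b)
{
    return sign(strcmp(a, b));
}

/* Locale-aware comparison: strcoll follows the LC_COLLATE category
 * of the current locale, so the answer can differ between locales. */
int cmp_collate(const char *a, const char *b)
{
    return sign(strcoll(a, b));
}
```

In the "C" locale both helpers agree, because strcoll then behaves like
strcmp. After setlocale(LC_COLLATE, "") in a locale with its own
collation rules, cmp_collate may order strings differently from
cmp_bytes, which is the "YES" case in the list above.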