On Mon, 25 Feb 2002 16:16:14 +0100 (CET)
Bruno Haible <[EMAIL PROTECTED]> wrote:
> Michael B Allen writes:
> > Do the str* functions handle strings differently if the locale is
> > different?
>
> It depends on the functions.
>
> strcpy strncpy strcat strncat strcmp strncmp strdup strchr strrchr
> strcspn strspn strpbrk strstr strtok: NO
>
> strcoll strxfrm: YES
>
> strcasecmp: YES but doesn't work in multibyte locales.
>
> > For example, does strcmp work on UTF-8 strings?
>
> Not well. Better use strcoll.
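
To make the strcmp/strcoll distinction above concrete, here is a minimal sketch. The helper names sign and orders_agree are made up for illustration; only strcmp and strcoll are standard. It just reports whether byte order and the current locale's collation order agree for two strings.

```c
#include <string.h>

/* Reduces a comparison result to -1, 0 or 1 so two results can be compared. */
static int
sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Hypothetical helper (not a standard function): returns nonzero if
 * byte order (strcmp) and the current locale's collation order
 * (strcoll) agree for a and b.
 */
int
orders_agree(const char *a, const char *b)
{
    return sign(strcmp(a, b)) == sign(strcoll(a, b));
}
```

In the default "C" locale the two orders agree. After setlocale(LC_COLLATE, "") in a language-specific locale, a pair such as "Zebra" and "apple" would typically disagree, since strcmp puts 'Z' before 'a' by byte value while most collation orders do not. On UTF-8 data strcmp still yields a consistent order (byte order matches code point order), just not a linguistically meaningful one.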
What's the ultimate goal here? Are any of these functions *supposed*
to work on multi-byte characters, or will there be mbs* functions?
I want my code to work in any locale. How am I supposed to manipulate
multi-byte strings? For example, shouldn't there at least be a function
like this:
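#define _XOPEN_SOURCE 600   /* for wcwidth() */
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Returns a pointer to the character at off within the multi-byte string
 * src, examining no more than sn bytes. Note that off is counted in
 * display columns as reported by wcwidth(); a negative off means "as far
 * as possible". Returns NULL on an invalid multi-byte sequence.
 */
char *
mbsnoff(char *src, int off, size_t sn)
{
    wchar_t wc = 1;    /* mbrtowc() stores a wchar_t, not an unsigned long */
    int w;
    size_t n;
    mbstate_t ps;

    memset(&ps, 0, sizeof(ps));
    if (sn > INT_MAX) {
        sn = INT_MAX;
    }
    if (off < 0) {
        off = INT_MAX;
    }
    /* Stop at the terminating NUL or at an incomplete trailing sequence. */
    while (wc && (n = mbrtowc(&wc, src, sn, &ps)) != (size_t)-2) {
        if (n == (size_t)-1) {
            return NULL;
        }
        if ((w = wcwidth(wc)) > 0) {
            if (w > off) {
                break;
            }
            off -= w;
        }
        sn -= n;
        src += n;
    }
    return src;
}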
DISCLAIMER: I just wrote this last night; it hasn't been tested AT ALL.
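
Since mbsnoff measures in display columns, a companion that counts characters instead might be useful for comparison. The following is only a sketch under the same assumptions (the current locale's encoding, decoded with mbrtowc); the name mbsnchars is hypothetical and not an existing library function.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a standard function): counts the characters
 * in the multi-byte string src, examining at most sn bytes. Returns
 * (size_t)-1 on an invalid sequence. A truncated trailing sequence is
 * not counted.
 */
size_t
mbsnchars(const char *src, size_t sn)
{
    mbstate_t ps;
    wchar_t wc;
    size_t n, count = 0;

    memset(&ps, 0, sizeof(ps));
    while (sn > 0) {
        n = mbrtowc(&wc, src, sn, &ps);
        if (n == (size_t)-1) {
            return (size_t)-1;   /* invalid multi-byte sequence */
        }
        if (n == (size_t)-2 || n == 0) {
            break;               /* incomplete sequence or terminating NUL */
        }
        count++;
        src += n;
        sn -= n;
    }
    return count;
}
```

The result depends on the locale in effect: without a call to setlocale the program runs in the "C" locale, where only plain ASCII input can be relied on to decode. In a UTF-8 locale the same call would count each multi-byte sequence as one character.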
BTW: The RH 7.2 strchr manpage still reads 'character'.
Mike
--
May The Source be with you.
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/