On Mon, 25 Feb 2002 16:16:14 +0100 (CET)
Bruno Haible <[EMAIL PROTECTED]> wrote:

> Michael B Allen writes:
> > Do the str* functions handle strings differently if the locale is
> > different?
> 
> It depends on the functions.
> 
> strcpy strncpy strcat strncat strcmp strncmp strdup strchr strrchr
> strcspn strspn strpbrk strstr strtok: NO
> 
> strcoll strxfrm: YES
> 
> strcasecmp: YES but doesn't work in multibyte locales.
> 
> > For example, does strcmp work on UTF-8 strings?
> 
> Not well. Better use strcoll.
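The difference between the byte-wise and locale-aware functions listed above can be sketched in C. This is a minimal illustration, not code from the thread: `sort_words` and `mbs_casecmp` are hypothetical helper names. `sort_words` sorts either with strcmp(), which compares raw unsigned byte values, or with strcoll(), which follows LC_COLLATE. `mbs_casecmp` shows one way around the strcasecmp() limitation: it decodes whole characters with mbrtowc() and folds them with towlower(), falling back to strcmp() on invalid input (an assumption of this sketch).

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* qsort() comparator: strcmp() compares unsigned byte values only. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort() comparator: strcoll() uses LC_COLLATE of the current locale. */
static int cmp_collate(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort n words byte-wise or by the current locale. */
void sort_words(const char **words, size_t n, int use_locale)
{
    qsort(words, n, sizeof *words, use_locale ? cmp_collate : cmp_bytes);
}

/* Hypothetical helper: case-insensitive comparison that folds whole
 * characters instead of single bytes. Returns <0, 0 or >0 like
 * strcasecmp(); on an invalid or truncated multibyte sequence it falls
 * back to strcmp() on the remaining bytes. */
int mbs_casecmp(const char *a, const char *b)
{
    mbstate_t sa, sb;
    memset(&sa, 0, sizeof sa);
    memset(&sb, 0, sizeof sb);

    for (;;) {
        wchar_t wa, wb;
        size_t na = mbrtowc(&wa, a, MB_CUR_MAX, &sa);
        size_t nb = mbrtowc(&wb, b, MB_CUR_MAX, &sb);

        if (na == (size_t)-1 || na == (size_t)-2 ||
            nb == (size_t)-1 || nb == (size_t)-2)
            return strcmp(a, b);

        wint_t la = towlower((wint_t)wa);
        wint_t lb = towlower((wint_t)wb);
        if (la != lb)
            return la < lb ? -1 : 1;
        if (wa == L'\0')        /* both strings ended at the same point */
            return 0;

        a += na;
        b += nb;
    }
}
```

In UTF-8, "été" starts with the byte 0xC3, so a byte-wise sort places it after "zebra". After setlocale(LC_ALL, ""), in a UTF-8 locale such as en_US.UTF-8, sort_words(words, n, 1) would typically put it between "eagle" and "zebra", and mbs_casecmp() should treat "É" and "é" as equal, which a byte-at-a-time strcasecmp() cannot do.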

What's the ultimate goal here? Are any of these functions *supposed*
to work on multi-byte characters, or will there be mbs* functions?
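ISO C already provides restartable multibyte primitives in <wchar.h>, such as mbrlen() and mbrtowc(), on which mbs*-style helpers can be built. As a minimal sketch, `mbs_charcount` (a hypothetical name) counts characters rather than bytes by stepping through the string with mbrlen() under the current LC_CTYPE:

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of characters (not bytes) in the first
 * len bytes of s, stopping early at a NUL. Returns (size_t)-1 on an
 * invalid or truncated multibyte sequence. */
size_t mbs_charcount(const char *s, size_t len)
{
    mbstate_t st;
    size_t count = 0;

    memset(&st, 0, sizeof st);
    while (len > 0) {
        size_t n = mbrlen(s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;
        if (n == 0)             /* NUL terminator reached */
            break;
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```

In a UTF-8 locale, mbs_charcount("\xc3\xa9t\xc3\xa9", 5) ("été") should return 3, while strlen() on the same string returns 5.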

I want my code to work in any locale. How am I supposed to manipulate
multi-byte strings? For example, shouldn't there at least be a function
like this:

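#define _XOPEN_SOURCE 700   /* wcwidth() is an X/Open extension */
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Returns a pointer to the character that occupies display column off
 * (as measured by wcwidth) within the multi-byte string src, examining
 * no more than sn bytes. A negative off scans to the end of the string.
 * Returns NULL on an invalid multi-byte sequence.
 */
char *
mbsnoff(char *src, int off, size_t sn)
{
    wchar_t wc;     /* mbrtowc() and wcwidth() use wchar_t, not unsigned long */
    int w;
    size_t n;
    mbstate_t ps;

    wc = 1;         /* any nonzero value, so the loop is entered */
    memset(&ps, 0, sizeof(ps));

    if (sn > INT_MAX) {
        sn = INT_MAX;
    }
    if (off < 0) {
        off = INT_MAX;
    }

    /* Stop at the terminating NUL (wc == 0) or when the remaining bytes
     * hold only an incomplete character (mbrtowc returns (size_t)-2). */
    while (wc && (n = mbrtowc(&wc, src, sn, &ps)) != (size_t)-2) {
        if (n == (size_t)-1) {
            return NULL;            /* invalid sequence */
        }
        if ((w = wcwidth(wc)) > 0) {
            if (w > off) {
                break;              /* this character covers column off */
            }
            off -= w;
        }
        sn -= n;
        src += n;
    }

    return src;
}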

DISCLAIMER: I just wrote this last night; it hasn't been tested AT ALL.
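A different approach from walking the multi-byte string on every lookup is to decode it once into a wide string, then index by character and use the wcs* functions. This is a minimal sketch with a hypothetical helper name, `wcs_from_mbs`, built on the standard mbstowcs() idiom of passing a NULL destination to get the required length:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: decode a NUL-terminated multi-byte string into a
 * newly allocated wide string using the current LC_CTYPE. Returns NULL
 * on an invalid sequence or allocation failure; the caller must free(). */
wchar_t *wcs_from_mbs(const char *src)
{
    size_t len = mbstowcs(NULL, src, 0);    /* characters, excluding NUL */
    if (len == (size_t)-1)
        return NULL;

    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    mbstowcs(dst, src, len + 1);            /* also stores the L'\0' */
    return dst;
}
```

After decoding, dst[i] is the i-th character, which trades memory for constant-time indexing. Note that this counts characters, not display columns as mbsnoff() does; wcwidth() or wcswidth() is still needed for column positions, and on platforms with a 16-bit wchar_t some characters occupy two elements.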

BTW, the Red Hat 7.2 strchr(3) man page still says it searches for a
'character', although it really matches a single byte.
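strchr() converts its int argument to char and matches a single byte, so it cannot find a multi-byte character. In encodings whose trail bytes overlap ASCII, such as Shift_JIS, it can even match the middle of a character. A minimal sketch of a character-aware search, using the hypothetical name `mbschr`, decodes the string with mbrtowc() and compares whole characters:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical mbschr(): like strchr(), but searches for the wide
 * character wc by decoding s one character at a time, so a trail byte
 * is never mistaken for a separate character. Returns a pointer to the
 * first byte of the match, or NULL if wc is absent or s is invalid.
 * Searching for L'\0' returns a pointer to the terminator, as strchr() does. */
const char *mbschr(const char *s, wchar_t wc)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);

    for (;;) {
        wchar_t cur;
        size_t n = mbrtowc(&cur, s, MB_CUR_MAX, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return NULL;
        if (cur == wc)
            return s;
        if (n == 0)             /* reached the terminating NUL */
            return NULL;
        s += n;
    }
}
```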

Mike

-- 
May The Source be with you.
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
