Michael B Allen writes:
> What's the ultimate goal here? Are any of these functions *supposed*
> to work on multi-byte characters, or will there be mbs* functions?
  strcpy strcat strdup
      already work for multi-byte characters
  strncpy strncat strncmp
      cannot work for multi-byte characters because the byte count
      can cut a string in the middle of a character
  strcspn strspn strpbrk strstr
      you can write multibyte-aware analogs of these
  strchr strrchr
      use a multibyte-aware strstr analog instead
Nothing is standardized in this area, but IMO an <mbstring.h> include
file which defines these for arbitrary encodings, and an <unistring.h>
which defines these for UTF-8 strings, would be very nice. I'm working
on an LGPL'ed implementation of the latter.
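
To illustrate what such an analog could look like, here is a minimal
sketch of a multibyte-aware strstr. The name mbs_strstr_sketch is made up
for this example (nothing of the sort is standardized), and it assumes a
stateless encoding in the current LC_CTYPE locale. It walks the haystack
one character at a time with mbrlen, so a match can only begin on a
character boundary.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical multibyte-aware analog of strstr (illustrative name).
 * Returns a pointer to the first occurrence of needle in haystack that
 * starts on a character boundary, or NULL if there is none or if
 * haystack contains an invalid or incomplete multibyte sequence.
 * Assumption: the locale's encoding is stateless.
 */
char *
mbs_strstr_sketch(const char *haystack, const char *needle)
{
    size_t hlen = strlen(haystack);
    size_t nlen = strlen(needle);
    size_t n;
    mbstate_t ps;

    memset(&ps, 0, sizeof(ps));

    if (nlen == 0) {
        return (char *)haystack;
    }

    while (hlen >= nlen) {
        /* We are on a character boundary here, so a byte match is valid. */
        if (memcmp(haystack, needle, nlen) == 0) {
            return (char *)haystack;
        }
        /* Step over exactly one multibyte character. */
        n = mbrlen(haystack, hlen, &ps);
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
            return NULL;
        }
        haystack += n;
        hlen -= n;
    }

    return NULL;
}
```

A strchr analog can then be had by passing a needle that holds a single
multibyte character.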
> /*
> * Returns a pointer to the character at off within the multi-byte string
                                       ^^^^^^
Emphasize: at _screen_position_ off, i.e. off counts display columns as
reported by wcwidth, not characters or bytes.
> * src not examining more than sn bytes.
> */
> char *
> mbsnoff(char *src, int off, size_t sn)
> {
>     unsigned long ucs;
>     int w;
>     size_t n;
>     mbstate_t ps;
>
>     ucs = 1;
>     memset(&ps, 0, sizeof(ps));
>
>     if (sn > INT_MAX) {
>         sn = INT_MAX;
>     }
>     if (off < 0) {
>         off = INT_MAX;
>     }
>
>     while (ucs && (n = mbrtowc(&ucs, src, sn, &ps)) != (size_t)-2) {
Change that to:
      while (sn > 0 && (n = mbrtowc(&ucs, src, sn, &ps)) != (size_t)-2) {
>         if (n == (size_t)-1) {
>             return NULL;
>         }
>         if ((w = wcwidth(ucs)) > 0) {
>             if (w > off) {
>                 break;
>             }
>             off -= w;
>         }
>         sn -= n;
>         src += n;
>     }
>
>     return src;
> }
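
For reference, here is a minimal sketch of the whole function with the
changed loop condition applied. It is an illustration under a few
assumptions rather than a finished implementation. It declares the
character as wchar_t, which is the type mbrtowc expects, instead of
unsigned long. It also stops when mbrtowc returns 0 for the terminating
null; that check is my addition, on the assumption that sn may extend past
the end of the string. The result depends on the current LC_CTYPE locale.

```c
/* wcwidth() is an XSI function; request it before including any header. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Returns a pointer to the character at screen position off within the
 * multibyte string src, not examining more than sn bytes.
 * Returns NULL if an invalid multibyte sequence is encountered.
 */
char *
mbsnoff(char *src, int off, size_t sn)
{
    wchar_t wc;
    int w;
    size_t n;
    mbstate_t ps;

    assert(src != NULL);
    memset(&ps, 0, sizeof(ps));

    if (sn > INT_MAX) {
        sn = INT_MAX;
    }
    if (off < 0) {
        off = INT_MAX;  /* negative offset: walk to the end */
    }

    while (sn > 0 && (n = mbrtowc(&wc, src, sn, &ps)) != (size_t)-2) {
        if (n == (size_t)-1) {
            return NULL;    /* invalid sequence */
        }
        if (n == 0) {
            break;          /* terminating null (assumption: stop here) */
        }
        if ((w = wcwidth(wc)) > 0) {
            if (w > off) {
                break;      /* this character covers column off */
            }
            off -= w;
        }
        sn -= n;
        src += n;
    }

    return src;
}
```

Zero-width characters (wcwidth returning 0) and non-printable ones
(wcwidth returning -1) are skipped without consuming any columns, as in
the quoted code.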
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/