On Mon, 01 Apr 2002 15:52:06 +0100
Markus Kuhn <[EMAIL PROTECTED]> wrote:

> Tomohiro KUBOTA wrote on 2002-04-01 13:34 UTC:
> > Michael B. Allen <[EMAIL PROTECTED]> wrote:
> > > Does wcwidth require __STDC_ISO_10646__?
> > 
> > No, wcwidth() does not require __STDC_ISO_10646__ .
> 
> In more detail:
> 
<snip>
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
> 

I see. I was looking at this code and assumed its implementation was
standard practice. I knew nothing of its history. This is good news. It
means my code is more portable than I had previously thought.

Taking all of this into consideration, I think I have a new question. If
I want to count characters (rather than screen positions or bytes), I
must know how to define a character. For example, I have a function like:

#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Return a pointer to the substring of src that starts at character
 * position off, without examining more than sn bytes of src.
 */
char *
mbsnoff(char *src, size_t sn, int off)
{
    wchar_t ucs;
    size_t n;
    mbstate_t ps;

    memset(&ps, 0, sizeof(ps));

    if (sn > INT_MAX) {
        sn = INT_MAX;
    }
    if (off < 0) {
        off = INT_MAX;
    }

    while (sn > 0 && off > 0 &&
                (n = mbrtowc(&ucs, src, sn, &ps)) != (size_t)-2) {
        if (n == (size_t)-1) {
            PMNO(errno);
            return NULL;
        }
        /* A null character (n == 0) still occupies one byte of src. */
        sn -= n ? n : 1;
        src += n ? n : 1;
        if ((n == 0 || wcwidth(ucs) != 0) && --off == 0) { 
            break;
        }
    }

    return src;
}

I want it to handle combining characters, CJK, and so on properly. As it
is, this code skips zero-width combining characters and ignores
wcwidth > 1, so each CJK character is counted as one character.
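
To make that counting rule concrete, here is a minimal sketch of it in
isolation. The name mbs_count_chars is hypothetical (it is not part of any
library), and the rule "wcwidth() == 0 means part of the previous character"
is only an approximation: not every zero-width character is a combining
character, so this may not match every definition of "character".

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* makes wcwidth() visible in <wchar.h> on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the first sn bytes of src,
 * treating zero-width characters as part of the preceding character.
 * Returns (size_t)-1 if an invalid multibyte sequence is found.
 */
size_t
mbs_count_chars(const char *src, size_t sn)
{
    mbstate_t ps;
    wchar_t wc;
    size_t n, count = 0;

    memset(&ps, 0, sizeof(ps));

    while (sn > 0) {
        n = mbrtowc(&wc, src, sn, &ps);
        if (n == (size_t)-1) {
            return (size_t)-1;      /* invalid sequence */
        }
        if (n == (size_t)-2 || n == 0) {
            break;                  /* truncated sequence or null character */
        }
        if (wcwidth(wc) != 0) {
            count++;                /* width 1, 2 or -1 each count once */
        }
        src += n;
        sn -= n;
    }

    return count;
}
```

For anything beyond plain ASCII, the caller would need to call
setlocale(LC_CTYPE, "") first, since both mbrtowc() and wcwidth() follow
the current locale.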

So my question is, will this "count characters" or is there a
simpler/better/official way?

Thanks,
Mike

-- 
May The Source be with you.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
