On Mon, 01 Apr 2002 15:52:06 +0100 Markus Kuhn <[EMAIL PROTECTED]> wrote:
> Tomohiro KUBOTA wrote on 2002-04-01 13:34 UTC:
> > Michael B. Allen <[EMAIL PROTECTED]> wrote:
> > > Does wcwidth require __STDC_ISO_10646__?
> >
> > No, wcwidth() does not require __STDC_ISO_10646__ .
>
> In more detail:
>
<snip>
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

I see. I was looking at this code and assumed its implementation was
standard practice. I knew nothing of its history. This is good news. It
means my code is more portable than I had previously thought.

Taking all of this into consideration, I think I have a new question. If
I want to count characters (rather than screen positions or bytes) I must
know how to define a character. For example, I have a function like:

    #include <errno.h>
    #include <limits.h>
    #include <string.h>
    #include <wchar.h>

    /* Return a pointer to a substring of src at character position off
     * not examining more than sn bytes of src. PMNO() is defined
     * elsewhere.
     */
    char *
    mbsnoff(char *src, size_t sn, int off)
    {
        wchar_t ucs;
        size_t n, len;
        mbstate_t ps;

        memset(&ps, 0, sizeof(ps));
        if (sn > INT_MAX) {
            sn = INT_MAX;
        }
        if (off < 0) {
            off = INT_MAX;   /* negative offset: walk to the end */
        }

        /* (size_t)-2 means the last sequence is incomplete; stop there */
        while (sn > 0 && off > 0 &&
                (n = mbrtowc(&ucs, src, sn, &ps)) != (size_t)-2) {
            if (n == (size_t)-1) {   /* invalid multibyte sequence */
                PMNO(errno);
                return NULL;
            }
            /* mbrtowc() returns 0 for a NUL, which is one byte long */
            len = n ? n : 1;
            sn -= len;
            src += len;
            /* count a NUL or any character with nonzero width */
            if ((n == 0 || wcwidth(ucs) != 0) && --off == 0) {
                break;
            }
        }

        return src;
    }

I want it to handle combining characters, CJK, or whatever else properly.
As it is, this code just skips zero-width combining characters and
ignores wcwidth > 1, treating a CJK character as one character.

So my question is, will this "count characters", or is there a
simpler/better/official way?

Thanks,
Mike

-- 
May The Source be with you.
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
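For comparison, the counting rule described in the question (a zero-width
character is not counted on its own, a double-width CJK character counts
once) can be written as a standalone counter. This is only a minimal
sketch under stated assumptions: the name mbsnchars is hypothetical, the
result depends on the current LC_CTYPE locale, a non-printable character
(wcwidth of -1) is counted as one, and a zero-width character with no
base character before it is not counted at all. It approximates
user-perceived characters and is not an official definition of one.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count "characters" in at most sn bytes of the multibyte string src.
 * A character whose wcwidth() is 0 (for example a combining mark) is
 * treated as part of the preceding character and is not counted.
 * Returns (size_t)-1 if an invalid multibyte sequence is found.
 * mbsnchars is a hypothetical name used only for this sketch.
 */
size_t
mbsnchars(const char *src, size_t sn)
{
    mbstate_t ps;
    wchar_t wc;
    size_t n, count = 0;

    assert(src != NULL);
    memset(&ps, 0, sizeof(ps));

    while (sn > 0) {
        n = mbrtowc(&wc, src, sn, &ps);
        if (n == (size_t)-1) {
            return (size_t)-1;   /* invalid sequence */
        }
        if (n == (size_t)-2 || n == 0) {
            break;               /* truncated sequence or terminating NUL */
        }
        if (wcwidth(wc) != 0) {
            count++;             /* width 1, 2, or -1 all count once */
        }
        src += n;
        sn -= n;
    }
    return count;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that
mbrtowc() decodes the user's encoding, then call something like
mbsnchars(buf, strlen(buf)).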
