Pablo Saratxaga wrote on 2001-05-10 15:50 UTC:
> Btw, I think that a number of CJK display problems I keep seeing
> are linked to the fact that strlen() doesn't work for non-8-bit encodings.
>
> There is wcslen() of course; but often the strings are not in wc, but in mb.
> converting mb<->wc adds complexity, and some programmers that don't worry
> too much about i18n won't care about it.
>
> Is there an mbslen()? (that is, a function like wcslen, but applied to
> a mb string; that does any necessary mb<->wc conversion internally).

Sort of:

  #define mbsrlen(s, ps) mbsrtowcs(NULL, &(s), SIZE_MAX, (ps))
  #define mbslen(s)      mbsrtowcs(NULL, &(s), SIZE_MAX, NULL)

might do the job, provided s is an lvalue of type const char *.
(Unfortunately, you can't use mbstowcs here, because ISO C 99 doesn't
specify that it can be used with pwcs==NULL. :-(( )

Note that these functions return (size_t)(-1) if they run into a
malformed sequence, which I think is a big hassle in practice in
languages without exception handling.
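
A minimal sketch of how the (size_t)(-1) case could be handled in C.
The function name mbs_count_chars and its return convention are made up
for this illustration and are not part of any standard. It counts
characters in the encoding of the current LC_CTYPE locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a standard function): count the characters
 * of the multibyte string s in the current locale's encoding.
 * Returns 0 and stores the count in *count on success,
 * or returns -1 if s contains an invalid multibyte sequence. */
int mbs_count_chars(const char *s, size_t *count)
{
    mbstate_t ps;
    size_t n;

    memset(&ps, 0, sizeof ps);      /* initial conversion state */

    /* With a NULL destination, mbsrtowcs only counts: the length
     * argument is ignored and s itself is not advanced. */
    n = mbsrtowcs(NULL, &s, 0, &ps);
    if (n == (size_t)-1)
        return -1;                  /* malformed sequence, errno is EILSEQ */

    *count = n;
    return 0;
}
```

A caller would normally run setlocale(LC_CTYPE, "") once at startup, so
that the user's encoding is the one being decoded, and then has to check
the return value at every call site, which is exactly the hassle
mentioned above.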

The length of a string matters for two applications:

  a) Find out how much memory to allocate. This requires a byte count,
     and strlen does exactly what you want, even for multi-byte encodings.

  b) Find out how many columns the cursor will advance if a string is sent
     to a terminal. For wide strings, there is wcswidth, but for
     multibyte strings, there is no convenient standardized alternative.

I don't think you want to replace strlen with mbslen very frequently!
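
To illustrate case (a), here is a small sketch of copying a multibyte
string. The helper name mbs_dup is invented for this example. The point
is only that the allocation size comes from strlen, a byte count, no
matter how many characters those bytes encode.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a multibyte string.
 * strlen returns the number of bytes before the terminating NUL,
 * which is what malloc needs, whatever the encoding is. */
char *mbs_dup(const char *s)
{
    size_t nbytes = strlen(s) + 1;  /* +1 for the terminating NUL */
    char *copy = malloc(nbytes);

    if (copy != NULL)
        memcpy(copy, s, nbytes);
    return copy;                    /* caller must free(); NULL on failure */
}
```

For a UTF-8 string holding one three-byte CJK character, strlen reports
3, and 3 + 1 bytes is the right amount to allocate, even though the
string contains a single character.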

The thing that I *REALLY* miss is the multi-byte version of wcwidth and
wcswidth:

  mbwidth       column width of one multi-byte character
  mbswidth      column width of a multi-byte string

It would be up to X/Open to add these, because ISO C has decided that it
doesn't want to be responsible for character terminal width
information.
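
Until something like that exists, such a function has to be pieced
together by hand. The following is only a rough sketch of what an
mbswidth could look like, built on mbrtowc and the X/Open wcwidth. The
name mbswidth_sketch and the choice to return -1 on any error are my
assumptions here, not a proposed standard interface.

```c
#define _XOPEN_SOURCE 700           /* needed for the wcwidth declaration */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical mbswidth: column width of the multibyte string s in the
 * current locale. Returns -1 if s contains a malformed sequence or a
 * character for which wcwidth reports no width (non-printable). */
int mbswidth_sketch(const char *s)
{
    mbstate_t ps;
    size_t left = strlen(s);        /* bytes still to be decoded */
    int total = 0;

    memset(&ps, 0, sizeof ps);      /* initial conversion state */
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &ps);
        int w;

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;              /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* decoded a terminator; stop */

        w = wcwidth(wc);
        if (w < 0)
            return -1;              /* non-printable character */

        total += w;
        s += n;
        left -= n;
    }
    return total;
}
```

As with the counting case, the result depends on the LC_CTYPE locale in
effect, so a program would call setlocale(LC_CTYPE, "") before using it.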

Can mbwidth/mbswidth still be squeezed into the POSIX/SUS merger
specification that is currently being finalized?

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
