Michael B. Allen wrote on 2002-04-01 19:56 UTC:

> If I want to count characters (rather than screen positions or bytes) I
> must know how to define a character.

For what purpose do you want to count characters? Most of the time, people
really are interested in counting either screen positions or bytes.

If you count characters, then you have to decide whether you want to count
only graphical base characters or all graphical characters or all
characters. You also have to decide whether you want to turn the string
first into a normalization form before you start counting. All these
questions are difficult to answer without knowing why you want to have the
count.

If you want to count an arbitrary subset of Unicode characters based on
character range, character category, etc., then you can easily use the
"uniset" software that I used to generate the combining characters table in
wcwidth.c:

  http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz

You'll find the documentation of the Unicode character categories in

  http://www.unicode.org/Public/UNIDATA/UnicodeData.html#General%20Category

and in more detail in the Unicode 3.0 book.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
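
To illustrate how much the answer depends on the choices named in the message above, here is a minimal sketch in Python using only the standard unicodedata module. It is not the uniset tool and not part of wcwidth.c. The function name count_variants and the exact category sets are assumptions made for this example: "graphical" is taken to mean general categories L, M, N, P, S and Zs, and "base" is taken to mean graphical characters that are not combining marks (category M).

```python
import unicodedata


def count_variants(text, form=None):
    """Return several different 'lengths' of the same string.

    form is an optional normalization form name ("NFC", "NFD", "NFKC",
    "NFKD") applied before counting. This helper is hypothetical and
    exists only to illustrate the choices discussed in the message.
    """
    if form is not None:
        text = unicodedata.normalize(form, text)

    def is_graphical(ch):
        # Assumption: letters, marks, numbers, punctuation, symbols and
        # space separators count as graphical; controls do not.
        cat = unicodedata.category(ch)
        return cat[0] in "LMNPS" or cat == "Zs"

    def is_base(ch):
        # Assumption: a base character is a graphical character that is
        # not a combining mark (general category Mn, Mc or Me).
        return is_graphical(ch) and unicodedata.category(ch)[0] != "M"

    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "all_characters": len(text),
        "graphical": sum(1 for ch in text if is_graphical(ch)),
        "base": sum(1 for ch in text if is_base(ch)),
    }


# "e" + COMBINING ACUTE ACCENT + "a" + newline
sample = "e\u0301a\n"
print(count_variants(sample))
print(count_variants(sample, form="NFC"))
```

For this sample, the unnormalized string gives 5 bytes, 4 characters, 3 graphical characters and 2 base characters, while the NFC form gives 4 bytes, 3 characters, 2 graphical characters and 2 base characters. None of these is the screen width, which is a separate question that wcwidth() addresses.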
