Hello rigel,

On Mon, Oct 16, 2000 at 01:31:18AM -0700, rigel wrote:
> Hi Roger,
> On Mon, Oct 16, 2000 at 04:10:07PM +1100, Roger So wrote:
> > >
> > > No, please don't.  You should continue to use isprint to test
> > > whether a byte is printable.
> >
> > I thought so too, but isprint(0xA7) didn't work, however
> > iswprint(0xA7) worked ...?  Now I'm confused ...
>
> This is indeed correct.  Byte 0xA7 is not a legal character in zh
> locales, so isprint(0xA7) should return 0.  Widechar 0xA7 =
> U000000A7 = 0xA1EC (gb2312) = 0xA1B1 (big5), however, is a printable
> character, so iswprint(0xA7) returns 1.
>
> Also glibc retains more information for widechar (used by iswprint)
> than for multibyte (used by isprint).  Internally the binary locale
> file keeps two separate sets of information: multibyte and widechar.
> All the chars present in the locale def file will be put in the
> widechar part, while only those that also exist in the charmap file,
> i.e. legal chars, will be recorded in the multibyte part.  For
> example, U00A6 exists in the zh_HK def files, so although it's not a
> legal character in the Big5HKSCS charmap, iswprint(0xA6) will return
> 1.  The same call in zh_CN and zh_TW locales will result in 0,
> because U00A6 does not exist in the zh_CN and zh_TW def files.
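As a rough illustration of the byte versus widechar distinction
described above, one possible way to examine a multibyte byte stream is
to decode each character with mbrtowc() and test the resulting wide
character with iswprint(), instead of calling isprint() on raw bytes.
This is only a minimal sketch under that assumption: count_printable is
a hypothetical helper name, and the way it skips invalid bytes and stops
at a truncated sequence is an arbitrary choice for the example, not
something glibc prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the printable characters in buf[0..len),
 * decoding multibyte sequences according to the current LC_CTYPE locale. */
size_t count_printable(const char *buf, size_t len)
{
    assert(buf != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);      /* initial conversion state */

    size_t printable = 0;
    size_t pos = 0;

    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &state);

        if (n == (size_t)-2) {
            /* Truncated sequence at the end of the buffer: stop here. */
            break;
        }
        if (n == (size_t)-1) {
            /* Invalid byte for this locale: skip it and reset the state
             * (an assumption of this sketch; other policies are possible). */
            memset(&state, 0, sizeof state);
            pos += 1;
            continue;
        }
        if (n == 0) {
            /* A null character was decoded; treat it as one byte, which
             * holds for stateless encodings. */
            n = 1;
        }

        if (iswprint((wint_t)wc)) {
            printable++;
        }
        pos += n;
    }
    return printable;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that
mbrtowc() follows the user's locale; without it the program stays in the
"C" locale.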
Thank you for the clarification -- I stand corrected.

So, given a stream of bytes which might contain multibyte characters,
how would I test whether a byte is, say, printable?  Do I need to test
anywhere from 1 to MB_CUR_MAX bytes at a time instead of individual
bytes?  (That seems wildly inefficient ...)

Also, in glibc, are widechars always in Unicode?  (UCS-4?)

> > > Just a small thing.  A new LC_CTYPE class "hanzi" was added in
> > > glibc 2.2 locale (both zh_CN and zh_TW have it, zh_HK doesn't
> > > though).
> >
> > Hmm ... that's a bug ...
>
> Well, not really a bug.  I added this hanzi class in zh_CN.  zh_TW's
> CTYPE simply copies zh_CN, while zh_HK copies "i18n".

Then zh_HK should copy "zh_CN" instead ...?

BTW several definitions in zh_HK seem to be wrong; when I get the time
I shall have a closer look.  Also it seems that an en_HK locale would
be nice for people like me :)

-- 
Roger So                              telnet://e-fever.org
spacehunt at e-fever dot org          SysOp, e-Fever BBS
GnuPG 1024D/98FAA0AD  F2C3 4136 8FB1 7502 0C0C  01B1 0E59 37AC 98FA A0AD

