Pádraig Brady wrote:
> > This patch is correct (because the characters that you test for in c_iscntrl
> > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte
> > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings).
>
> ... It might be worth mentioning this subtle point in the c_iscntrl() docs?
> "Note this identifies all single byte control chars even in multibyte
> encodings".

Only in the multibyte encodings that are currently in use. We never know
what kinds of features or misfeatures new multibyte encodings will come up
with: before GB18030 was introduced, it was a common feature of all
multibyte encodings (including SJIS) that ASCII characters in the range
0x00..0x3F never occur as a second or later byte in a multibyte character.
Well, GB18030 broke this assumption.

So, it is dangerous to rely on this property. Therefore I wouldn't like to
document it in the c_iscntrl() documentation.

Bruno
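
To make the GB18030 point concrete, here is a minimal sketch (not gnulib code). The helper names is_c_cntrl_byte, is_low_ascii_byte and count_trail_hits are made up for illustration, and the sample bytes are simply a sequence that fits the GB18030 four-byte layout, where the second and fourth bytes lie in 0x30..0x39.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the byte set that, per this thread, c_iscntrl
   tests for, namely 0x00..0x1F and 0x7F. */
static bool is_c_cntrl_byte(unsigned char b)
{
    return b <= 0x1F || b == 0x7F;
}

/* Hypothetical helper: the wider range 0x00..0x3F that used to be
   absent from trail bytes in every multibyte encoding. */
static bool is_low_ascii_byte(unsigned char b)
{
    return b <= 0x3F;
}

/* Count the trail bytes (index 1 and later) of a single multibyte
   character that satisfy the given predicate. */
static size_t count_trail_hits(const unsigned char *mbchar, size_t len,
                               bool (*pred)(unsigned char))
{
    assert(mbchar != NULL && len >= 1);
    size_t hits = 0;
    for (size_t i = 1; i < len; i++)
        if (pred(mbchar[i]))
            hits++;
    return hits;
}

/* Bytes shaped like a GB18030 four-byte character:
   lead 0x81..0xFE, then 0x30..0x39, then 0x81..0xFE, then 0x30..0x39. */
static const unsigned char gb18030_four_byte[] = { 0x81, 0x30, 0x81, 0x30 };
```

Calling count_trail_hits(gb18030_four_byte, 4, is_low_ascii_byte) finds two trail bytes inside 0x00..0x3F, while the same call with is_c_cntrl_byte finds none. So the narrower control-character property still holds for this encoding even though the older, wider one does not, and nothing guarantees that a future encoding will preserve the narrower one either.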