On 6/25/17 11:08 PM, George wrote:
> On Sun, 2017-06-25 at 12:23 -0400, Chet Ramey wrote:
>> On 6/24/17 1:41 PM, Eduardo A. Bustamante López wrote:
>>
>>> dualbus@debian:~$ LANG=zh_CN.GBK printf '\u4e57' | od -tx1 -An
>>>  81 5c
>>> It looks like it doesn't detect that \x81\x5c is a single character, and
>>> instead treats the multibyte character as separate characters.
>>
>>
>> It's apparently not a single character in that locale.
>>
>
> Yes it is!
>
> https://en.wikipedia.org/wiki/GBK
> \x81 \x5C is a two-byte character from level GBK/3.

OK. The terminal emulator I'm using simply doesn't render the glyph.

> But unless I've misunderstood something, it seems to be behaving correctly
> already. At least, with the exception of within $'..' quotes.

It is behaving correctly. $'...' works using bytes. You can get it to
expand a byte sequence to a multibyte character using \u or \x, but it
works on bytes and always has, just like in C. Since 0x5c introduces an
escape sequence, that's how it's treated.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
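
To make the byte-versus-character distinction above concrete, here is a minimal Python sketch. It is not bash's code. The function names are hypothetical, and the byte ranges are the usual two-byte GBK ranges (lead 0x81-0xFE, trail 0x40-0xFE except 0x7F), ignoring GB18030 four-byte forms. It contrasts a scanner that treats every 0x5c as a backslash with one that first skips whole two-byte characters.

```python
# Illustrative only: hypothetical helpers, not taken from bash.

def is_gbk_lead(b):
    """True if b can be the first byte of a two-byte GBK character."""
    return 0x81 <= b <= 0xFE

def is_gbk_trail(b):
    """True if b can be the second byte of a two-byte GBK character."""
    return 0x40 <= b <= 0xFE and b != 0x7F

def backslashes_bytewise(data):
    """Offsets of every 0x5c byte, the way a byte-oriented scanner sees them."""
    return [i for i, b in enumerate(data) if b == 0x5C]

def backslashes_gbk_aware(data):
    """Offsets of 0x5c bytes that are not the trail byte of a GBK character."""
    offsets = []
    i = 0
    while i < len(data):
        if (is_gbk_lead(data[i]) and i + 1 < len(data)
                and is_gbk_trail(data[i + 1])):
            i += 2  # skip the whole two-byte character
            continue
        if data[i] == 0x5C:
            offsets.append(i)
        i += 1
    return offsets

# The two bytes from the od output in this thread, followed by ASCII 'n'.
sample = bytes([0x81, 0x5C, 0x6E])
print(backslashes_bytewise(sample))   # [1]
print(backslashes_gbk_aware(sample))  # []
```

A byte-oriented scanner reports a backslash at offset 1, so the following `n` looks like the escape `\n`. The GBK-aware scanner consumes 0x81 0x5c as one character and reports no backslash.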