On Mon, Jun 05, 2017 at 06:06:34PM +0200, Ingo Schwarze wrote:
> Hi Walter,
>
> Walter Alejandro Iglesias wrote on Mon, Jun 05, 2017 at 04:50:21PM +0200:
>
> > report (I'm on chapter 2 of K&R :-)).  I wish with time I'll learn how
> > to do it.
>
> IIRC, you said you saw some undesirable behaviour with ksh input.
>
> I assume you have a sequence of key presses on your keyboard that
> demonstrate the undesirable behaviour.  To capture the sequence,
>

I will *study* all the instructions you gave me.  But this time I don't
think you need a capture of the sequence.  Just use *any* latin-1
character whose hex value is smaller than \xc0.  To make the test easier:
in xterm, after running "setxkbmap de", AltGr + Shift + 1 gives me the
inverted exclamation mark (\xa1) that we also use in Spanish.  On the
console or in a C-locale xterm, type that character mixed in among random
ASCII characters, then move the cursor from the first column to the last,
passing over that character.  Assuming you're running current, see what
happens.

Anyway, to be honest, these bugs don't hurt; you can live with them.
What I'm trying to say with these reports is that I'm not truly convinced
utf8 support in the console is a good idea.

Another test you can do, this time in a utf-8 xterm: if you activate the
bell and move the cursor to the end of the line, it'll beep.  Now type
some utf-8 character at the end and do the same: it won't beep, because
the cursor is on the first byte of the utf-8 character, *it can't reach
the real end of the line*.

Nobody will die because of this issue or the one above.  My point is that
utf8 will always be a mess.

KEN, DO YOU HEAR ME?  IT WAS YOUR OWN CHILD, KEN! :-)

I wonder how plan9 handles utf8.

[...]
>
>
> For testing, go to the regress directory:
>
>   $ cd /usr/src/regress/bin/ksh
>   $ cvs up -dP
>   $ cd edit
>   $ make obj
>   $ make cleandir
>   $ make regress
>   $ ./obj/edit < input.txt | hexdump -C
>   00000000  24 20 78 79 08 c3 a9 79  08 0a                    |$ xy...y..|
>   0000000a

I've been wondering how to work with this.  Thanks!

[...]
> > By the way, something the last paragraph of the new utf8(7) man page
> > isn't clear enough (I mentioned this to tedu@).
>
> Which paragraph exactly, and what is unclear?  Maybe we can fix it
> quickly.

As I told you, the _last_ one:

  Encodings using more bytes than required are invalid.
  In particular, 11000000 and 11000001 are not valid start bytes, the
  byte after 11100000 must be at least 10100000, and the byte after
  11110000 must be at least 10010000.

I don't understand the 'at least' conditions.  Some examples in which the
byte after 1110.... is *smaller* than 1010....:

  Euro sign:      11100010 10000010 10101100
  Em dash:        11100010 10000000 10010100
  Double quotes:  11100010 10000000 10011100
                  11100010 10000000 10011101

You can find examples where the byte after 1110.... is *greater* than
1010.... here:

  http://www.utf8-chartable.de/

Thank you for your advice.  I'll study your whole message carefully.

>
> Yours,
>   Ingo
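
A minimal sketch of one possible reading of that utf8(7) paragraph,
assuming the 'at least' limits apply only after the exact start bytes
11100000 and 11110000, and not after every 1110.... byte.  The function
names are made up for illustration, and the upper limits that apply after
11101101 and 11110100 are deliberately left out:

```python
# Illustrative helpers (hypothetical names); the lower limits are the ones
# quoted from utf8(7), read as applying to the exact start bytes only.

def min_second_byte(start: int) -> int:
    """Smallest second byte accepted after a multi-byte start byte."""
    if start == 0b11100000:      # exactly E0: anything lower is overlong
        return 0b10100000
    if start == 0b11110000:      # exactly F0: anything lower is overlong
        return 0b10010000
    return 0b10000000            # other start bytes: any continuation byte


def second_byte_ok(start: int, second: int) -> bool:
    """True if `second` may follow `start` (upper limits not checked)."""
    # 11000000 and 11000001 are never valid start bytes; bytes below them
    # are ASCII or continuation bytes, bytes above 11110100 are unused.
    if not 0b11000010 <= start <= 0b11110100:
        return False
    return min_second_byte(start) <= second <= 0b10111111


euro = "\u20ac".encode("utf-8")                 # 11100010 10000010 10101100
print(second_byte_ok(euro[0], euro[1]))         # start byte is E2, not E0
print(second_byte_ok(0b11100000, 0b10000010))   # would be an overlong form
```

Under that reading the Euro sign passes, because its start byte is
11100010 rather than 11100000, while 11100000 10000010 is rejected as an
encoding that uses more bytes than required.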