Having a look at ksh, I don't see how Anton's original diff is much different from x_emacs() looping around x_e_getc() until it finishes a long key input?
It would be better to stop reading early if an invalid UTF-8 byte is input rather than always requiring exactly N bytes; he needs to fix his u8len(); and I would want to test the other calls to x_e_getc(), such as in x_search_char_forw() -- but the general idea seems reasonable.

On Sat, May 20, 2017 at 12:04:35AM +0100, Nicholas Marriott wrote:
> On Fri, May 19, 2017 at 09:29:06PM +0200, Ingo Schwarze wrote:
> > On a side note, i don't think gnome-terminal and konsole are relevant.
> > I never installed them before and did so now for the first time for
> > testing, but they installed so many libraries that i feel uncomfortable
> > and unsafe using them even briefly and purely for testing purposes.
> > I will certainly pkg_delete them, and the libraries they pulled in, in
> > the near future.
>
> You might not use them, and I don't use them, but they are very popular,
> as are half a dozen others that may well have different behaviours.
>
> > > FWIW bash seems to do the replacement to U+FFFD itself before it sends
> > > it to the terminal, which means it is (more) predictable. I don't know
> > > if this is a sensible option for ksh.
> >
> > That is not really an option. This matters most while the multibyte
> > character is being typed, when the first byte is already being
> > processed but the second one not yet. Replacing the byte with
> > U+FFFD, then later substituting the actual bytes back when all have
> > arrived, doesn't really make much sense to me.
>
> It would help because terminals would know how to deal with the
> character.
>
> So ksh is doing this now - if we pretend A,B,C are UTF-8 characters:
>
> A,cursor left,B,cursor left,C
>
> So first A appears, then B overwrites A, then C overwrites B.
>
> Except A and B are invalid so the terminal may not do the right thing
> with them. It may draw no characters and not move the cursor (like
> tmux), or draw two and move the cursor by two (like rxvt-unicode appears
> to).
>
> If you sent U+FFFD instead of the invalid characters, that is a known
> character and terminals should be guaranteed to do what ksh wants (move
> the cursor back one position to the right), until ksh overwrites it
> again.
>
> > Hanging the shell until all expected bytes have arrived seems like
> > a bad idea, too. You can see the misbehaviour that is causing in
> > programs like uniname(1) from the misc/uniutils package:
>
> I don't really understand this. How does ksh handle cursor keys? What if
> when I press Left, the three bytes (\033OA) are split across two
> separate read()s? How does this avoid hanging ksh?
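
To make the "stop reading early" point at the top concrete, here is a rough sketch of the kind of loop I have in mind. None of this is from Anton's diff or from ksh: u8_expected_len(), u8_is_cont(), u8_collect(), struct byte_src and src_getc() are all made-up names, and src_getc() is only a stand-in for x_e_getc() so the example can run on its own. The idea is to take the expected length from the lead byte, then give up as soon as a byte arrives that is not a continuation byte, handing that byte back to the caller instead of waiting for the full N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-backed byte source, standing in for x_e_getc(). */
struct byte_src {
	const unsigned char *p;
	size_t left;
};

/* Return the next byte, or -1 when the source is exhausted. */
int
src_getc(struct byte_src *s)
{
	if (s->left == 0)
		return -1;
	s->left--;
	return *s->p++;
}

/*
 * Expected sequence length for a lead byte, or 0 if the byte cannot
 * start a UTF-8 sequence (continuation byte, 0xc0/0xc1, or > 0xf4).
 */
size_t
u8_expected_len(unsigned char c)
{
	if (c < 0x80)
		return 1;
	if (c >= 0xc2 && c <= 0xdf)
		return 2;
	if (c >= 0xe0 && c <= 0xef)
		return 3;
	if (c >= 0xf0 && c <= 0xf4)
		return 4;
	return 0;
}

/* True for continuation bytes, i.e. 10xxxxxx. */
int
u8_is_cont(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/*
 * Collect the bytes of one character into buf and return how many
 * were stored (0 if the source was already empty).  If a byte that is
 * not a continuation byte turns up early, stop at once and store it
 * in *pushback so the caller can process it as fresh input; otherwise
 * *pushback is -1.  The sequence is complete only if the return value
 * equals u8_expected_len(buf[0]).
 */
size_t
u8_collect(struct byte_src *s, unsigned char buf[4], int *pushback)
{
	size_t want, n;
	int c;

	assert(s != NULL && buf != NULL && pushback != NULL);
	*pushback = -1;
	if ((c = src_getc(s)) == -1)
		return 0;
	buf[0] = (unsigned char)c;
	want = u8_expected_len(buf[0]);
	if (want == 0)
		return 1;	/* invalid lead byte: hand it back alone */
	for (n = 1; n < want; n++) {
		if ((c = src_getc(s)) == -1)
			break;	/* input ended mid-sequence */
		if (!u8_is_cont((unsigned char)c)) {
			*pushback = c;	/* bail out early */
			break;
		}
		buf[n] = (unsigned char)c;
	}
	return n;
}
```

This only covers the invalid-byte case. It does not check the tighter second-byte ranges after 0xe0, 0xed, 0xf0 and 0xf4, and it says nothing about the split-read question above, since a blocking source would still wait for the next byte.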