Having looked at ksh, I don't see how Anton's original diff is much
different from x_emacs() looping around x_e_getc() until it has read a
complete multi-byte key sequence.

It would be better to stop reading early if an invalid UTF-8 byte
arrives rather than always requiring exactly N bytes; he needs to fix
his u8len(); and I would want to test the other calls to x_e_getc(),
such as in x_search_char_forw() -- but the general idea seems reasonable.
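
A rough C sketch of the reading loop described above might look like
this. It is not Anton's diff and not ksh code: utf8_seqlen() (standing
in for his u8len()), read_utf8_char(), and the small struct src reader
with src_getc()/src_ungetc() (standing in for x_e_getc()) are all
hypothetical. It stops early and pushes a byte back as soon as a
non-continuation byte arrives, instead of always waiting for exactly N
bytes.

```c
#include <stddef.h>

/* Hypothetical: expected UTF-8 sequence length from the lead byte,
 * or 0 for a byte that cannot start a sequence. */
static int
utf8_seqlen(unsigned char c)
{
	if (c < 0x80)
		return 1;
	if (c >= 0xc2 && c <= 0xdf)
		return 2;
	if (c >= 0xe0 && c <= 0xef)
		return 3;
	if (c >= 0xf0 && c <= 0xf4)
		return 4;
	return 0;	/* continuation byte, 0xc0/0xc1 or 0xf5-0xff */
}

/* Hypothetical stand-in for the terminal input behind x_e_getc(). */
struct src {
	const unsigned char *p;
	size_t len, pos;
};

static int
src_getc(struct src *s)
{
	return s->pos < s->len ? s->p[s->pos++] : -1;
}

/* Hypothetical: push the last byte back so the next read returns it. */
static void
src_ungetc(struct src *s)
{
	s->pos--;
}

/*
 * Read one character into buf (at least 4 bytes) and return the number
 * of bytes stored, 0 at end of input. A byte that is not a continuation
 * byte ends the sequence early and is pushed back to be handled as new
 * input.
 */
static int
read_utf8_char(struct src *s, unsigned char *buf)
{
	int c, len, n;

	if ((c = src_getc(s)) == -1)
		return 0;
	buf[0] = c;
	len = utf8_seqlen(c);
	if (len <= 1)
		return 1;	/* ASCII or invalid lead byte: pass it on alone */
	for (n = 1; n < len; n++) {
		if ((c = src_getc(s)) == -1)
			break;	/* input ended mid-sequence */
		if ((c & 0xc0) != 0x80) {
			src_ungetc(s);	/* not a continuation byte: give it back */
			break;
		}
		buf[n] = c;
	}
	return n;
}
```

With the input "\xc3\xa9" "\xe2\x82" "A", the first call returns the two
bytes of U+00E9, the second returns the truncated two-byte prefix of a
three-byte character, and the third returns 'A' on its own.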



On Sat, May 20, 2017 at 12:04:35AM +0100, Nicholas Marriott wrote:
> On Fri, May 19, 2017 at 09:29:06PM +0200, Ingo Schwarze wrote:
> > On a side note, I don't think gnome-terminal and konsole are relevant.
> > I had never installed them before and did so now for the first time for
> > testing, but they installed so many libraries that I feel uncomfortable
> > and unsafe using them even briefly and purely for testing purposes.
> > I will certainly pkg_delete them, and the libraries they pulled in, in
> > the near future.
> 
> You might not use them, and I don't use them, but they are very popular,
> as are half a dozen others that may well have different behaviours.
> 
> > > FWIW bash seems to do the replacement to U+FFFD itself before it sends
> > > it to the terminal, which means it is (more) predictable. I don't know
> > > if this is a sensible option for ksh.
> > 
> > That is not really an option.  This matters most while the multibyte
> > character is being typed, when the first byte is already being
> > processed but the second one not yet.  Replacing the byte with
> > U+FFFD, then later substituting the actual bytes back when all have
> > arrived, doesn't really make much sense to me.
> 
> It would help because terminals would know how to deal with the
> character.
> 
> So ksh is doing this now - if we pretend A,B,C are UTF-8 characters:
> 
> A,cursor left,B,cursor left,C
> 
> So first A appears, then B overwrites A, then C overwrites B.
> 
> Except that A and B are invalid (incomplete) sequences, so the terminal
> may not do the right thing with them. It may draw no characters and not
> move the cursor (like tmux), or draw two and move the cursor by two
> (as rxvt-unicode appears to).
> 
> If you sent U+FFFD instead of the invalid characters, that is a known
> character and terminals should reliably do what ksh wants (draw one
> character and move the cursor one position to the right), until ksh
> overwrites it again.
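
To make the U+FFFD suggestion concrete, here is a hypothetical C sketch
of what one redraw step would send to the terminal. render_pending() is
an invented name, and it assumes the cursor is moved left with a
backspace; ksh's actual output path may differ.

```c
#include <string.h>

/* UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER. */
static const char replacement[] = "\xef\xbf\xbd";

/*
 * Hypothetical sketch, not ksh code. buf holds "have" bytes of a
 * character expected to be "need" bytes long. Writes the bytes to send
 * into out (at least 4 bytes) and returns how many there are.
 */
static size_t
render_pending(const unsigned char *buf, int have, int need, char *out)
{
	if (have < need) {
		/* Incomplete: draw a known one-column placeholder, then
		 * step back over it so the next update overwrites it. */
		memcpy(out, replacement, 3);
		out[3] = '\b';
		return 4;
	}
	/* Complete: send the real bytes. */
	memcpy(out, buf, (size_t)need);
	return (size_t)need;
}
```

For a three-byte character this would turn the "A, cursor left, B,
cursor left, C" sequence into "U+FFFD, left, U+FFFD, left, C", so the
terminal only ever sees complete characters.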
> 
> > Hanging the shell until all expected bytes have arrived seems like
> > a bad idea, too.  You can see the misbehaviour this causes in
> > programs like uniname(1) from the misc/uniutils package:
> 
> I don't really understand this. How does ksh handle cursor keys? What if,
> when I press Left, the three bytes (\033OD) are split across two
> separate read()s? How does ksh avoid hanging then?
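
One general way a line editor can assemble a sequence split across
read()s without hanging is to block only for the first byte and then
wait a bounded time for each following byte. The C sketch below shows
that technique using poll(2); it is not a description of what ksh does,
and read_byte_timeout(), read_seq(), and the gap_ms parameter are
hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical: read one byte from fd, waiting at most timeout_ms
 * milliseconds (-1 waits forever). Returns the byte, or -1 on timeout,
 * end of file or error.
 */
static int
read_byte_timeout(int fd, int timeout_ms)
{
	struct pollfd pfd;
	unsigned char c;

	pfd.fd = fd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, timeout_ms) <= 0)
		return -1;	/* timed out or poll failed */
	if (read(fd, &c, 1) != 1)
		return -1;
	return c;
}

/*
 * Hypothetical: read a sequence expected to be "need" bytes long.
 * Blocks for the first byte, then waits at most gap_ms for each of the
 * rest, returning however many bytes arrived in time.
 */
static int
read_seq(int fd, unsigned char *buf, int need, int gap_ms)
{
	int c, n;

	if ((c = read_byte_timeout(fd, -1)) == -1)
		return 0;
	buf[0] = c;
	for (n = 1; n < need; n++) {
		if ((c = read_byte_timeout(fd, gap_ms)) == -1)
			break;	/* gave up waiting: return what we have */
		buf[n] = c;
	}
	return n;
}
```

The trade-off is the choice of gap: too short and a slow link splits a
real sequence, too long and a lone byte (such as an ESC typed on its
own) is delayed before the editor acts on it.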
