Re: ksh(1): UTF-8 column count bug

2017-05-11 Thread Theo Buehler
On Thu, May 11, 2017 at 03:36:17PM +0200, Ingo Schwarze wrote:
> Hi,
> 
> i checked Anton's analysis and believe it is completely correct.
> Any OKs to commit?

ok



Re: ksh(1): UTF-8 column count bug

2017-05-11 Thread Ingo Schwarze
Hi,

i checked Anton's analysis and believe it is completely correct.
Any OKs to commit?

See below for details.

Yours,
  Ingo


Anton Lindqvist wrote on Thu, May 11, 2017 at 08:02:59AM +0200:

> I recently encountered a bug in ksh(1)'s emacs mode while running from
> xterm(1). How-to reproduce:
> 
> 1. Insert a sequence containing at least one UTF-8 character until the
>line exceeds the last column, triggering horizontal scroll.
> 
> 2. Moving backwards using any motion (^A, ^B, ^[b) fails when reaching
>the first visible character triggering backwards horizontal scroll.
> 
> It seems like the cause is due to how the column counter works.

You are right that the global variable "static int x_col" is the
current display column.

> Every byte in a UTF-8 character is assumed to occupy one column.
> The most reliable way to solve this is probably using wcwidth(3)
> but that would require extensive work.

Exactly, and not just here, but (at least) at all the places where
isu8cont() is currently used.

> However, the same logic found in the x_size function could instead be
> used in which a UTF-8 continuation byte is assumed to not occupy a
> column.

Exactly.  Right now, we only support single-width characters on
the command line.

> In the process of fixing this, I found another bug. This does only occur
> when the current line exceeds the number of columns and includes UTF-8
> characters. How-to reproduce:
> 
> 1. Use a 80 column wide terminal and ensure your prompt contains at
>least 3 characters:
> 
>PS1='xxx'
> 
> 2. Insert the following sequence, without the leading spaces:
> 
>ö a a a a a a a a a a a a a a a a a a a a a a a b b b b b b b b b b b b b 
> b b b z
> 
> 3. Press ^A (beginning-of-line), ^[f (forward-word), ^W (kill-region).
>At this point the prompt gets overwritten.
> 
> By invalidating xlp in the x_delete function prior calling x_zots in
> which x_lastcp is called causing xlp to be recalculated solves this.

Indeed.

The global variable "static char *xlp /* last byte visible on screen */"
is a pointer into the input buffer.  As long as you only delete a string
of ASCII characters before that point, the last position in the input 
buffer that fits on the screen does not change, even though the bytes
in that part of the buffer obviously change (they are shifted left).
The same bug can even be triggered by a simple ^D (x_del_char()),
you don't even need ^W.  What matters is that the line exceeds the
right marging of the screen.

For the fix, some might think it better to count the width of the
string being deleted and adjust xlp accordingly, which might arguably
require less computation.  But it would require handling a few
corner cases, so the code would be more complicated.  In particular,
there are complications if the deletion causes non-ASCII characters
that previously were outside the screen to move into the screen.
So, simply invalidating xlp looks like the right move; that pattern
is also used in other parts of the code.

Thanks for finding the two issues, understanding them, and providing
the patch!

> Index: emacs.c
> ===
> RCS file: /cvs/src/bin/ksh/emacs.c,v
> retrieving revision 1.66
> diff -u -p -r1.66 emacs.c
> --- emacs.c   9 Aug 2016 11:04:46 -   1.66
> +++ emacs.c   9 May 2017 17:52:37 -
> @@ -533,6 +533,7 @@ x_delete(int nc, int push)
>   }
>   memmove(xcp, xcp+nc, xep - xcp + 1);/* Copies the null */
>   x_adj_ok = 0;   /* don't redraw */
> + xlp_valid = false;
>   x_zots(xcp);
>   /*
>* if we are already filling the line,
> @@ -1909,7 +1910,8 @@ x_e_putc(int c)
>   x_col--;
>   break;
>   default:
> - x_col++;
> + if (!isu8cont(c))
> + x_col++;
>   break;
>   }
>   }



ksh(1): UTF-8 column count bug

2017-05-11 Thread Anton Lindqvist
Hi,
I recently encountered a bug in ksh(1)'s emacs mode while running from
xterm(1). How-to reproduce:

1. Insert a sequence containing at least one UTF-8 character until the
   line exceeds the last column, triggering horizontal scroll.

2. Moving backwards using any motion (^A, ^B, ^[b) fails when reaching
   the first visible character triggering backwards horizontal scroll.

It seems like the cause is due to how the column counter works. Every
byte in a UTF-8 character is assumed to occupy one column. The most
reliable way to solve this is probably using wcwidth(3) but that would
require extensive work. However, the same logic found in the x_size
function could instead be used in which a UTF-8 continuation byte is
assumed to not occupy a column.

In the process of fixing this, I found another bug. This does only occur
when the current line exceeds the number of columns and includes UTF-8
characters. How-to reproduce:

1. Use a 80 column wide terminal and ensure your prompt contains at
   least 3 characters:

   PS1='xxx'

2. Insert the following sequence, without the leading spaces:

   ö a a a a a a a a a a a a a a a a a a a a a a a b b b b b b b b b b b b b b 
b b z

3. Press ^A (beginning-of-line), ^[f (forward-word), ^W (kill-region).
   At this point the prompt gets overwritten.

By invalidating xlp in the x_delete function prior calling x_zots in
which x_lastcp is called causing xlp to be recalculated solves this.

Comments and feedback are much appreciated.

Index: emacs.c
===
RCS file: /cvs/src/bin/ksh/emacs.c,v
retrieving revision 1.66
diff -u -p -r1.66 emacs.c
--- emacs.c 9 Aug 2016 11:04:46 -   1.66
+++ emacs.c 9 May 2017 17:52:37 -
@@ -533,6 +533,7 @@ x_delete(int nc, int push)
}
memmove(xcp, xcp+nc, xep - xcp + 1);/* Copies the null */
x_adj_ok = 0;   /* don't redraw */
+   xlp_valid = false;
x_zots(xcp);
/*
 * if we are already filling the line,
@@ -1909,7 +1910,8 @@ x_e_putc(int c)
x_col--;
break;
default:
-   x_col++;
+   if (!isu8cont(c))
+   x_col++;
break;
}
}