On Wed, Oct 31, 2012 at 12:13:56AM +0000, Nicholas Marriott wrote:
> That is:
>
> - libedit has a wchar_t* buffer (el->el_line.buffer) and el_line calls
> ct_encode_string to convert it to a char*.
>
> - ct_encode_string calls wctomb which it expects to make UTF-8 but in
> fact because setlocale has not been called it outputs ASCII.
>
> - el_line then uses ct_enc_width which assumes UTF-8 and returns 2. So
> the offset is adjusted by 2 even though only 1 byte was filled in.
>
> - ftp obviously isn't happy about having a position after a \0, so it
> goes boom.
>
> The setlocale() change below will only fix the problem if LC_CTYPE or
> LC_ALL is set to UTF-8. ftp still cores if pasting UTF-8 in C locale.
>
> I think the right fix is for libedit to use the return value of wctomb
> to adjust the offset rather than assuming UTF-8 and working out the
> width itself.
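
A small standalone sketch of the mismatch described above, not libedit
code. The helper names (utf8_width, encode_one, cursor_offset,
assumed_offset) are made up for illustration. It assumes the program
never calls setlocale(), so it runs in the C locale, and it compares the
byte offset implied by the UTF-8 width table with the offset obtained by
advancing by whatever wctomb() actually returned.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Width a code point would need *if* the output were UTF-8.
 * Mirrors the table the patch removes from ct_enc_width(). */
static size_t
utf8_width(unsigned long c)
{
	if (c < 0x80)
		return 1;
	else if (c < 0x0800)
		return 2;
	else if (c < 0x10000)
		return 3;
	else if (c < 0x110000)
		return 4;
	return 0;	/* not a valid codepoint */
}

/* Hypothetical helper: encode one wide character with wctomb() and
 * report how many bytes were really written into dst (0 if the current
 * locale cannot encode it or dst is too small). */
static size_t
encode_one(char *dst, size_t len, wchar_t c)
{
	char tmp[MB_LEN_MAX];
	int n;

	assert(dst != NULL);
	n = wctomb(tmp, c);
	if (n < 0) {
		(void)wctomb(NULL, L'\0');	/* reset conversion state */
		return 0;
	}
	if ((size_t)n > len)
		return 0;
	memcpy(dst, tmp, (size_t)n);
	return (size_t)n;
}

/* Hypothetical helper: byte offset of the cursor when each step
 * advances by the number of bytes wctomb() actually produced. */
static size_t
cursor_offset(char *dst, size_t len, const wchar_t *src, size_t cursor)
{
	size_t off = 0, i;

	for (i = 0; i < cursor && src[i] != L'\0'; i++)
		off += encode_one(dst + off, len - off, src[i]);
	return off;
}

/* Byte offset of the cursor when UTF-8 output is simply assumed. */
static size_t
assumed_offset(const wchar_t *src, size_t cursor)
{
	size_t off = 0, i;

	for (i = 0; i < cursor && src[i] != L'\0'; i++)
		off += utf8_width((unsigned long)src[i]);
	return off;
}
```

For the two-character input { L'O', 0xBA }, assumed_offset() gives 3,
while cursor_offset() can never exceed the bytes that were really
written, so the cursor stays inside the converted buffer whatever the
locale is.
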
>
> Perhaps something like this (very lightly tested):
>
> Index: chartype.c
> ===================================================================
> RCS file: /cvs/src/lib/libedit/chartype.c,v
> retrieving revision 1.4
> diff -u -p -r1.4 chartype.c
> --- chartype.c 17 Nov 2011 20:14:24 -0000 1.4
> +++ chartype.c 31 Oct 2012 00:13:12 -0000
> @@ -44,6 +44,8 @@
> #define CT_BUFSIZ 1024
>
> #ifdef WIDECHAR
> +protected ssize_t ct_encode_char1(char *, size_t, Char);
> +
> protected void
> ct_conv_buff_resize(ct_buffer_t *conv, size_t mincsize, size_t minwsize)
> {
> @@ -178,27 +180,25 @@ ct_decode_argv(int argc, const char *arg
> protected size_t
> ct_enc_width(Char c)
> {
> - /* UTF-8 encoding specific values */
> - if (c < 0x80)
> - return 1;
> - else if (c < 0x0800)
> - return 2;
> - else if (c < 0x10000)
> - return 3;
> - else if (c < 0x110000)
> - return 4;
> - else
> - return 0; /* not a valid codepoint */
> + char s[MB_CUR_MAX];
> +
> + return ct_encode_char1(s, sizeof s, c);
> }
>
> protected ssize_t
> ct_encode_char(char *dst, size_t len, Char c)
> {
> - ssize_t l = 0;
> if (len < ct_enc_width(c))
> return -1;
> - l = ct_wctomb(dst, c);
> + return ct_encode_char1(dst, len, c);
> +}
>
> +protected ssize_t
> +ct_encode_char1(char *dst, size_t len, Char c)
> +{
> + ssize_t l = 0;
> +
> + l = ct_wctomb(dst, c);
> if (l < 0) {
> ct_wctomb_reset;
> l = 0;
>
With this patch applied (and without the setlocale() patch for ftp), ftp
behaves a little oddly:
> cd pub/Open
^
Here I type the character from https://en.wikipedia.org/wiki/%C2%BA and
press tab. I can see the character. I press backspace and completion
doesn't work. I press backspace again and then tab, and ftp completes to
"cd pub/OpeBSD".
>
> On Tue, Oct 30, 2012 at 11:56:18PM +0000, Nicholas Marriott wrote:
> > Hi
> >
> > The buffer isn't zero-terminated, it's the result of calling wctomb to
> > convert the internal wchar_t* that libedit has into a char*.
> >
> > libedit works out the offset in el_line with ct_enc_width which rather
> > foolishly makes the assumption that wctomb will convert to UTF-8, but
> > ftp doesn't call setlocale so it just leaves it as ASCII.
> >
> > Try this:
> >
> > Index: main.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/ftp/main.c,v
> > retrieving revision 1.85
> > diff -u -p -r1.85 main.c
> > --- main.c 26 Aug 2012 02:16:02 -0000 1.85
> > +++ main.c 30 Oct 2012 23:52:34 -0000
> > @@ -67,6 +67,7 @@
> >
> > #include <ctype.h>
> > #include <err.h>
> > +#include <locale.h>
> > #include <netdb.h>
> > #include <pwd.h>
> > #include <stdio.h>
> > @@ -90,6 +91,8 @@ main(volatile int argc, char *argv[])
> > char *outfile = NULL;
> > const char *errstr;
> > int dumb_terminal = 0;
> > +
> > + setlocale(LC_CTYPE, "");
> >
> > ftpport = "ftp";
> > httpport = "http";
> >
> >
> >
> >
> >
> > On Tue, Oct 30, 2012 at 10:31:16PM +0100, Otto Moerbeek wrote:
> > > On Tue, Oct 30, 2012 at 10:17:12PM +0100, Otto Moerbeek wrote:
> > >
> > > > On Tue, Oct 30, 2012 at 08:59:27PM +0100, Juan Francisco Cantero
> > > > Hurtado wrote:
> > > >
> > > > > On Tue, Oct 30, 2012 at 09:31:58AM +0100, Otto Moerbeek wrote:
> > > > > > On Mon, Oct 29, 2012 at 06:43:13PM +0100, Juan Francisco Cantero
> > > > > > Hurtado wrote:
> > > > > >
> > > > > > > Chris Cappuccio sent me a mail saying he can't see the
> > > > > > > characters, only
> > > > > > > a question mark.
> > > > > > >
> > > > > > > > I'm linking each character to its Wikipedia page, so you can
> > > > > > > > copy-paste the character.
> > > > > > >
> > > > > > > On Thu, Oct 25, 2012 at 05:07:34AM +0200, Juan Francisco Cantero
> > > > > > > Hurtado wrote:
> > > > > > > > > This afternoon I was downloading a tarball from an OpenBSD
> > > > > > > > > mirror. I pressed the "º" key and then the tab key, and
> > > > > > > > > ftp crashed with a segfault.
> > > > > >
> > > > > > Please also include your environment settings. It is likely locale
> > > > > > plays a role here.
> > > > > >
> > > > > > At least env | grep LC
> > > > > >
> > > > >
> > > > > I've tried the bug in amd64 without locales and also with
> > > > > LC_TIME="es_ES.ISO8859-1" LC_CTYPE="en_US.UTF-8".
> > > > >
> > > > > The i386 system was a clean installation in a virtual machine.
> > > >
> > > > I can now reproduce using a terminal that accepts more than just low
> > > > ascii.
> > > >
> > > > > What I see is that when complete() is called, the cursor position in
> > > > > the EditLine struct is not what it is supposed to be: it points a
> > > > > couple of bytes beyond the terminating NUL when it should point at
> > > > > the NUL. That confuses the scanner, which then gets the argument
> > > > > list count wrong.
> > >
> > > > Ehh, the buffer is not NUL terminated, but the observation still
> > > > holds: the cursor position is a couple of bytes further than it
> > > > should be.
> > >
> > > >
> > > > The root of the problem seems to be inside the editline lib.
> > > >
> > > > Cc:ing nicm@, maybe he has a clue
> > > >
> > > > -Otto
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > > https://en.wikipedia.org/wiki/%C2%BA
> > > > > > > >
> > > > > > > > > Steps to reproduce:
> > > > > > > > # ftp ftp.fr.openbsd.org
> > > > > > > > user and password
> > > > > > > > ascii art
> > > > > > > > > ftp> cd pub/Openº <- Here press the tab key
> > > > > > > https://en.wikipedia.org/wiki/%C2%BA
> > > > > > > > segmentation fault (core dumped) ftp ftp.fr.openbsd.org
> > > > > > > >
> > > > > > > > > It also crashes with the letters "Á" and "Ñ".
> > > > > > > https://en.wikipedia.org/wiki/%C3%81
> > > > > > > https://en.wikipedia.org/wiki/%C3%91
> > > > > > > >
> > > > > > > > Tested in:
> > > > > > > > - A snapshot from yesterday. i386. root account. console/ksh
> > > > > > > > without
> > > > > > > > locales.
> > > > > > > > - A snapshot from a few days ago. amd64. user. urxvt/zsh with
> > > > > > > > utf8
> > > > > > > > locales.
> > > > > > > >
> > > > > > > > > I also tested for the bug in a remote session with OpenBSD
> > > > > > > > > 4.7, and ftp works there without problems.
> > > > > > > > >
> > > > > > > > > I've updated the code of usr.bin/ftp to 2012-10-01 and
> > > > > > > > > 2012-01-01 and tried both versions. ftp crashes with both.
> > > > > > > >
> > > > > > > > Backtrace:
> > > > > > > > Thread 1 (process 3436):
> > > > > > > > #0 memcpy (dst0=0x9d4160, src0=Variable "src0" is not
> > > > > > > > available.
> > > > > > > > ) at /usr/src/lib/libc/string/bcopy.c:115
> > > > > > > > #1 0x000000000040432b in complete (el=Variable "el" is not
> > > > > > > > available.
> > > > > > > > ) at /usr/src/usr.bin/ftp/complete.c:313
> > > > > > > > #2 0x000000000041eb84 in el_wgets (el=0x20da64800,
> > > > > > > > nread=0x7f7ffffe3ebc) at read.c:612
> > > > > > > > #3 0x000000000041ef8d in el_gets (el=0x20da64800,
> > > > > > > > nread=Variable "nread" is not available.
> > > > > > > > ) at eln.c:78
> > > > > > > > #4 0x000000000040e55f in cmdscanner (top=Variable "top" is not
> > > > > > > > available.
> > > > > > > > ) at /usr/src/usr.bin/ftp/main.c:465
> > > > > > > > #5 0x000000000040eb7c in main (argc=1, argv=0x7f7ffffe4398) at
> > > > > > > > /usr/src/usr.bin/ftp/main.c:369
> > > > > > > >
> > > > > > > > > Let me know if you need more info or anything else :)
> > > > > > > >
> > > > > > > > Cheers.
> > > > > > > >
> > > > > > >
--
Juan Francisco Cantero Hurtado http://juanfra.info