Re: A call for fixing aterm/rxvt/etc...

2007-02-25 Thread Rich Felker
On Sat, Feb 24, 2007 at 01:39:25AM -0500, Rich Felker wrote:
> > using luit for this sounds appealing, but in my experience luit (a)
> > crashes frequently and (b) is easily confused by escape sequences and
> > has no user interface for resetting all its iso-2022 state, so in
> > practice it works for only a few apps.
>
> Hmm, maybe a replacement for luit is in order then.. If I omit
> iso-2022 support (which IMO is a big plus) then it should just be ~100
> lines of C.. I'll see if I can whip up a prototype sometime soon.

And here it is. Ugly but simple. Syntax is:
tconv [-i inner_encoding] [-o outer_encoding] [-e command ...]

Both encodings default to nl_langinfo(CODESET). Command defaults to
$SHELL. Bad things(tm) may happen if you set either encoding to
something stateful or ascii-incompatible (e.g. non-EUC legacy CJK
encodings) or a transliterating converter.

Actual usage to fix rxvt:
rxvt -e ./tconv -o iso-8859-1

Known bugs: termios handling is somewhat wrong and something should be
done to ensure that replacements made by iconv match the column width
of the correct character, to avoid corrupting the terminal. Maybe
deadlock situations when terminal blocks..? Other bugs?
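
A rough sketch of how the replacement issue could be approached. This is not part of tconv: convert_lossy is a made-up helper, and its skipping logic assumes the input side is UTF-8. Where iconv reports an unconvertible character with EILSEQ (glibc does this when no transliteration is requested), the helper skips that one character and writes a single '?'. That keeps the column count right only for width-1 characters; a double-width CJK character would still need two replacement columns.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert srclen bytes of UTF-8 from src through cd
 * into dst, writing one '?' for each character iconv rejects.
 * Returns the number of bytes stored in dst. Input left unconsumed
 * because of E2BIG or EINVAL is simply dropped in this sketch. */
static size_t convert_lossy(iconv_t cd, const char *src, size_t srclen,
                            char *dst, size_t dstlen)
{
	char *in = (char *)src, *out = dst;
	size_t inb = srclen, outb = dstlen;

	while (inb && outb) {
		if (iconv(cd, &in, &inb, &out, &outb) != (size_t)-1)
			break;		/* everything was converted */
		if (errno != EILSEQ || outb == 0)
			break;		/* out of room or truncated input */
		/* skip the lead byte and any UTF-8 continuation bytes */
		do {
			in++;
			inb--;
		} while (inb && ((unsigned char)*in & 0xC0) == 0x80);
		*out++ = '?';
		outb--;
	}
	return dstlen - outb;
}
```

With cd obtained from iconv_open("ISO-8859-1", "UTF-8"), the five bytes of "café" come out as four Latin-1 bytes. Some iconv implementations substitute a character of their own instead of failing, in which case the EILSEQ branch is never reached.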

Rich
/* Written in 2007 by Rich Felker; released to the public domain */

#define _XOPEN_SOURCE 500

#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <stdarg.h>
#include <signal.h>
#include <locale.h>
#include <langinfo.h>
#include <errno.h>
#include <sys/time.h>
#include <sys/select.h>
#include <termios.h>
#include <iconv.h>

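/* empty handler, installed for SIGWINCH in main */
static void dummy(int sig)
{
	(void)sig;
}

/* write each string argument to fd; the list ends with a null pointer */
static void print(int fd, ...)
{
	va_list ap;
	const char *s;
	va_start(ap, fd);
	while ((s = va_arg(ap, const char *)))
		write(fd, s, strlen(s));
	va_end(ap);
}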

int main(int argc, char **argv)
{
	int i, j;
	const char *o_enc, *i_enc;
	char **cmd = 0;
	int pty;
	fd_set rfds, wfds;
	char buf[512], buf2[1536];
	static struct termios tio, tio_old;
	iconv_t itoo, otoi;
	char *in, *out;
	size_t inb, outb;

#ifdef TIOCSWINSZ
	struct winsize ws = { 0 };

	signal(SIGWINCH, dummy);
	ioctl(0, TIOCGWINSZ, &ws);
#endif

	/* put the outer terminal into raw 8-bit mode, keeping the old
	 * settings in tio_old */
	tcgetattr(0, &tio);
	tio_old = tio;
	tio.c_cflag &= CBAUD;
	tio.c_cflag |= CS8 | CLOCAL | CREAD;
	tio.c_iflag = 0;
	tio.c_oflag = 0;
	tio.c_lflag = 0;
	tcsetattr(0, TCSANOW, &tio);

	/* both encodings default to the locale's codeset */
	setlocale(LC_CTYPE, "");
	i_enc = o_enc = nl_langinfo(CODESET);

	for (i=1; i<argc && !cmd; i++) {
		if (argv[i][0] != '-') {
			print(2, argv[0], ": unrecognized option: '",
				argv[i], "'\n", (char*)0);
			continue;
		}
		/* every option takes an argument (the rest of this word or
		 * the next word), so only the first option letter is examined */
		j = 1;
		switch (argv[i][j]) {
		case 'o':
			if (argv[i][j+1]) o_enc = argv[i]+j+1;
			else if (i+1 < argc) o_enc = argv[++i];
			else print(2, argv[0],
				": outer encoding omitted\n", (char *)0);
			break;
		case 'i':
			if (argv[i][j+1]) i_enc = argv[i]+j+1;
			else if (i+1 < argc) i_enc = argv[++i];
			else print(2, argv[0],
				": inner encoding omitted\n", (char *)0);
			break;
		case 'e':
			if (argv[i][j+1]) argv[i] += j+1;
			else if (i+1 < argc) i++;
			else {
				print(2, argv[0],
					": command omitted, using SHELL\n", (char *)0);
				break;
			}
			/* move the command to the front of argv; the list
			 * stays null terminated for exec */
			for (j=0; j<argc-i; j++)
				argv[j] = argv[j+i];
			argv[j] = 0;
			cmd = argv;
			break;
		}
	}

	itoo = iconv_open(o_enc, i_enc);
	otoi = iconv_open(i_enc, o_enc);
	/* iconv_open reports failure as (iconv_t)-1, not as a null pointer */
	if (itoo == (iconv_t)-1 || otoi == (iconv_t)-1) {
		print(2, argv[0], ": failed to open iconv between ",
			o_enc, " and ", i_enc, "\n", (char*)0);
		goto die;
	}

	if ((pty = posix_openpt(O_RDWR|O_NOCTTY)) < 0
	  || grantpt(pty) < 0 || unlockpt(pty) < 0) {
		print(2, argv[0], ": failed to get pty: ",
			strerror(errno), "\n", (char *)0);
		goto die;
	}

	switch(fork()) {
	case -1:
		print(2, argv[0], ": failed to fork child: ",
			strerror(errno), "\n", (char *)0);
		goto die;
	case 0:
		/* child: make the pty slave the controlling terminal and stdio */
		setsid();
		i = open(ptsname(pty), O_RDWR);
		close(pty);
		dup2(i, 0);
		dup2(i, 1);
		dup2(i, 2);
		if (i > 2) close(i);
		if (cmd) execvp(cmd[0], 

Re: A call for fixing aterm/rxvt/etc...

2007-02-23 Thread Ben Wiley Sittler

just two cents: i did this some years back for the links and elinks
web browsers (it's the utf-8 i/o option available in some versions
of each) and the results are fairly mixed -- copy-n-paste fails
horribly in an app converted in this way, and i assume the same would
be true of a terminal emulator in a window system like X11. on the
other hand, it meant i and others could use these browsers on e.g. mac
os x years before someone undertook the much more in-depth utf-8 and
unicode support now in progress for elinks.

using luit for this sounds appealing, but in my experience luit (a)
crashes frequently and (b) is easily confused by escape sequences and
has no user interface for resetting all its iso-2022 state, so in
practice it works for only a few apps.

that said, it would probably be better than the current state of affairs.

On 2/23/07, Rich Felker [EMAIL PROTECTED] wrote:

> These days we have at least xterm, urxvt, mlterm, gnome-terminal, and
> konsole which support utf-8 fairly well, but on the flip side there's
> still a huge number of terminal emulators which do not respect the
> user's encoding at all and always behave in a legacy-8bit-codepage
> way.
>
> Trying to help users in #irssi, etc. with charset issues, I've come to
> believe that it's a fairly significant problem: users get frustrated
> with utf-8 because the terminal emulator they want to use (which might
> be chosen based on anti-bloat sentiment or, quite the opposite, on a
> desire for specialized eye candy only available in one or two
> programs) forces their system into a mixed-encoding scenario where
> they have both utf-8 and non-utf-8 data in the filesystem and text
> files.
>
> How hard would it be to go through the available terminal emulators,
> evaluate which ones lack utf-8 support, and provide at least minimal
> fixes? In particular, are there any volunteers?
>
> What I'm thinking of as a minimal fix is just putting utf-8 conversion
> into the input and output layers. It would still be fine for most
> users of these apps if the terminal were limited to a 256-character
> subset of UCS, didn't support combining characters or CJK, etc. as
> long as the data sent and received over the PTY device is valid UTF-8,
> so that the (valid and correct) assumption of applications running on
> the terminal that characters are encoded in the locale's encoding is
> satisfied.
>
> Perhaps this could be done via a reverse luit -- that is, a program
> like luit or an extension to luit that assumes the physical terminal
> is using an 8bit legacy codepage rather than UTF-8. Then these
> terminals could simply be patched to run luit if the locale's encoding
> is not single-byte.
>
> Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: A call for fixing aterm/rxvt/etc...

2007-02-23 Thread Rich Felker
On Fri, Feb 23, 2007 at 04:24:29PM -0800, Ben Wiley Sittler wrote:
> just two cents: i did this some years back for the links and elinks
> web browsers (it's the utf-8 i/o option available in some versions

FWIW: ELinks has since been fixed (in the development versions, not
yet released but working great) to have true UTF-8 support. Proper
Unicode support/m17n is still a ways off tho (bidi, line breaking,
combining characters, cjk-wide behavior, UTF-8 text search, etc.).

> of each) and the results are fairly mixed -- copy-n-paste fails
> horribly in an app converted in this way, and i assume the same would
> be true of a terminal emulator in a window system like X11. on the

Well, copy-n-paste will work fine as long as the characters you want
to copy/paste are in the user's selected legacy codepage. Other
characters naturally are lost, but presumably the user doesn't really
care about characters aside from the ones in their own language or
else they'd get a better terminal..
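
As a rough illustration of the two directions described above (a sketch only, not code from this thread): assuming the legacy codepage is ISO-8859-1, every Latin-1 byte maps to one code point U+0000..U+00FF, so going to UTF-8 is lossless, while coming back loses anything above U+00FF. The function names are made up for this example, and a lost character is shown as a single '?' regardless of its display width.

```c
#include <stddef.h>

/* Latin-1 -> UTF-8: bytes >= 0x80 become two-byte sequences.
 * dst needs room for up to 2*n bytes. Returns bytes written. */
static size_t latin1_to_utf8(const unsigned char *src, size_t n,
                             unsigned char *dst)
{
	size_t i, o = 0;
	for (i = 0; i < n; i++) {
		if (src[i] < 0x80) {
			dst[o++] = src[i];
		} else {
			dst[o++] = 0xC0 | (src[i] >> 6);
			dst[o++] = 0x80 | (src[i] & 0x3F);
		}
	}
	return o;
}

/* UTF-8 -> Latin-1: code points above U+00FF and malformed input
 * become '?'. dst needs room for up to n bytes. Returns bytes written. */
static size_t utf8_to_latin1(const unsigned char *src, size_t n,
                             unsigned char *dst)
{
	size_t i = 0, o = 0;
	while (i < n) {
		unsigned char c = src[i++];
		if (c < 0x80) {
			dst[o++] = c;
		} else if ((c == 0xC2 || c == 0xC3) && i < n
		           && (src[i] & 0xC0) == 0x80) {
			/* U+0080..U+00FF: two low bits of the lead byte,
			 * six bits of the continuation byte */
			dst[o++] = ((c & 0x03) << 6) | (src[i++] & 0x3F);
		} else {
			/* not representable: drop the whole sequence */
			while (i < n && (src[i] & 0xC0) == 0x80)
				i++;
			dst[o++] = '?';
		}
	}
	return o;
}
```

A word such as "naïve" survives the round trip unchanged, whereas a euro sign pasted into the terminal would arrive at the application as '?'.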

> using luit for this sounds appealing, but in my experience luit (a)
> crashes frequently and (b) is easily confused by escape sequences and
> has no user interface for resetting all its iso-2022 state, so in
> practice it works for only a few apps.

Hmm, maybe a replacement for luit is in order then.. If I omit
iso-2022 support (which IMO is a big plus) then it should just be ~100
lines of C.. I'll see if I can whip up a prototype sometime soon.

> that said, it would probably be better than the current state of affairs.

Yeah, that was the main thing I wanted to say, I suppose. Of course it
would be nice if someone wants to add proper UTF-8 support, but that's
a lot more work.. IMO, if there were at least minimal UTF-8 support,
it might allow people with modern systems and UTF-8 locales to use
these terminal emulators again, and then they might get interested in
improving them to have real support...

Rich
