Re: Doublewidth EM DASH for unhappy English people

Bram Moolenaar Thu, 12 Apr 2001 10:24:03 -0700

Markus -

> Bram Moolenaar wrote on 2001-04-11 11:36 UTC:
> > I'm confused.  I thought that the width of a Unicode character was fixed.
> > Thus when I take a Unicode character, it is either defined to be
> > single-width or double-width.
> 
> I published such a definition in preliminary form on
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
> 
> and both xterm and the glibc 2.2 UTF-8 locales implement the same. Read
> the comments in the source file to see how that function was
> constructed.
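For readers who have not opened wcwidth.c: a function of this kind generally looks the code point up in sorted range tables. The sketch below assumes that shape. `sketch_wcwidth`, `in_table` and both tables are hypothetical names and deliberately abbreviated subsets, not the contents of the published file.

```c
#include <stddef.h>

/* A closed range [first, last] of Unicode code points. */
struct interval {
    unsigned int first;
    unsigned int last;
};

/* Illustrative subset only: real tables are much longer. */
static const struct interval combining[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks */
    { 0x0483, 0x0486 },   /* Cyrillic combining marks */
    { 0x0591, 0x05BD },   /* Hebrew accents and points (part) */
};

/* Abbreviated double-width ranges; U+303F is deliberately excluded. */
static const struct interval wide[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
    { 0x2E80, 0x303E },   /* CJK radicals .. CJK symbols */
    { 0x3040, 0xA4CF },   /* Hiragana .. Yi */
    { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
    { 0xF900, 0xFAFF },   /* CJK Compatibility Ideographs */
    { 0xFF00, 0xFF60 },   /* Fullwidth Forms */
    { 0xFFE0, 0xFFE6 },   /* Fullwidth signs */
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Binary search in a sorted, non-overlapping interval table. */
static int in_table(unsigned int ucs, const struct interval *t, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > t[mid].last)
            lo = mid + 1;
        else if (ucs < t[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical: -1 for control characters, 0 for NUL and combining
 * marks, 2 for wide East Asian characters, 1 for everything else. */
int sketch_wcwidth(unsigned int ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;
    if (in_table(ucs, combining, COUNT(combining)))
        return 0;
    if (in_table(ucs, wide, COUNT(wide)))
        return 2;
    return 1;
}
```

The tables must stay sorted and non-overlapping for the binary search to be valid. Under this sketch the EM DASH (U+2014) comes out single-width.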

I already had a copy of this.  I notice you have updated the table of
composing characters; I'll include it in Vim.

> ISO 10646:2000 remains completely silent on the issue.
> 
> The Unicode Consortium has only published
> 
>   http://www.unicode.org/unicode/reports/tr11/
> 
> which assigns each Unicode character to one of six East Asian Width
> categories: F, H, W, Na, A, N. In other words, Unicode only documents,
> for each character, the width semantics in legacy standards, but it does
> *not* prescribe the width semantics of a UTF-8 terminal emulator.
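One possible way to turn the UAX #11 categories into terminal column counts is sketched below. The mapping is an assumption, not something Unicode prescribes (which is the point made above). Here F and W take two columns, A depends on the convention chosen, and H, Na and N take one column. The enum and function names are hypothetical, and combining and control characters, which need separate handling, are left out.

```c
/* The six East Asian Width property values of UAX #11. */
enum ea_width { EA_F, EA_H, EA_W, EA_NA, EA_A, EA_N };

/* Hypothetical policy: columns occupied by a character of category `cat`.
 * `ambiguous_is_wide` selects how Class A characters are treated. */
int columns_for_category(enum ea_width cat, int ambiguous_is_wide)
{
    switch (cat) {
    case EA_F:                 /* Fullwidth */
    case EA_W:                 /* Wide */
        return 2;
    case EA_A:                 /* Ambiguous: convention-dependent */
        return ambiguous_is_wide ? 2 : 1;
    case EA_H:                 /* Halfwidth */
    case EA_NA:                /* Narrow */
    case EA_N:                 /* Neutral */
    default:
        return 1;
    }
}
```

Only the Class A row needs a decision; the other five categories map the same way under either choice.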

Since applications like Vim depend on a definition, we'll have to set a
de-facto standard then.  Hopefully an official standard will be made out of
this later.

> > If this is not true, I won't be able to edit Unicode with Vim reliably.
> > I'm using the current version of wcwidth().  When someone decides to make
> > a font with different widths, the display will be messed up.  I suppose
> > xterm has the same problem.  Running Vim in an xterm has a double problem
> > (Vim can only guess which characters will end up double-width in the
> > xterm).
> 
> For xterm at least, we have made sure that this is under the control of
> xterm, *not* under the control of the font. Xterm decides which glyphs
> are normal or double-width and then picks the glyphs accordingly from
> one of two monospaced fonts (one normal and one double-width). This way,
> the same font (pair) can be used with different wcwidth conventions,
> which allows us even later to define ESC sequences to switch between
> different width conventions should it be necessary. I think this is
> clearly the right and most flexible approach. It also solves the problem
> that the CharCell XLFD font category that we want to use for
> applications such as xterm does not allow two different widths in a
> single font.
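The approach described above, where the terminal decides the width and the font pair only supplies glyphs, can be sketched as a line layout routine. Everything here is hypothetical: `width_of` is a deliberately tiny stand-in for a real wcwidth table, and `struct cell` and `layout_line` do not correspond to xterm's actual data structures.

```c
#include <stddef.h>

/* Which font of the normal/double-width pair a cell is drawn from. */
enum font_choice { FONT_NARROW, FONT_WIDE, FONT_NONE };

struct cell {
    unsigned int ucs;        /* character shown in this cell, 0 if none */
    enum font_choice font;   /* font the glyph is taken from */
};

/* Hypothetical stand-in for a real wcwidth table: only CJK Unified
 * Ideographs and Hangul Syllables are treated as double-width. */
static int width_of(unsigned int ucs)
{
    if ((ucs >= 0x4E00 && ucs <= 0x9FFF) || (ucs >= 0xAC00 && ucs <= 0xD7A3))
        return 2;
    return 1;
}

/* Place n characters into at most ncells cells. The width comes from
 * width_of(), never from font metrics. Returns the columns used. */
size_t layout_line(const unsigned int *text, size_t n,
                   struct cell *cells, size_t ncells)
{
    size_t col = 0;
    for (size_t i = 0; i < n; i++) {
        size_t w = (size_t)width_of(text[i]);
        if (col + w > ncells)
            break;                          /* line is full */
        cells[col].ucs = text[i];
        cells[col].font = (w == 2) ? FONT_WIDE : FONT_NARROW;
        if (w == 2) {
            cells[col + 1].ucs = 0;         /* right half of the wide glyph */
            cells[col + 1].font = FONT_NONE;
        }
        col += w;
    }
    return col;
}
```

A wide character fills two adjacent cells and the right-hand cell is left as a placeholder. The column arithmetic therefore never depends on the font, so the same font pair works with any width convention.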

OK, that simplifies matters a lot.

> If xterm bases its decision on the libc implementation, then at least as
> long as xterm and the text-mode application using it run under the same
> locale, they are guaranteed to agree on the width of every character.
> It becomes problematic if you telnet from within xterm to another machine
> and your application potentially runs under another locale there. Then
> xterm and the text-mode application have incompatible wcwidth conventions.
> Because of this problem, I am playing with the idea of trying to become
> the almighty wcwidth dictator and tell everyone to use
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

I think that is a good idea.  Although some people might debate over which
characters are double width.  I wouldn't mind if this discussion takes place
for some time and you make the final decision.  Hopefully soon, so that there
are hardly any xterm or Vim versions around that use the wrong choices.

Note that there are many more situations where things can go wrong.  In Vim
I'm using my own table (copied from your wcwidth()) to avoid problems with old
glibc versions, which are currently installed everywhere.  Not to mention
systems that don't have any UTF-8 support.  And there doesn't seem to be a
version of wcwidth() that accepts an argument for the encoding, instead of
using the current locale (which can't be changed without many side effects).
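A width function that does not depend on the current locale can be approximated by decoding UTF-8 directly and consulting a built-in table, which is what keeping a private copy of the table makes possible. The sketch assumes UTF-8 input. `utf8_swidth`, `utf8_decode` and the abbreviated `table_width` are hypothetical helpers, not an existing library API.

```c
#include <stddef.h>

/* Hypothetical abbreviated table: -1 control, 0 combining, 2 wide, 1 else. */
static int table_width(unsigned int ucs)
{
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;                              /* C0/C1 controls */
    if (ucs >= 0x0300 && ucs <= 0x036F)
        return 0;                               /* combining diacritics */
    if ((ucs >= 0x1100 && ucs <= 0x115F) ||
        (ucs >= 0x2E80 && ucs <= 0xA4CF && ucs != 0x303F) ||
        (ucs >= 0xAC00 && ucs <= 0xD7A3) ||
        (ucs >= 0xFF00 && ucs <= 0xFF60))
        return 2;
    return 1;
}

/* Decode one UTF-8 sequence. Returns bytes consumed, or 0 if the
 * sequence is truncated, malformed, overlong, or a surrogate. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned int *out)
{
    unsigned int c;
    size_t n, i;
    if (len == 0)
        return 0;
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else return 0;
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800) ||
        (n == 4 && c < 0x10000) || c > 0x10FFFF ||
        (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *out = c;
    return n;
}

/* Display width of a UTF-8 byte string, independent of setlocale().
 * Returns -1 on malformed input or a control character. */
int utf8_swidth(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t pos = 0;
    int total = 0;
    while (pos < len) {
        unsigned int ucs;
        size_t n = utf8_decode(s + pos, len - pos, &ucs);
        int w;
        if (n == 0)
            return -1;
        w = table_width(ucs);
        if (w < 0)
            return -1;
        total += w;
        pos += n;
    }
    return total;
}
```

Because nothing here calls setlocale() or mbrtowc(), the result is the same whatever locale the process runs under. The trade-off is that the table has to be kept up to date by hand.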

> Since some CJK users have complained about the above definition (which
> makes all Class A characters, i.e. those of ambiguous width in legacy
> implementations, narrow), I have recently added to the above file a
> second convention, wcwidth_cjk, in which all Class A characters are
> double-width, thus providing an EUC backwards-compatible convention.
> 
> We agree that a single wcwidth convention can't make everyone happy.
> Perhaps two are sufficient?
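The two conventions differ only for Class A characters, which include the EM DASH (U+2014) from this thread's subject line. The following is a minimal sketch modelled on the description above. It assumes the CJK variant checks an ambiguous-width table and otherwise falls back to the default convention. The table is a small illustrative subset of the Class A ranges, all names are hypothetical, and control and combining characters are not handled.

```c
#include <stddef.h>

struct interval { unsigned int first, last; };

/* Illustrative subset of East Asian Width "A" (ambiguous) ranges. */
static const struct interval ambiguous[] = {
    { 0x00A1, 0x00A1 }, { 0x00A7, 0x00A8 }, { 0x00B0, 0x00B4 },
    { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
    { 0x201C, 0x201D }, { 0x2026, 0x2026 },
};

/* Linear scan is enough for this tiny table. */
static int in_ambiguous(unsigned int ucs)
{
    size_t i;
    for (i = 0; i < sizeof ambiguous / sizeof ambiguous[0]; i++)
        if (ucs >= ambiguous[i].first && ucs <= ambiguous[i].last)
            return 1;
    return 0;
}

/* Hypothetical default convention: Class A stays narrow. */
int width_narrow(unsigned int ucs)
{
    if ((ucs >= 0x1100 && ucs <= 0x115F) ||
        (ucs >= 0x2E80 && ucs <= 0xA4CF && ucs != 0x303F) ||
        (ucs >= 0xAC00 && ucs <= 0xD7A3))
        return 2;
    return 1;
}

/* Hypothetical CJK convention: Class A becomes double-width. */
int width_cjk(unsigned int ucs)
{
    return in_ambiguous(ucs) ? 2 : width_narrow(ucs);
}

/* Total columns of a code-point array under the given convention. */
int string_width(const unsigned int *s, size_t n, int (*wf)(unsigned int))
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += wf(s[i]);
    return total;
}
```

Under the narrow convention, `a`, EM DASH, `b` occupies three columns. Under the CJK convention it occupies four. If xterm and a remote application disagree on the convention, each such character shifts the cursor by one column, which is the telnet problem described earlier.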

How do I know which convention xterm is using?  Or can I tell xterm which one
to use?  Actually, I would prefer making a choice and sticking to it.  This will
"disappoint" some group of people, but will avoid lots of trouble with yet
another setting that can have the wrong value.

> > Should the width of a character be obtained from the font information?
> 
> Only in situations where you also want to support proportional fonts.
> The classical tty model does not provide a communication mechanism for
> that sort of information. The goal of the exercise here is to keep the
> classical tty model alive. I don't think we want to add ESC sequences to
> query the width semantics of the terminal.

OK.  Vim doesn't use proportional fonts either, thus most things that apply to
xterm apply to Vim as well (also when it's not running in an xterm).

> > Either that or the results of wcwidth() should be set in stone.
> 
> That's what I've tried to do in the above wcwidth and wcwidth_cjk,
> though at the moment it is not yet a formally recognised standard. More
[...]

I support setting a standard for this.  It makes using Unicode a lot easier.

-- 
hundred-and-one symptoms of being an internet addict:
117. You are more comfortable typing in html.

 ///  Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net  \\\
(((   Creator of Vim - http://www.vim.org -- ftp://ftp.vim.org/pub/vim   )))
 \\\  Help me helping AIDS orphans in Uganda - http://iccf-holland.org  ///
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/