[EMAIL PROTECTED] wrote on 2001-04-10 12:11 UTC:
> On the other hand, some symbols (e.g. circle, triangle, star, and so on)
> should be double-width characters in Japanese locale, because Japanese
> use them with double-width glyphs for a long time. The normal-width
> glyphs are not suitable for Japanese document.
> May I ask a question? Did people who used ISO-8859 use these symbols
> in plain text? I think new comers should respect old users.

The people who use ISO 8859 also use or used CP437 (the original IBM
PC character set). It is a widely used coded character set with a
significant amount of available text, which is often interspersed with
block graphic elements and other graphical symbols. It contains:

263A  WHITE SMILING FACE
263B  BLACK SMILING FACE
2665  BLACK HEART SUIT
2666  BLACK DIAMOND SUIT
2663  BLACK CLUB SUIT
2660  BLACK SPADE SUIT
2022  BULLET
25D8  INVERSE BULLET
25CB  WHITE CIRCLE
25D9  INVERSE WHITE CIRCLE
2642  MALE SIGN
2640  FEMALE SIGN
266A  EIGHTH NOTE
266B  BEAMED EIGHTH NOTES
263C  WHITE SUN WITH RAYS
25BA  BLACK RIGHT-POINTING POINTER
25C4  BLACK LEFT-POINTING POINTER
2195  UP DOWN ARROW
203C  DOUBLE EXCLAMATION MARK
00B6  PILCROW SIGN
00A7  SECTION SIGN
25AC  BLACK RECTANGLE
21A8  UP DOWN ARROW WITH BASE
2191  UPWARDS ARROW
2193  DOWNWARDS ARROW
2192  RIGHTWARDS ARROW
2190  LEFTWARDS ARROW
221F  RIGHT ANGLE
2194  LEFT RIGHT ARROW
25B2  BLACK UP-POINTING TRIANGLE
25BC  BLACK DOWN-POINTING TRIANGLE
2302  HOUSE
2591  LIGHT SHADE
2592  MEDIUM SHADE
2593  DARK SHADE
2502  BOX DRAWINGS LIGHT VERTICAL
2524  BOX DRAWINGS LIGHT VERTICAL AND LEFT
2561  BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
2562  BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
2556  BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
2555  BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
2563  BOX DRAWINGS DOUBLE VERTICAL AND LEFT
2551  BOX DRAWINGS DOUBLE VERTICAL
2557  BOX DRAWINGS DOUBLE DOWN AND LEFT
255D  BOX DRAWINGS DOUBLE UP AND LEFT
255C  BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
255B  BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
2510  BOX DRAWINGS LIGHT DOWN AND LEFT
2514  BOX DRAWINGS LIGHT UP AND RIGHT
2534  BOX DRAWINGS LIGHT UP AND HORIZONTAL
252C  BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
251C  BOX DRAWINGS LIGHT VERTICAL AND RIGHT
2500  BOX DRAWINGS LIGHT HORIZONTAL
253C  BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
255E  BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
255F  BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
255A  BOX DRAWINGS DOUBLE UP AND RIGHT
2554  BOX DRAWINGS DOUBLE DOWN AND RIGHT
2569  BOX DRAWINGS DOUBLE UP AND HORIZONTAL
2566  BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
2560  BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
2550  BOX DRAWINGS DOUBLE HORIZONTAL
256C  BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
2567  BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
2568  BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
2564  BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
2565  BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
2559  BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
2558  BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
2552  BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
2553  BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
256B  BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
256A  BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
2518  BOX DRAWINGS LIGHT UP AND LEFT
250C  BOX DRAWINGS LIGHT DOWN AND RIGHT
2588  FULL BLOCK
2584  LOWER HALF BLOCK
258C  LEFT HALF BLOCK
2590  RIGHT HALF BLOCK
2580  UPPER HALF BLOCK
25A0  BLACK SQUARE

Just yesterday, I used a very good disassembler for a microcontroller
under a DOS emulator that produced output files containing several of
the above symbols. I was glad to be able to convert this output to
UTF-8 and to continue using it with my normal Linux toolchain. CP437
is far from having died out yet. There is a lot of perfectly useful
MS-DOS software around that is still in wide use, and people like
myself wish to be able to send and display output files and text-mode
screenshots of these MS-DOS tools in UTF-8.
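
As an illustration only (the tool actually used for the conversion is
not stated here), such a conversion can be sketched in a few lines of
Python. The sketch relies on Python's built-in "cp437" codec, which
maps bytes 0x80-0xFF to the box drawing and block characters listed
above but leaves bytes 0x01-0x1F as control characters, which is what
a text file with CR/LF line ends needs. The function name
cp437_to_utf8 is made up for this example.

```python
def cp437_to_utf8(data: bytes) -> bytes:
    """Re-encode CP437 bytes as UTF-8 (hypothetical helper name)."""
    # Python's cp437 codec defines all 256 byte values, so decoding
    # cannot fail; bytes below 0x80 stay plain ASCII/control codes.
    text = data.decode("cp437")
    return text.encode("utf-8")


# Top edge of a double-line box as an MS-DOS tool would write it:
# 0xC9 = U+2554, 0xCD = U+2550, 0xBB = U+2557.
sample = bytes([0xC9, 0xCD, 0xCD, 0xBB])
print(cp437_to_utf8(sample).decode("utf-8"))
```

Screen dumps, where bytes 0x01-0x1F stand for the smiley, card suit
and arrow glyphs at the top of the list, would need an extra mapping
table on top of this.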

The ultimate solution for people who want to use formatted charcell
plaintext written for EUC-JP, etc. is to make wcwidth locale-dependent
and to have two Japanese UTF-8 locales, one for each width convention.

Sample implementations of both are available in

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The wcwidth_cjk version assigns a width of 2 to characters whose
East Asian Width property in the Unicode database is "ambiguous",
while the normal wcwidth assigns these a width of 1.
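
The difference between the two conventions can be sketched in Python.
This is not the code from wcwidth.c, only a rough approximation built
on the standard unicodedata module; the function name char_width and
its ambiguous_wide flag are made up for this example, and the
zero-width rule (categories Mn, Me, Cf) is a simplification of what a
real wcwidth does.

```python
import unicodedata


def char_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate column width of one character (hypothetical helper).

    ambiguous_wide=False mimics the normal convention,
    ambiguous_wide=True the CJK one described above.
    """
    if ch == "\0":
        return 0
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1  # control characters have no defined width
    if category in ("Mn", "Me", "Cf"):
        return 0   # simplification: combining and format characters
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2   # wide and fullwidth in both conventions
    if eaw == "A" and ambiguous_wide:
        return 2   # "ambiguous" counts as wide only in the CJK variant
    return 1


# U+25CB WHITE CIRCLE has the East Asian Width property "ambiguous".
print(char_width("\u25cb"), char_width("\u25cb", ambiguous_wide=True))
```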

Suggestion:

Call the locale with the normal wcwidth behaviour

  ja.UTF-8

and the traditional one (EUC backwards compatibility)

  ja.UTF-8@oldwidth

As long as applications follow the wcwidth provided by the C library,
users can easily change the wcwidth behaviour by simply recompiling
the locale definition files.
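
To make the effect of the two proposed locale names concrete, here is
a small Python sketch. In a real system the choice would live in the
locale definition and applications would just call the C library's
wcwidth; the functions ambiguous_is_wide and string_width below are
hypothetical stand-ins, and the sketch ignores combining and control
characters for brevity.

```python
import unicodedata


def ambiguous_is_wide(locale_name: str) -> bool:
    """True if the locale name carries the proposed @oldwidth modifier."""
    _, _, modifier = locale_name.partition("@")
    return modifier == "oldwidth"


def string_width(text: str, locale_name: str) -> int:
    """Columns needed for text under the given locale (simplified)."""
    wide = ("W", "F", "A") if ambiguous_is_wide(locale_name) else ("W", "F")
    return sum(
        2 if unicodedata.east_asian_width(ch) in wide else 1
        for ch in text
    )


# U+25CB WHITE CIRCLE followed by a space and a Latin letter.
line = "\u25cb A"
print(string_width(line, "ja.UTF-8"))           # circle counted as 1 column
print(string_width(line, "ja.UTF-8@oldwidth"))  # circle counted as 2 columns
```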

In the interest of simplicity and interoperability, we definitely
should avoid introducing more than two wcwidth conventions.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
