[EMAIL PROTECTED] wrote on 2001-04-10 12:11 UTC:
> On the other hand, some symbols (e.g. circle, triangle, star, and so on)
> should be double-width characters in Japanese locale, because Japanese
> have used them with double-width glyphs for a long time. The normal-width
> glyphs are not suitable for Japanese documents.
> May I ask a question? Did people who used ISO-8859 use these symbols
> in plain text? I think newcomers should respect old users.
The people who use ISO 8859 also use or used CP437. CP437 (the original
IBM PC character set) is a widely used coded character set with a
significant amount of available text, which is often interspersed with
block graphic elements and other graphical symbols. It contains:
263A WHITE SMILING FACE
263B BLACK SMILING FACE
2665 BLACK HEART SUIT
2666 BLACK DIAMOND SUIT
2663 BLACK CLUB SUIT
2660 BLACK SPADE SUIT
2022 BULLET
25D8 INVERSE BULLET
25CB WHITE CIRCLE
25D9 INVERSE WHITE CIRCLE
2642 MALE SIGN
2640 FEMALE SIGN
266A EIGHTH NOTE
266B BEAMED EIGHTH NOTES
263C WHITE SUN WITH RAYS
25BA BLACK RIGHT-POINTING POINTER
25C4 BLACK LEFT-POINTING POINTER
2195 UP DOWN ARROW
203C DOUBLE EXCLAMATION MARK
00B6 PILCROW SIGN
00A7 SECTION SIGN
25AC BLACK RECTANGLE
21A8 UP DOWN ARROW WITH BASE
2191 UPWARDS ARROW
2193 DOWNWARDS ARROW
2192 RIGHTWARDS ARROW
2190 LEFTWARDS ARROW
221F RIGHT ANGLE
2194 LEFT RIGHT ARROW
25B2 BLACK UP-POINTING TRIANGLE
25BC BLACK DOWN-POINTING TRIANGLE
2302 HOUSE
2591 LIGHT SHADE
2592 MEDIUM SHADE
2593 DARK SHADE
2502 BOX DRAWINGS LIGHT VERTICAL
2524 BOX DRAWINGS LIGHT VERTICAL AND LEFT
2561 BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
2562 BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
2556 BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
2555 BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
2563 BOX DRAWINGS DOUBLE VERTICAL AND LEFT
2551 BOX DRAWINGS DOUBLE VERTICAL
2557 BOX DRAWINGS DOUBLE DOWN AND LEFT
255d BOX DRAWINGS DOUBLE UP AND LEFT
255c BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
255b BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
2510 BOX DRAWINGS LIGHT DOWN AND LEFT
2514 BOX DRAWINGS LIGHT UP AND RIGHT
2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL
252c BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
251c BOX DRAWINGS LIGHT VERTICAL AND RIGHT
2500 BOX DRAWINGS LIGHT HORIZONTAL
253c BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
255e BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
255f BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
255a BOX DRAWINGS DOUBLE UP AND RIGHT
2554 BOX DRAWINGS DOUBLE DOWN AND RIGHT
2569 BOX DRAWINGS DOUBLE UP AND HORIZONTAL
2566 BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
2560 BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
2550 BOX DRAWINGS DOUBLE HORIZONTAL
256c BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
2567 BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
2568 BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
2564 BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
2565 BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
2559 BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
2558 BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
2552 BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
2553 BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
256b BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
256a BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
2518 BOX DRAWINGS LIGHT UP AND LEFT
250c BOX DRAWINGS LIGHT DOWN AND RIGHT
2588 FULL BLOCK
2584 LOWER HALF BLOCK
258c LEFT HALF BLOCK
2590 RIGHT HALF BLOCK
2580 UPPER HALF BLOCK
25a0 BLACK SQUARE
Just yesterday, I used a very good disassembler for a microcontroller
under a DOS emulator that produced output files containing several of
the above symbols. I was glad to be able to convert them to UTF-8 and
be able to continue using it with my normal Linux toolchain. CP437 is
far from having died out yet. There is a lot of perfectly useful
MS-DOS software around that is still in wide use and people like
myself wish to be able to send and display output files and text-mode
screenshots of these MS-DOS tools in UTF-8.
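
As a rough illustration of such a conversion, here is a minimal Python
sketch. It relies on Python's built-in "cp437" codec, which maps bytes
0x80..0xFF to the block graphics and box drawing characters listed above
but leaves bytes 0x00..0x1F and 0x7F as control characters. The function
name cp437_to_text and the low_graphics option are my own assumptions for
illustration: when enabled, control bytes other than tab, newline and
carriage return are replaced by the glyphs from the list above.

```python
# Glyphs that CP437 displays show for bytes 0x01..0x1F, in the order
# of the list above (WHITE SMILING FACE .. BLACK DOWN-POINTING TRIANGLE).
LOW_GLYPHS = (
    "\u263a\u263b\u2665\u2666\u2663\u2660\u2022\u25d8\u25cb\u25d9"
    "\u2642\u2640\u266a\u266b\u263c\u25ba\u25c4\u2195\u203c\u00b6"
    "\u00a7\u25ac\u21a8\u2191\u2193\u2192\u2190\u221f\u2194\u25b2\u25bc"
)

# Control characters that text files really use as controls.
KEEP_AS_CONTROL = {"\t", "\n", "\r"}


def cp437_to_text(data: bytes, low_graphics: bool = False) -> str:
    """Decode CP437 bytes to a Unicode string (hypothetical helper)."""
    text = data.decode("cp437")
    if low_graphics:
        # Map 0x01..0x1F to their graphic glyphs, except tab/LF/CR.
        table = {
            i + 1: glyph
            for i, glyph in enumerate(LOW_GLYPHS)
            if chr(i + 1) not in KEEP_AS_CONTROL
        }
        table[0x7F] = "\u2302"  # HOUSE
        text = text.translate(table)
    return text


if __name__ == "__main__":
    # Top edge of a double-line box as a DOS program would write it.
    box_top = cp437_to_text(b"\xc9\xcd\xcd\xbb")
    print(box_top.encode("utf-8"))
```

Whether the low range should be treated as graphics depends on the file:
screen dumps usually want the glyphs, while ordinary text output usually
wants the control characters left alone.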
The ultimate solution for people who want to use formatted
character-cell plaintext written for EUC-JP, etc. is to make wcwidth
locale-dependent and to have two Japanese UTF-8 locales, one for each
width convention. Sample implementations of both are available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
The wcwidth_cjk version assigns a width of 2 to characters whose East
Asian Width property in the Unicode database is "ambiguous", while the
normal wcwidth assigns them a width of 1.
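
To make the difference between the two conventions concrete, here is a
simplified Python sketch built on the standard unicodedata module. It is
not a port of wcwidth.c: the names char_width and string_width and the
cjk flag are assumptions for illustration, and the zero-width handling is
only an approximation (general categories Mn, Me and Cf are treated as
width 0). The only difference between the two modes is how East Asian
Width "A" (ambiguous) characters are counted.

```python
import unicodedata


def char_width(ch: str, cjk: bool = False) -> int:
    """Approximate column width of one character.

    cjk=False: ambiguous-width characters count as 1 column.
    cjk=True:  ambiguous-width characters count as 2 columns.
    """
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1  # control characters have no defined width
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # wide and fullwidth are 2 columns in both modes
    if eaw == "A" and cjk:
        return 2  # ambiguous: 2 columns only in the CJK convention
    return 1


def string_width(text: str, cjk: bool = False) -> int:
    """Sum of character widths, or -1 if a control character occurs."""
    total = 0
    for ch in text:
        width = char_width(ch, cjk)
        if width < 0:
            return -1
        total += width
    return total


if __name__ == "__main__":
    line = "\u250c\u2500\u2510"  # light box drawing: corner, line, corner
    print(string_width(line), string_width(line, cjk=True))
```

A box drawn with CP437-style line characters therefore takes twice as
many columns under the CJK convention, which is why both sides need the
terminal and the application to agree on one of the two.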
Suggestion:
Call the locale with the normal wcwidth behaviour
ja.UTF-8
and the traditional one (EUC backwards compatibility)
ja.UTF-8@oldwidth
As long as applications follow the wcwidth provided by the C library,
users can easily change the wcwidth behaviour by simply recompiling
the locale definition files.
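
For software that cannot rely on the C library, the same choice could be
derived from the locale name. The following Python sketch assumes the
locale names suggested above; the "@oldwidth" modifier is only the
proposal made in this message, not an existing convention, and the
function name ambiguous_width_for is hypothetical.

```python
def ambiguous_width_for(locale_name: str) -> int:
    """Return the column width to use for ambiguous-width characters.

    Assumes the naming proposed above: a locale with the "@oldwidth"
    modifier keeps the traditional EUC convention (2 columns), any
    other locale uses the normal convention (1 column).
    """
    _, _, modifier = locale_name.partition("@")
    return 2 if modifier == "oldwidth" else 1


if __name__ == "__main__":
    for name in ("ja.UTF-8", "ja.UTF-8@oldwidth"):
        print(name, ambiguous_width_for(name))
```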
In the interest of simplicity and interoperability, we definitely
should avoid introducing more than two wcwidth conventions.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/