Here is my proposed writeup, so far, on the oversized sigma (Σ)
character. I got a response from Markus Kuhn who suggested increased
unification in order to avoid context dependencies in legacy->Unicode
conversion (the legacy charsets being more a collection of glyphs than
anything else), but from looking at the items that are already
approved for Unicode 3.2, I believe we already have to deal with such
context dependencies, and I would rather stick to one design.
The main open issue relates to the oversized bra and ket symbols in HP
Math8; that character set seems to include the capability to
synthesize oversized bra and ket symbols (〈 ... 〉) using the
oversized sigma middle and diagonal glyphs, plus one additional glyph
looking like the sigma middle reversed. At this point I haven't
complicated the proposal by trying to add those characters.
EXTENSIBLE SUMMATION GRAPHIC SYMBOL FOR UNICODE
H. Peter Anvin
Transmeta Corporation
3940 Freedom Circle
San Jose CA 95054
[EMAIL PROTECTED]
2 June 2001
Format: Plain text with line breaks
Encoding: UTF-8
STATUS
Being developed
ABSTRACT
A set of symbols for creating an arbitrarily large summation sign on a
monospaced terminal. This proposal, in conjunction with the STIX
[1] and SUPPLEMENTAL TERMINAL GRAPHICS FOR UNICODE [2] proposals,
completes coverage of the DEC Technical Character Set (TCS)[3][4],
thus allowing terminal emulators and terminal applications currently
using this character set to migrate to Unicode. This will promote
interoperability of terminal emulators with other Unicode
applications and with each other.
INTRODUCTION
The DEC VT100 series of terminals was one of the first
implementations of the ECMA-48/ISO 6429[5] terminal standards, and
quickly became among the most widely used and, perhaps more
importantly, most widely emulated terminals ever. Countless
emulation programs for this series of terminals have been written,
and are still in very wide use today.
It is highly desirable for the full capabilities of these terminals
to be available in Unicode, both for implementing an emulator for
the legacy encoding on a Unicode-based system, and for migrating
applications to using Unicode encodings for these symbols.
The STIX[1] and SUPPLEMENTAL TERMINAL GRAPHICS FOR UNICODE
(STGU)[2] proposals, both slated for inclusion in Unicode 3.2,
include most of the necessary symbols not yet included in Unicode
3.1[6]. However, the symbols for the extensible summation character
present in the DEC Technical Character Set[3][4] are not included. A
similar character group is present in the HP Math8[7] character set,
available on, among others, the widely used HP LaserJet series of
printers.
This proposal aims to fill that gap.
PROPOSED NEW CHARACTERS
Proposed character names should be changed as needed to conform to
UTC and WG2 naming rules or conventions. References to STIX or STGU
are based on allocations as of 2 June 2001, and are subject to
change.
These symbols can be combined to form an upper case Greek letter
sigma (Σ, U+03A3) of any square size, 2x2 or larger, on a monospaced
terminal.
Unifications are discussed later in this document.
#1. LARGE SUMMATION SYMBOL TOP LEFT
This symbol represents the upper left corner of the summation
symbol. It joins to the right with #2 or #3, and joins diagonally
down-right with #4 or #5, or downward with #7.
This corresponds to 03/01 in the DEC TCS.
#2. LARGE SUMMATION SYMBOL UPPER HORIZONTAL EXTENSION
This symbol represents the middle of the upper horizontal stroke
of the summation symbol, and can be extended indefinitely. It
joins to the left with #1 or itself and to the right with #3 or
itself.
This corresponds to 02/03 in the DEC TCS.
#3. LARGE SUMMATION SYMBOL TOP RIGHT
This symbol represents the upper right corner of the summation
symbol, and joins to the left with #2 or #1.
This corresponds to 03/05 in the DEC TCS.
#4. LARGE SUMMATION SYMBOL UPPER DIAGONAL EXTENSION
This symbol represents the upper diagonal part of the summation
symbol, and can be extended indefinitely. It joins diagonally
up-left with #1 or itself, joins diagonally down-right with #5 or
itself, or joins downward with #6.
This corresponds to 03/03 in the DEC TCS.
#5. LARGE SUMMATION SYMBOL MIDDLE
This symbol represents the middle of the summation symbol when the
size is an odd number of characters. It joins diagonally up-left
with #1 or #4 and diagonally down-left with #6 or #7.
This corresponds to 03/07 in the DEC TCS.
#6. LARGE SUMMATION SYMBOL LOWER DIAGONAL EXTENSION
This symbol represents the lower diagonal part of the summation
symbol, and can be extended indefinitely. It joins diagonally
up-right with #5 or itself, joins diagonally down-left with #7 or
itself, or joins upward with #4.
This corresponds to 03/04 in the DEC TCS.
#7. LARGE SUMMATION SYMBOL BOTTOM LEFT
This symbol represents the lower left corner of the summation
symbol. It joins to the right with #8 or #9, and joins diagonally
up-right with #5 or #6, or upward with #1.
This corresponds to 03/02 in the DEC TCS.
#8. LARGE SUMMATION SYMBOL BOTTOM HORIZONTAL EXTENSION
This symbol represents the middle of the lower horizontal stroke
of the summation symbol, and can be extended indefinitely. It
joins to the left with #7 or itself and to the right with #9 or
itself.
This corresponds to 02/03 in the DEC TCS.
#9. LARGE SUMMATION SYMBOL BOTTOM RIGHT
This symbol represents the lower right corner of the summation
symbol, and joins to the left with #7 or #8.
This corresponds to 03/06 in the DEC TCS.
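As an illustration only, the code positions listed above can be
collected into a lookup table. The sketch below is in Python; the names
DEC_TCS_POSITION, position_to_byte and symbols_by_byte are invented
here and are not part of the proposal. It assumes the usual column/row
notation, in which position cc/rr is the 7-bit code value cc * 16 + rr.
It shows that #2 and #8 share a single DEC TCS position, so a converter
from DEC TCS to the proposed characters would have to look at context
to choose between them.

```python
# Hypothetical lookup table: proposed symbol number -> DEC TCS position,
# written as (column, row) exactly as listed in the descriptions above.
DEC_TCS_POSITION = {
    1: (3, 1),  # TOP LEFT
    2: (2, 3),  # UPPER HORIZONTAL EXTENSION
    3: (3, 5),  # TOP RIGHT
    4: (3, 3),  # UPPER DIAGONAL EXTENSION
    5: (3, 7),  # MIDDLE
    6: (3, 4),  # LOWER DIAGONAL EXTENSION
    7: (3, 2),  # BOTTOM LEFT
    8: (2, 3),  # BOTTOM HORIZONTAL EXTENSION
    9: (3, 6),  # BOTTOM RIGHT
}


def position_to_byte(column, row):
    """Convert column/row notation to a 7-bit code value (column * 16 + row)."""
    return column * 16 + row


def symbols_by_byte():
    """Group the proposed symbols by the DEC TCS code value they come from."""
    groups = {}
    for symbol, (column, row) in sorted(DEC_TCS_POSITION.items()):
        groups.setdefault(position_to_byte(column, row), []).append(symbol)
    return groups


if __name__ == "__main__":
    for byte, symbols in sorted(symbols_by_byte().items()):
        note = "  (needs context)" if len(symbols) > 1 else ""
        print(f"0x{byte:02X} -> {symbols}{note}")
```

Going the other way, from the proposed characters back to DEC TCS, is
a plain table lookup, since each proposed symbol has exactly one DEC
TCS position.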
SAMPLE USAGE
A 5x5 summation symbol can be constructed using the following
symbols (blank squares contain a whitespace character, normally
U+0020):
#1 #2 #2 #2 #3
   #4
      #5
   #6
#7 #8 #8 #8 #9
Similarly, a 6x6 summation symbol can be constructed using the
following symbols:
#1 #2 #2 #2 #2 #3
   #4
      #4
      #6
   #6
#7 #8 #8 #8 #8 #9
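The joining rules and the two samples above generalize to any size.
The following is a minimal sketch in Python, not part of the proposal
itself; the names BLANK, summation_grid and render are hypothetical. It
assumes the diagonal moves one cell to the right per row until the
middle of the symbol and then one cell back to the left per row, which
is what the joining rules for #4, #5 and #6 describe.

```python
BLANK = 0  # stands for a whitespace cell, normally U+0020


def summation_grid(n):
    """Return an n-by-n grid (list of rows) of proposed symbol numbers 1..9."""
    if n < 2:
        raise ValueError("the smallest summation symbol is 2x2")
    grid = [[BLANK] * n for _ in range(n)]
    # Top and bottom strokes: corner, horizontal extensions, corner.
    grid[0] = [1] + [2] * (n - 2) + [3]
    grid[n - 1] = [7] + [8] * (n - 2) + [9]
    # Diagonals: one cell right per row down to the middle, then back left.
    for row in range(1, n - 1):
        column = min(row, n - 1 - row)
        if n % 2 == 1 and row == n // 2:
            grid[row][column] = 5  # MIDDLE, used only for odd sizes
        elif row < n // 2:
            grid[row][column] = 4  # UPPER DIAGONAL EXTENSION
        else:
            grid[row][column] = 6  # LOWER DIAGONAL EXTENSION
    return grid


def render(grid):
    """Format a grid using the '#n' notation of this document."""
    return "\n".join(
        " ".join(f"#{cell}" if cell != BLANK else "  " for cell in row).rstrip()
        for row in grid
    )


if __name__ == "__main__":
    print(render(summation_grid(5)))
    print()
    print(render(summation_grid(6)))
```

For an even size there is no #5; the last #4 sits directly above the
first #6, matching the "joins downward" and "joins upward" rules given
for those two symbols.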
UNIFICATIONS
Using the specific glyphs from the DEC TCS character set, #2 and #8
could be unified with either U+2500 or U+23AF; however, such
unification would be inappropriate for the HP Math8 version of these
symbols, which does not align the horizontal part of the summation
symbol with the middle of the character cell. In order to provide a
representation which is applicable to both these character sets, I
recommend that the full set of nine characters be encoded
independently.
REFERENCES
[1] Unicode Consortium Document L2/00-033R, STIX Math Symbols, 9 Feb 2000.
[2] Supplemental Terminal Graphics for Unicode, Frank da Cruz, 31
March 2000. Available at
<ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt>.
[3] Digital Equipment Corporation, Installing and Using the VT420 Video
Terminal EK-VT420-UG.002, Maynard, MA, 1988. Available in
reproduction at <http://vt100.net/docs/vt420-uu/>.
[4] DEC Technical Character Set, VT100.net, Paul Williams;
<http://vt100.net/charsets/technical.html>.
[5] ECMA-48, ECMA, currently in Fifth Edition, June 1991,
<http://www.ecma.ch/ecma1/STAND/ECMA-048.HTM>.
[6] The Unicode Standard, Version 3.0, Addison-Wesley, 2000 as amended
by Unicode Standard Annexes (UAX) 9, 11, 13, 14, 15, 19 and 27,
<http://www.unicode.org/unicode/reports/index.html>.
[7] PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510,
October 1992. PCL Symbol Set ID: 8M.
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt