Hi!
In my previous post I mentioned a URL with a description of many standards
(http://bugtraq.ru/library/misc/encoding.html). Because not everyone here
reads Russian, I will list some of the facts mentioned there (sorry for any
mistakes I may make in translating).
1. From 1956 to 1962 the ASA (later ANSI) X3.A Committee developed ASCII
(American Standard Code for Information Interchange), which was then
released in 1963 and revised in 1968.
2. Although ASCII was accepted as the US national standard, most
corporations did not follow it (for example, IBM continued to use its own
proprietary EBCDIC up to 1981).
3. In 1967 ISO released ISO 646, which accepts ASCII as an international
standard, although ASCII defines only a 7-bit character coding (32
positions are control characters, 52 are upper- and lowercase Latin
letters, 10 are digits and the rest are punctuation; for "all other
languages" there are "10 open positions" reserved).
4. Later ASCII was extended to 256 positions, and ISO released the standards
ISO 2022 and the ISO 8859-x (x=1..15) series. ISO 8859-x was developed for
ISO from the mid-1980s by ECMA (_European_ Computer Manufacturers
Association), but in each table the first 128 positions had to be the same
as in ASCII (and ISO 646). Later (1998) some of the standards (8859-1, -4,
-6) were revised.
5. Still, American corporations did not follow the ISO 8859-x series. Take
the Russian language, for example. There were the following incompatible
tables:
- ISO: 8859-5 "Cyrillic".
- MS-/PC DOS: CP866. ISO 8859-5 is included there as CP915, but it is not
used because it contains no pseudographics. Microsoft calls the code pages
"OEM charsets", but the PC DOS documentation mentions that code pages were
standardized in ISO 9241-3.
- Apple: X-Mac-Cyrillic.
- MS: CP1251. The CP125x (x=0..8) series is used by MS in "national"
versions of Windows (3.x, 9x) and is called _by MS_ "ANSI charsets".
- The Soviet (now Russian) national standards organization (GOST) defined
KOI-8 (GOST 19768-74; it also defines the 128-character KOI-7) and later
the "main GOST charset" (19768-87).
- DIN 66234.
6. In the first half of the 1980s IBM and Xerox began to develop a new
multilingual 16-bit coding system (65536 positions). It was later called
"Unification Code" (Unicode). In 1991 these corporations (together with
Adobe, MS, etc.) created the "Unicode international consortium".
7. Unicode is split into 256 sets. The first sets contain the old tables -
for example, set #0 contains ISO 8859-1. Because 65536 positions are not
enough to represent all possible Asian hieroglyphs, only a subset of them
is present (Unicode 3.0 contains ~28000 hieroglyphs).
8. In 1990 ISO was studying another coding system - UCS (Universal Coded
Character Set) with 2^32 positions, split into 65536 tables. UCS was
described in the draft ISO DIS-10646.1:1990, developed by ISO/IEC
JTC1/SC02/WG02 and supported by European and Asian experts, but voted down
by American experts.
9. Later the standard ISO/IEC 10646 Version 2 appeared, called ISO/IEC
10646-1:1993 "ISO/IEC 10646 Universal Multiple-Octet Coded Character Set
(UCS) - Part 1: Architecture and Basic Multilingual Plane", which
contains Unicode as its first set.
10. Unicode now exists in versions 1.1 (conforms to ISO/IEC 10646-1:1993),
2.0, 2.1 (ISO/IEC 10646-1:1993 plus Amendments 1-7 and Technical
Corrigenda 1-2) and 3.0 (ISO/IEC 10646-1:2000). Unicode 3.2 is planned for
2002 and Unicode 4.0 for 2003.
11. Unification of hieroglyphs is currently carried out not by Unicode but
by ISO - the IRG committee of JTC1/SC02/WG02.
12. To make Unicode compatible with existing software, the consortium
defined different methods for representing Unicode characters: UTF-8,
UTF-16, UTF-16LE and UTF-16BE. Unicode also reserves the methods UTF-32,
UTF-32LE and UTF-32BE.
13. Facts: Japanese experts have calculated that the characters of all
currently known and already extinct languages can be represented in 2^24
positions. The Japanese national standard JIS X 0208-1990 provides space
not only for hieroglyphs but also for Greek and Cyrillic letters. In the
ex-USSR people wrote (and still write) in Russian, Ukrainian,
Byelorussian, Tatar, Kazakh, Moldavian, Armenian, Georgian, Hebrew, Yakut,
the Baltic languages and many more (the ex-USSR and Russia contain many
more nations than China).
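
To make items 5 and 12 a little more concrete, here is a small Python
sketch (not from the article, just my illustration). It encodes one Russian
word with several of the tables from item 5 and then with the UTF methods
from item 12. The codec names are Python's own, and mapping them to the
tables above is my assumption: in particular "koi8_r" stands in for KOI-8,
which may differ in detail from the GOST 19768-74 table. SAMPLE, encode_all
and main are names made up for this example.

```python
# Sample text written with \u escapes so this file itself stays pure ASCII.
SAMPLE = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Russian word "Privet" ("Hi")

# Python codec names ASSUMED to correspond to the tables in item 5.
CYRILLIC_CODECS = {
    "ISO 8859-5": "iso8859_5",
    "CP866": "cp866",
    "X-Mac-Cyrillic": "mac_cyrillic",
    "CP1251": "cp1251",
    "KOI-8": "koi8_r",  # assumption: KOI8-R used as a stand-in for KOI-8
}

# Python codec names ASSUMED to correspond to the methods in item 12.
UTF_CODECS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]


def encode_all(text, codecs):
    """Return {codec name: encoded bytes} for every codec in `codecs`."""
    return {name: text.encode(name) for name in codecs}


def main():
    # The same text gives a different byte sequence in each table.
    for label, codec in CYRILLIC_CODECS.items():
        print(f"{label:15} {SAMPLE.encode(codec).hex(' ')}")

    # Reading CP1251 bytes with the KOI-8 table yields different letters.
    wrong = SAMPLE.encode("cp1251").decode("koi8_r")
    print("CP1251 read as KOI-8:",
          wrong.encode("unicode_escape").decode("ascii"))

    # The first 128 positions agree with ASCII in every one of the tables.
    for codec in CYRILLIC_CODECS.values():
        assert "ASCII".encode(codec) == b"ASCII"

    # The same text in the Unicode representation methods.
    for codec, data in encode_all(SAMPLE, UTF_CODECS).items():
        print(f"{codec:15} {data.hex(' ')}")


if __name__ == "__main__":
    main()
```

Running it (Python 3.8 or later, because of bytes.hex with a separator)
prints one line of hex bytes per table. All five Cyrillic lines differ,
which is the whole problem with exchanging Russian text between these
systems.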