[fd-dev] ASCII

Arkady V.Belousov Mon, 25 Nov 2002 08:39:30 -0800

Hi!

     In my previous post I mentioned a URL with a description of many standards
(http://bugtraq.ru/library/misc/encoding.html). Because not everyone here reads
Russian, I will list some of the facts mentioned there (sorry for any mistakes
I may make in translating).


1. From 1956 to 1962 the ASA (later ANSI) X3.A Committee developed ASCII
   (American Standard Code for Information Interchange), which was then
   released in 1963 and revised in 1968.

2. Although ASCII was accepted as a US national standard, most corporations
   ignored it (for example, IBM continued to use its own proprietary EBCDIC
   until 1981).

3. In 1967 ISO released ISO 646, which adopted ASCII as an international
   standard, although ASCII defined only 7 bits for character coding
   (32 positions were control characters, 52 were lowercase and uppercase
   Latin letters, 10 were digits, and the rest were punctuation; for "all
   other languages", "10 open positions" were reserved).

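The 7-bit layout in item 3 can be checked with a minimal Python sketch. The function name `classify_ascii` and the category labels are illustrative assumptions, not taken from the article; DEL (0x7F) is counted under "other" here so that "control" covers only the 32 positions 0x00..0x1F.

```python
import string


def classify_ascii():
    """Count the 128 seven-bit ASCII positions by category."""
    counts = {"control": 0, "letters": 0, "digits": 0, "other": 0}
    for code in range(128):
        ch = chr(code)
        if code < 32:
            counts["control"] += 1   # control characters 0x00..0x1F
        elif ch in string.ascii_letters:
            counts["letters"] += 1   # A-Z and a-z
        elif ch in string.digits:
            counts["digits"] += 1    # 0-9
        else:
            counts["other"] += 1     # space, punctuation, and DEL (0x7F)
    return counts


if __name__ == "__main__":
    print(classify_ascii())
```
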
4. Later ASCII was extended to 256 positions, and ISO released the ISO 2022
   standard and the ISO 8859-x (x=1..15) series. ISO 8859-x was developed
   for ISO from the mid-1980s by ECMA (_European_ Computer Manufacturers
   Association), but in each table the first 128 positions had to be the
   same as in ASCII (and ISO 646). Later (1998) some of the standards
   (8859-1, -4, -6) were revised.

5. Anyway, American corporations did not follow the ISO 8859-x series. Take
   the Russian language, for example. There were the following mutually
   incompatible tables:

- ISO: 8859-5 "Cyrillic".
- MS-/PC DOS: CP866. ISO 8859-5 was included there as CP915, but it was not
  used because it contains no pseudographics. Microsoft calls code pages
  "OEM charsets", but the PC DOS documentation mentions that code pages were
  standardized in ISO 9241-3.
- Apple: X-Mac-Cyrillic.
- MS: CP1251. The CP125x (x=0..8) series is used by MS in "national" versions
  of Windows (3.x, 9x) and is called _by MS_ "ANSI charsets".
- The Soviet (now Russian) national standards organization (GOST) defined
  KOI-8 (GOST 19768-74; it also defines the 128-character KOI-7), and later
  the "main GOST charset" (19768-87).
- DIN 66234.
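
To see how incompatible these tables are, here is a minimal Python sketch that encodes the same Russian word with several of them. The codec names are Python's own, and mapping them to the tables listed above is an assumption (in particular, `koi8_r` is the later KOI8-R variant, not necessarily the exact GOST 19768-74 table); `encode_all` and `misread` are hypothetical helper names.

```python
# Python codec names assumed to correspond to the tables listed above.
CODECS = {
    "ISO 8859-5": "iso8859_5",
    "CP866": "cp866",
    "X-Mac-Cyrillic": "mac_cyrillic",
    "CP1251": "cp1251",
    "KOI8-R": "koi8_r",
}


def encode_all(text):
    """Return the byte string produced by each table for the same text."""
    return {name: text.encode(codec) for name, codec in CODECS.items()}


def misread(text, written_as, read_as):
    """Encode with one table and decode with another, as a mismatched reader would."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for name, raw in encode_all("Привет").items():
        print(f"{name:15} {raw.hex(' ')}")
    # Text written as CP1251 but displayed as KOI8-R comes out garbled.
    print(misread("Привет", "cp1251", "koi8_r"))
```

Every table yields a different byte sequence for the same word, so text written with one table and read with another is unreadable unless it is converted first.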

6. In the first half of the 1980s IBM and Xerox began to develop a new
   multilingual 16-bit coding system (65536 positions). Later it was called
   "Unification Code" (Unicode). In 1991 these corporations (together with
   Adobe, MS, etc.) created the "Unicode international consortium".

7. Unicode is split into 256 sets of 256 positions each. The first sets
   contain old tables: for example, set #0 contains ISO 8859-1. Because
   65536 positions are not enough to represent all possible Asian
   hieroglyphs, only a subset of them is present (Unicode 3.0 contains
   ~28000 hieroglyphs).
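
The split into sets can be sketched in Python: the high byte of a 16-bit code selects the set and the low byte the position inside it. The helper names `row_and_cell` and `latin1_matches_set0` are illustrative assumptions; `latin-1` is Python's codec name for ISO 8859-1.

```python
def row_and_cell(ch):
    """Split a 16-bit code point into (set number, position within the set)."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("code point is outside the 16-bit range")
    return code >> 8, code & 0xFF


def latin1_matches_set0():
    """Check that every ISO 8859-1 byte decodes to the same-numbered code point."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))


if __name__ == "__main__":
    print(row_and_cell("A"))       # a Latin letter, found in set 0
    print(row_and_cell("Я"))       # a Cyrillic letter, found in a later set
    print(latin1_matches_set0())
```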

8. In 1990 ISO was studying another coding system, UCS (Universal Coded
   Character Set), with 2^32 positions split into 65536 tables. UCS was
   described in the draft ISO DIS-10646.1:1990, developed by ISO/IEC
   JTC1/SC02/WG02 and supported by European and Asian experts, but voted
   down by American experts.

9. Later the standard ISO/IEC 10646 Version 2 appeared, named ISO/IEC
   10646-1:1993 "ISO/IEC 10646 Universal Multiple-Octet Coded Character Set
   (UCS) - Part 1: Architecture and Basic Multilingual Plane", which
   contains Unicode as its first set.

10. Unicode now exists in versions 1.1 (conforms to ISO/IEC 10646-1:1993),
    2.0, 2.1 (ISO/IEC 10646-1:1993 plus Amendments 1-7 and Technical
    Corrigenda 1-2), and 3.0 (ISO/IEC 10646-1:2000). Unicode 3.2 was planned
    for 2002 and Unicode 4.0 for 2003.

11. Hieroglyph unification is currently performed not by Unicode but by ISO:
    the IRG committee of JTC1/SC02/WG02.

12. To make Unicode compatible with existing software, the consortium defined
    different methods of representing Unicode characters: UTF-8, UTF16,
    UTF16LE and UTF16BE. Unicode also reserves the methods UTF32, UTF32LE
    and UTF32BE.
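
The difference between these representations can be seen in a small Python sketch that encodes one character in each form. The codec names are Python's, and the helper `encode_forms` is a hypothetical name used only for illustration.

```python
# Python codec names for the byte-order-specific forms listed above.
FORMS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]


def encode_forms(ch):
    """Return the hex bytes of one character in each representation."""
    return {form: ch.encode(form).hex() for form in FORMS}


if __name__ == "__main__":
    for form, hexbytes in encode_forms("Я").items():
        print(f"{form:10} {hexbytes}")
    # Plain "utf-16" in Python prepends a byte order mark so that a reader
    # can tell which of the two byte orders follows.
    print("Я".encode("utf-16").hex())
```

The LE and BE variants carry the same 16-bit or 32-bit value and differ only in byte order, while UTF-8 uses a variable number of bytes and leaves 7-bit ASCII text unchanged.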

13. Facts: Japanese experts have calculated that the characters of all
    currently known and already dead languages can be represented in 2^24
    positions. The Japanese national standard JIS X 0208-1990 provides space
    not only for hieroglyphs but also for Greek and Cyrillic letters. In the
    ex-USSR people wrote (and still write) in Russian, Ukrainian,
    Byelorussian, Tatar, Kazakh, Moldavian, Armenian, Georgian, Hebrew,
    Yakut, the Baltic languages and many more (the ex-USSR and Russia
    contain many more nations than China).

