ISO 8859-1  National Character Set FAQ  [condensed]

                           Michael K. Gschwind

                       <[EMAIL PROTECTED]>


1. Which coding should I use for accented characters?  
Use the internationally standardized ISO-8859-1 character set to type
accented characters. This character set contains all characters
necessary to type all major (West) European languages.  This encoding
is also the preferred encoding on the Internet.

This character set is also used by AmigaDOS, MS-Windows, VMS (DEC MCS
is practically equivalent to ISO 8859-1) and practically all UNIX
implementations.  MS-DOS normally uses a different character set that
is not compatible with ISO 8859-1. (Files can, however, be translated
to ISO 8859-1 with various tools.)

Footnote: Supposedly, IBM code page 819 is fully ISO 8859-1 compliant.

ISO 8859-1 supports the following languages:
Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish,
French, Galician, German, Icelandic, Irish, Italian, Norwegian,
Portuguese, Spanish and Swedish.

5. Translating between different international character sets.
While ISO 8859-1 is an international standard, not everybody uses this
encoding. Many computers use their own, vendor-specific character sets
(most notably the Microsoft code pages for MS-DOS).  If you want to
edit or view files written in a different encoding, you will have to
translate them to an ISO 8859-1 based representation.
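
As an illustration only, such a translation can be sketched in Python,
whose standard codecs include "cp850" and "iso-8859-1".  The function
name and the sample bytes are made up for this example, and code page
850 is assumed as the source encoding; this is not one of the tools
this FAQ refers to.

```python
def cp850_to_latin1(data: bytes) -> bytes:
    """Re-encode MS-DOS code page 850 bytes as ISO 8859-1 (hypothetical helper)."""
    # Decode the vendor-specific bytes to abstract characters first,
    # then encode those characters in the target character set.
    text = data.decode("cp850")
    # errors="replace" writes "?" for characters (such as box-drawing
    # symbols) that have no ISO 8859-1 equivalent.
    return text.encode("iso-8859-1", errors="replace")


# "Grüße" as it would be stored under code page 850 (sample data).
dos_bytes = b"Gr\x81\xe1e"
print(cp850_to_latin1(dos_bytes))
```

Going through decoded text rather than a hand-written byte table keeps
the sketch short; characters that exist only in the DOS code page are
lost and show up as "?".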

13.3 News and ISO 8859-1
Much like mail, the Usenet news protocol specification is 7-bit based,
but the infrastructure has been upgraded to 8-bit service...  Thus,
accented characters are transferred correctly between much of Europe
(and Latin America).

ISO 8859-1 is _the_ standard for typing accented characters in most
newsgroups (may be different for MS-DOS centered newsgroups ;-), and
is preferred in most European newsgroup hierarchies, such as at.* or
de.*

15.4 MS-DOS PCs
MS-DOS PCs normally use a different encoding for accented characters,
so there are two options:

* you can use a terminal emulator which will translate between the
  different encodings.  If you use the PROCOMM PLUS, TELEMATE or
  TELIX modem programs, you can download the translation tables
  from URL ftp://oak.oakland.edu/pub/msdos/commprog/xlate.zip.  (You
  need to install CP850 for this to work.)

* you can reconfigure your MS-DOS PC to use an ISO-8859-1 code page.
  Either install IBM code page 819 (see section 19), or get the free
  ISO 8859-X support files from the anonymous ftp archive
  ftp://ftp.uni-erlangen.de/pub/doc/ISO/charsets, which contains data
  on how to do this (and other ISO-related stuff).  The README file
  contains an index of the files you need.

Note that many terminal emulators for PCs strip the 8th bit when in
text transmission mode.  If you are using such a program to dial up
a computer, you may have to configure your terminal program to
transmit all 8 bits.
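
To see why this matters, here is a small Python sketch of what
stripping the 8th bit does to ISO 8859-1 text.  The function name and
sample word are invented for the illustration; real terminal programs
do this internally.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only link would (hypothetical helper)."""
    return bytes(b & 0x7F for b in data)


original = "café".encode("iso-8859-1")   # the final byte is 0xE9
stripped = strip_high_bit(original)      # 0xE9 & 0x7F == 0x69, i.e. "i"

print(original)
print(stripped)
```

The accented letter does not simply vanish; it silently turns into an
unrelated ASCII character, which is why the damage is easy to miss.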

18.3 MS DOS
IBM code page 819 _is_ ISO 8859-1.  Code Page 850 has the same
characters as ISO 8859-1, BUT the characters are in different
locations (i.e., you can translate 1-to-1, but you do have to
translate the characters.)
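
The 1-to-1 claim can be checked with a short Python sketch, assuming
that Python's "cp850" codec matches the code page meant here.  The
variable names are illustrative only.

```python
# All byte values of the LATIN-1 SUPPLEMENT block (A0-FF).
latin1_bytes = bytes(range(0xA0, 0x100))
chars = latin1_bytes.decode("iso-8859-1")

# encode() raises UnicodeEncodeError if any of these characters were
# missing from code page 850.
cp850_bytes = chars.encode("cp850")

# Byte-for-byte translation table for the A0-FF range only.  Bytes
# below A0 are passed through unchanged, which is right for ASCII but
# does not handle the 80-9F control range.
to_cp850 = bytes.maketrans(latin1_bytes, cp850_bytes)

print(b"caf\xe9".translate(to_cp850))
```

A table like this is what a translating terminal emulator applies to
each byte; the reverse direction is not complete, since code page 850
also holds characters that ISO 8859-1 lacks.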

18.4 MS-Windows
Microsoft Windows uses an ISO 8859-1 compatible character set (Code
Page 1252), as delivered in the US, Europe (except Eastern Europe) and
Latin America. In Windows 3.1, Microsoft has added additional characters
in the 0x80-0x9F range.
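
The practical effect of those additional characters can be sketched in
Python.  Note the assumption: Python's "cp1252" codec reflects the
current code page, which may contain more characters than the Windows
3.1 version did.  The sample bytes are made up.

```python
# A word wrapped in Windows "curly quotes" (bytes 0x93 and 0x94).
sample = b"\x93caf\xe9\x94"

as_latin1 = sample.decode("iso-8859-1")   # 0x93/0x94 become control characters
as_cp1252 = sample.decode("cp1252")       # 0x93/0x94 become quotation marks

print(repr(as_latin1))
print(repr(as_cp1252))

# Outside the 80-9F range the two character sets agree.
upper = bytes(range(0xA0, 0x100))
same_upper = upper.decode("iso-8859-1") == upper.decode("cp1252")
print(same_upper)
```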

19. Table of ISO 8859-1 Characters
This section gives an overview of the ISO 8859-1 character set.  The
ISO 8859-1 character set consists of the following four blocks:

00      1F      CONTROL CHARACTERS
20      7E      BASIC LATIN
80      9F      EXTENDED CONTROL CHARACTERS
A0      FF      LATIN-1 SUPPLEMENT

The control characters and basic latin blocks are similar to those
used in the US national variant of ISO 646 (US-ASCII), so they are not
listed here.  Nor is the second block of control characters listed,
for which no functions have yet been defined.
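
As a rough sketch, the four blocks can be written down as a lookup in
Python.  It assumes that the first block runs from 00 to 1F, as in
US-ASCII; the names BLOCKS and block_of are invented for this example.

```python
from typing import Optional

# (first code, last code, block name); the 00-1F range is an assumption
# based on US-ASCII.
BLOCKS = [
    (0x00, 0x1F, "CONTROL CHARACTERS"),
    (0x20, 0x7E, "BASIC LATIN"),
    (0x80, 0x9F, "EXTENDED CONTROL CHARACTERS"),
    (0xA0, 0xFF, "LATIN-1 SUPPLEMENT"),
]


def block_of(code: int) -> Optional[str]:
    """Return the block name for a byte value, or None if no block lists it."""
    for first, last, name in BLOCKS:
        if first <= code <= last:
            return name
    return None  # e.g. 0x7F, which none of the four blocks covers


print(block_of(0xE9))
```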

+----+-----+---+------------------------------------------------------
|Hex | Dec |Car| Description ISO/IEC 10646-1:1993(E)
+----+-----+---+------------------------------------------------------
|    |     |   |
| A0 | 160 |   | NO-BREAK SPACE
| A1 | 161 | ¡ | INVERTED EXCLAMATION MARK
| A2 | 162 | ¢ | CENT SIGN
| A3 | 163 | £ | POUND SIGN
| A4 | 164 | ¤ | CURRENCY SIGN
| A5 | 165 | ¥ | YEN SIGN
| A6 | 166 | ¦ | BROKEN BAR
| A7 | 167 | § | SECTION SIGN
| A8 | 168 | ¨ | DIAERESIS
| A9 | 169 | © | COPYRIGHT SIGN
| AA | 170 | ª | FEMININE ORDINAL INDICATOR
| AB | 171 | « | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
| AC | 172 | ¬ | NOT SIGN
| AD | 173 | ­ | SOFT HYPHEN
| AE | 174 | ® | REGISTERED SIGN
| AF | 175 | ¯ | MACRON
|    |     |   |
| B0 | 176 | ° | DEGREE SIGN
| B1 | 177 | ± | PLUS-MINUS SIGN
| B2 | 178 | ² | SUPERSCRIPT TWO
| B3 | 179 | ³ | SUPERSCRIPT THREE
| B4 | 180 | ´ | ACUTE ACCENT
| B5 | 181 | µ | MICRO SIGN
| B6 | 182 | ¶ | PILCROW SIGN
| B7 | 183 | · | MIDDLE DOT
| B8 | 184 | ¸ | CEDILLA
| B9 | 185 | ¹ | SUPERSCRIPT ONE
| BA | 186 | º | MASCULINE ORDINAL INDICATOR
| BB | 187 | » | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
| BC | 188 | ¼ | VULGAR FRACTION ONE QUARTER
| BD | 189 | ½ | VULGAR FRACTION ONE HALF
| BE | 190 | ¾ | VULGAR FRACTION THREE QUARTERS
| BF | 191 | ¿ | INVERTED QUESTION MARK
|    |     |   |
| C0 | 192 | À | LATIN CAPITAL LETTER A WITH GRAVE ACCENT
| C1 | 193 | Á | LATIN CAPITAL LETTER A WITH ACUTE ACCENT
| C2 | 194 | Â | LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
| C3 | 195 | Ã | LATIN CAPITAL LETTER A WITH TILDE
| C4 | 196 | Ä | LATIN CAPITAL LETTER A WITH DIAERESIS
| C5 | 197 | Å | LATIN CAPITAL LETTER A WITH RING ABOVE
| C6 | 198 | Æ | LATIN CAPITAL LIGATURE AE
| C7 | 199 | Ç | LATIN CAPITAL LETTER C WITH CEDILLA
| C8 | 200 | È | LATIN CAPITAL LETTER E WITH GRAVE ACCENT
| C9 | 201 | É | LATIN CAPITAL LETTER E WITH ACUTE ACCENT
| CA | 202 | Ê | LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
| CB | 203 | Ë | LATIN CAPITAL LETTER E WITH DIAERESIS
| CC | 204 | Ì | LATIN CAPITAL LETTER I WITH GRAVE ACCENT
| CD | 205 | Í | LATIN CAPITAL LETTER I WITH ACUTE ACCENT
| CE | 206 | Î | LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
| CF | 207 | Ï | LATIN CAPITAL LETTER I WITH DIAERESIS
|    |     |   |
| D0 | 208 | Ð | LATIN CAPITAL LETTER ETH
| D1 | 209 | Ñ | LATIN CAPITAL LETTER N WITH TILDE
| D2 | 210 | Ò | LATIN CAPITAL LETTER O WITH GRAVE ACCENT
| D3 | 211 | Ó | LATIN CAPITAL LETTER O WITH ACUTE ACCENT
| D4 | 212 | Ô | LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
| D5 | 213 | Õ | LATIN CAPITAL LETTER O WITH TILDE
| D6 | 214 | Ö | LATIN CAPITAL LETTER O WITH DIAERESIS
| D7 | 215 | × | MULTIPLICATION SIGN
| D8 | 216 | Ø | LATIN CAPITAL LETTER O WITH STROKE
| D9 | 217 | Ù | LATIN CAPITAL LETTER U WITH GRAVE ACCENT
| DA | 218 | Ú | LATIN CAPITAL LETTER U WITH ACUTE ACCENT
| DB | 219 | Û | LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
| DC | 220 | Ü | LATIN CAPITAL LETTER U WITH DIAERESIS
| DD | 221 | Ý | LATIN CAPITAL LETTER Y WITH ACUTE ACCENT
| DE | 222 | Þ | LATIN CAPITAL LETTER THORN
| DF | 223 | ß | LATIN SMALL LETTER SHARP S
|    |     |   |
| E0 | 224 | à | LATIN SMALL LETTER A WITH GRAVE ACCENT
| E1 | 225 | á | LATIN SMALL LETTER A WITH ACUTE ACCENT
| E2 | 226 | â | LATIN SMALL LETTER A WITH CIRCUMFLEX ACCENT
| E3 | 227 | ã | LATIN SMALL LETTER A WITH TILDE
| E4 | 228 | ä | LATIN SMALL LETTER A WITH DIAERESIS
| E5 | 229 | å | LATIN SMALL LETTER A WITH RING ABOVE
| E6 | 230 | æ | LATIN SMALL LIGATURE AE
| E7 | 231 | ç | LATIN SMALL LETTER C WITH CEDILLA
| E8 | 232 | è | LATIN SMALL LETTER E WITH GRAVE ACCENT
| E9 | 233 | é | LATIN SMALL LETTER E WITH ACUTE ACCENT
| EA | 234 | ê | LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT
| EB | 235 | ë | LATIN SMALL LETTER E WITH DIAERESIS
| EC | 236 | ì | LATIN SMALL LETTER I WITH GRAVE ACCENT
| ED | 237 | í | LATIN SMALL LETTER I WITH ACUTE ACCENT
| EE | 238 | î | LATIN SMALL LETTER I WITH CIRCUMFLEX ACCENT
| EF | 239 | ï | LATIN SMALL LETTER I WITH DIAERESIS
|    |     |   |
| F0 | 240 | ð | LATIN SMALL LETTER ETH
| F1 | 241 | ñ | LATIN SMALL LETTER N WITH TILDE
| F2 | 242 | ò | LATIN SMALL LETTER O WITH GRAVE ACCENT
| F3 | 243 | ó | LATIN SMALL LETTER O WITH ACUTE ACCENT
| F4 | 244 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX ACCENT
| F5 | 245 | õ | LATIN SMALL LETTER O WITH TILDE
| F6 | 246 | ö | LATIN SMALL LETTER O WITH DIAERESIS
| F7 | 247 | ÷ | DIVISION SIGN
| F8 | 248 | ø | LATIN SMALL LETTER O WITH STROKE
| F9 | 249 | ù | LATIN SMALL LETTER U WITH GRAVE ACCENT
| FA | 250 | ú | LATIN SMALL LETTER U WITH ACUTE ACCENT
| FB | 251 | û | LATIN SMALL LETTER U WITH CIRCUMFLEX ACCENT
| FC | 252 | ü | LATIN SMALL LETTER U WITH DIAERESIS
| FD | 253 | ý | LATIN SMALL LETTER Y WITH ACUTE ACCENT
| FE | 254 | þ | LATIN SMALL LETTER THORN
| FF | 255 | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS
+----+-----+---+------------------------------------------------------
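
Rows in this layout can be regenerated with a few lines of Python, if
only as a cross-check.  This is a sketch, not the method used to build
the table: table_row is a made-up name, and the names returned by the
standard unicodedata module follow current Unicode usage, so some
differ from the 1993 descriptions above (for example "WITH GRAVE"
rather than "WITH GRAVE ACCENT").

```python
import unicodedata


def table_row(code: int) -> str:
    """Format one row for a byte value in the A0-FF range (hypothetical helper)."""
    # Interpret the single byte as an ISO 8859-1 character.
    ch = bytes([code]).decode("iso-8859-1")
    # unicodedata.name() raises ValueError for unnamed control
    # characters, so this is only meant for A0-FF.
    return f"| {code:02X} | {code:3d} | {ch} | {unicodedata.name(ch)}"


for code in (0xA9, 0xE9, 0xFF):
    print(table_row(code))
```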

23. Home location of this document
23.1 www
You can find this and other i18n documents under URL
http://www.vlsivie.tuwien.ac.at/mike/i18n.html.

23.2 ftp
The most recent version of this document is available via anonymous
ftp from ftp.vlsivie.tuwien.ac.at under the file name
/pub/8bit/FAQ-ISO-8859-1

-- 
Howard Eisenberger
... DOS TCP/IP * <URL:http://www.ncf.carleton.ca/~ag221/dosppp.html>