On Friday, January 31, 2003, at 09:20 PM, Mledie at aol.com wrote:
> Is it the sender or the mailprogram that is to blame? Would outlook
> express, entourage or aol all have different ways of compressing and
> encoding files? I guess I should just say : Go figure and let it go
> at that, but that is difficult for me to do. Marta
Any and all of them could be the culprit.
The problem here is that most mail uses one byte of information per
character -- we won't consider languages like Chinese -- and a byte can
take only 256 different values. Since e-mail was invented in the US,
the standards use the ASCII character set, where ASCII stands for the
American Standard Code for Information Interchange. ASCII uses exactly
half of the possible byte values (0 through 127) and contains the
letters needed for English. Other languages put their special
characters into the other half (128 through 255), and there aren't
enough positions to satisfy everyone at the same time.
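To make the crowding concrete, here is a rough Python sketch of one byte
from the upper half being read under several character sets. The list of
character sets and the helper names decode_everywhere and is_ascii are my
own choices for illustration; nothing here comes from your message.

```python
import unicodedata

# Illustrative selection of single-byte character sets (an assumption,
# not a list taken from any particular mail program).
CHARSETS = ("iso-8859-1", "iso-8859-5", "iso-8859-7", "mac_roman")


def decode_everywhere(byte_value):
    """Return {charset: character} for one byte value."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in CHARSETS}


def is_ascii(byte_value):
    """ASCII only defines the lower half of the byte range, 0 through 127."""
    return byte_value < 128


if __name__ == "__main__":
    # Same byte, different character, depending on the character set.
    for name, char in decode_everywhere(0xE9).items():
        print(f"{name:12} 0xE9 -> {unicodedata.name(char)}")
    # Count how many of the 256 byte values ASCII actually covers.
    print("values covered by ASCII:", sum(is_ascii(b) for b in range(256)))
```

A byte in the ASCII half, such as 0x41, comes out as "A" in every one of
these sets, which is why plain English text rarely gets mangled.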
So, a whole bunch of different character sets have been defined so that
different languages can live in the other half. In an e-mail message,
the header part of the message is supposed to say which character set
the message was composed in. Your message contains
    MIME-Version: 1.0
    Content-Type: text/plain; charset="US-ASCII"
    Content-Transfer-Encoding: 7bit
    X-Mailer: AOL 5.0 for Mac sub 39
in its header. A message in German might contain, for example,
    MIME-Version: 1.0
    Content-Type: text/plain; charset=iso-8859-1
    Content-Transfer-Encoding: quoted-printable
    X-Mailer: Mozilla 4.05 [de] (Win98; I)
These fields describe what kind of content the mail holds. Here it is
plain text using the character set "iso-8859-1" and the
special-character encoding "quoted-printable".
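If you want to see how a program might pull those fields out, here is a
small sketch using the email package from Python's standard library. The
header lines are the German example above; the body line and the names
RAW_MESSAGE and describe are made up for illustration.

```python
from email import message_from_string

# Header lines copied from the German example. The blank line and the
# body after it are placeholders invented for this sketch.
RAW_MESSAGE = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "X-Mailer: Mozilla 4.05 [de] (Win98; I)\n"
    "\n"
    "(body goes here)\n"
)


def describe(raw_message):
    """Return the fields that tell a mail reader how to decode the body."""
    msg = message_from_string(raw_message)
    return {
        "type": msg.get_content_type(),          # e.g. text/plain
        "charset": msg.get_content_charset(),    # lowercased charset name
        "transfer_encoding": msg["Content-Transfer-Encoding"],
    }


if __name__ == "__main__":
    for field, value in describe(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```

A mail reader that skips this step, or that ignores the charset it
finds, has to guess, and that guess is where the odd characters come
from.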
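ISO 8859-1 is a character set that contains most of the "strange"
characters of the Western European languages to augment standard ASCII.
"quoted-printable" tells how the non-ASCII characters are embedded in
the stream: each one is written as an equals sign followed by its byte
value in hexadecimal. It's what causes those strange =E9 type things in
your e-mail; E9 is the ISO 8859-1 code for an e with an acute accent.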
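Here is a minimal sketch of that round trip using Python's standard
quopri module. The helper names to_quoted_printable and
from_quoted_printable are hypothetical, and I am assuming ISO 8859-1 as
the character set unless told otherwise.

```python
import quopri


def to_quoted_printable(text, charset="iso-8859-1"):
    """Encode text to bytes in the charset, then make it 7-bit safe."""
    return quopri.encodestring(text.encode(charset)).decode("ascii")


def from_quoted_printable(wire_text, charset="iso-8859-1"):
    """Undo the =XX escapes, then interpret the bytes in the charset."""
    return quopri.decodestring(wire_text.encode("ascii")).decode(charset)


if __name__ == "__main__":
    # What travels over the wire is pure ASCII.
    wire = to_quoted_printable("caf\u00e9")
    print("on the wire:", wire)
    # A reader that skips decoding shows the wire form as-is.
    print("round trip ok:", from_quoted_printable(wire) == "caf\u00e9")
```

When a mail program displays the wire form without decoding it, you see
the =E9 sequences directly instead of the accented letters.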
Now, your mail program has to be able to interpret the character set
properly. (AOL isn't the most able one for this because of the A in its
name.) You must have the correct font, so the characters appear in the
right position, and the non-ASCII characters have to get through all
the computers between Germany and the United States -- this is what
MIME and quoted-printable are for.
All this trouble will eventually go away when Unicode becomes the norm.
It will allow tens of thousands of characters with one encoding.
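As a rough illustration of the difference, this Python sketch checks
which sample words fit into which encodings. I use UTF-8 as the Unicode
encoding here, which is my assumption; the sample words and the helper
name fits are made up for the example.

```python
# Sample words chosen for illustration only.
SAMPLES = {
    "German": "Gr\u00fc\u00dfe",
    "Greek": "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",
    "Russian": "\u0420\u0443\u0441\u0441\u043a\u0438\u0439",
    "Chinese": "\u4e2d\u6587",
}

ENCODINGS = ("iso-8859-1", "iso-8859-5", "iso-8859-7", "utf-8")


def fits(text, encoding):
    """True if every character of text can be written in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        row = ", ".join(f"{enc}={fits(word, enc)}" for enc in ENCODINGS)
        print(f"{language:8} {row}")
```

Each single-byte set handles its own language and rejects the others,
while the one Unicode encoding accepts all four.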
--
Lee Larson, Mathematics Department, University of Louisville
Phone: 502-852-6826 FAX: 502-852-7132
| The next meeting of the Louisville Computer Society will
| be January 28. The LCS Web page is <http://www.kymac.org>.