On Friday, January 31, 2003, at 09:20 PM, Mledie at aol.com wrote:

> Is it the sender or the mail program that is to blame? Would Outlook
> Express, Entourage or AOL all have different ways of compressing and
> encoding files? I guess I should just say: Go figure and let it go at
> that, but that is difficult for me to do. Marta

Any and all of them could be the culprit.

The problem here is that most mail uses one byte of information per 
character (we won't consider languages like Chinese), and there are 
only 256 possible byte values. Since e-mail was invented in the US, 
the standards use the ASCII character set, where ASCII is the American 
Standard Code for Information Interchange. ASCII uses exactly half of 
the possible values, 0 through 127, and contains the letters needed 
for English. Other languages code their special characters into the 
other half, 128 through 255, and there aren't enough positions to 
satisfy everyone at the same time.
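
To see why the upper half causes trouble, here is a small Python sketch 
(my own illustration, not anything taken from a mail program) that 
reads the same single byte under three character sets. The choice of 
byte 0xE9 and of these three sets is arbitrary.

```python
# One byte, several meanings: 0xE9 lies outside ASCII's 0-127 range,
# so what it displays as depends on the character set the reader assumes.
raw = bytes([0xE9])

# Python codec names for ISO 8859-1 (Western European),
# ISO 8859-7 (Greek) and the classic Mac OS Roman set.
CHARSETS = ("iso8859_1", "iso8859_7", "mac_roman")


def interpretations(data: bytes) -> dict:
    """Decode the same bytes under each character set in CHARSETS."""
    return {name: data.decode(name) for name in CHARSETS}


for name, text in interpretations(raw).items():
    print(name, "->", text)

# Plain ASCII has no character at all for this byte value.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii ->", err.reason)
```

Each character set gives a different character for the identical byte, 
which is why the receiving program has to be told which one was meant.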

So, a whole bunch of different character sets have been defined so 
that different languages can live in the other half. In an e-mail 
message, the header is supposed to say which character set the message 
was composed in. Your message contains

MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: AOL 5.0 for Mac sub 39

in its header. A message in German might contain, for example,

MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Mailer: Mozilla 4.05 [de] (Win98; I)

These fields describe what kind of content the mail carries. Here it 
is plain text ("text/plain") in the character set "iso-8859-1" with 
the special-character encoding "quoted-printable".

ISO 8859-1 is a character set that contains most of the "strange" 
characters of the European languages to augment standard ASCII. 
"quoted-printable" tells how the non-ASCII characters are embedded in 
the stream; it's what causes those strange =E9 type things in your 
e-mail.
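
As a rough sketch of what a mail program does with those headers, the 
following Python uses the standard library's quopri and email modules. 
The German sample body "Sch=F6ne Gr=FC=DFe" is made up for illustration; 
only the header lines come from the example above.

```python
import quopri
from email import message_from_string


def decode_qp(body: bytes, charset: str) -> str:
    """Undo quoted-printable, then interpret the bytes in `charset`."""
    return quopri.decodestring(body).decode(charset)


# "=E9" stands for the single byte 0xE9, which ISO 8859-1 reads as e-acute.
print(decode_qp(b"caf=E9", "iso-8859-1"))

# A hypothetical message using the German header fields shown above.
raw_message = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Sch=F6ne Gr=FC=DFe\n"
)

msg = message_from_string(raw_message)
charset = msg.get_content_charset()      # character set named in the header
payload = msg.get_payload(decode=True)   # bytes, quoted-printable undone
text = payload.decode(charset)
print(charset, "->", text)
```

If the mail program skips either step, or guesses the wrong character 
set, the reader sees the raw =E9 sequences or the wrong letters.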

Now, your mail program has to be able to interpret the character set 
properly (AOL isn't the most able one for this because of the A in its 
name.) You must have the correct font, so the characters appear in the 
right position, and the non-ASCII characters have to get through all 
the computers between Germany and the United States -- this is what 
MIME and quoted-printable are for.

All this trouble will eventually go away when Unicode becomes the norm. 
It will allow tens of thousands of characters with one encoding.

--
Lee Larson, Mathematics Department, University of Louisville
Phone: 502-852-6826 FAX: 502-852-7132



| The next meeting of the Louisville Computer Society will
| be January 28. The LCS Web page is <http://www.kymac.org>.

