At 01:57 PM 10/2/99 -0700, Earl Hood wrote:
>On October 2, 1999 at 02:17, "SysAdmin, dte.net" wrote:
>
>> Sorry if this has come up before folks, but when running MHonArc
>> I just noticed the following warning:
>> 
>> Warning: Unrecognized character set: windows-1252
>> 
>> The source of the email is Microsoft Outlook Express. Everything
>> is working great regardless of this warning... Is there anything
>> I could do to stop this warning, or does it even really matter?
>
>It probably does not matter.  If MHonArc (or more specifically, the
>text/plain filter) gets a character set it does not recognize, it just
>passes that data through as-is with HTML special characters converted
>into entity references.  This technically goes against MIME conformance
>criteria (see the MIME Conformance section of the documentation), but
>is the best behavior since in most cases, treating the data as the
>local charset works.
>
>As for the non-standard "windows-1252" character set, the only
>potential gotcha is when characters in the range 128-159
>exist.  This range is not defined by ISO-8859 charsets, and Windows
>historically has used the range for Windows-specific characters.
>Therefore, non-Windows clients may not render the characters, or they
>will get rendered as client/OS-specific values.

What is the origin of the prohibition against using these code points?
Isn't it that if you strip the 8th bit they yield control codes?
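
The arithmetic behind that guess is easy to check. Here is a minimal
Python sketch (the helper name strip_high_bit is only illustrative, not
taken from any tool) showing that clearing the 8th bit of each byte in
128-159 lands on 0-31, the ASCII control codes:

```python
def strip_high_bit(byte: int) -> int:
    """Clear the 8th bit, as a 7-bit-only channel would."""
    return byte & 0x7F

# Bytes 128-159: the range Windows uses for its own characters.
windows_range = range(128, 160)
stripped = [strip_high_bit(b) for b in windows_range]

# Every stripped value falls in 0-31, the ASCII control-code range.
all_control = all(0 <= b <= 31 for b in stripped)
print(all_control, stripped[0], stripped[-1])  # True 0 31
```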

Anyhow, that is AFAIK the source of the sometimes-voiced allegation that
files using this character set are "not ready for Internet."

For a Perl implementation of a filter to render such texts Internet-ready,
you might look at

demoronizer

http://language.perl.com/misc/div-www.html
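
To give the flavor of such a filter, here is a rough sketch in Python.
It is not the demoronizer's code or its exact behavior, only the same
general idea. It assumes the input really is windows-1252 bytes, and the
function name to_portable_html is made up for illustration. It decodes
the bytes, converts HTML special characters to entity references, and
turns every non-ASCII character into a numeric character reference:

```python
import html

def to_portable_html(raw: bytes) -> str:
    # Decode as windows-1252; the few byte values that charset leaves
    # unassigned become U+FFFD instead of raising an error.
    text = raw.decode("cp1252", errors="replace")
    # Convert the HTML special characters &, < and > to entity references.
    escaped = html.escape(text, quote=False)
    # Replace every non-ASCII character with a numeric character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

# 0x93/0x94 are curly quotes and 0x85 is an ellipsis in windows-1252.
sample = b"\x93A & B\x94\x85 caf\xe9"
print(to_portable_html(sample))
# &#8220;A &amp; B&#8221;&#8230; caf&#233;
```

Escaping comes before the character-reference step so that the
ampersands in the generated references are not escaped a second time.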

Al

>
>I assume that characters within 160-255 match the iso-8859-1 character
>set, but someone else will have to confirm that.  I have not seen a
>document listing the specifics of windows-1252.  If anyone has any
>pointers, pass them along.
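
One way to test that assumption is to compare codec tables. The sketch
below uses the cp1252 and latin-1 codecs that ship with Python, on the
assumption that they faithfully reflect windows-1252 and iso-8859-1. It
compares the two charsets over 160-255 and, for contrast, counts how
many bytes in 128-159 windows-1252 actually assigns:

```python
# Decode bytes 160-255 under both charsets and compare the results.
upper = bytes(range(160, 256))
same_upper = upper.decode("cp1252") == upper.decode("latin-1")

# Collect the bytes in 128-159 that the cp1252 codec assigns a character.
assigned = []
for b in range(128, 160):
    try:
        bytes([b]).decode("cp1252")
        assigned.append(b)
    except UnicodeDecodeError:
        pass  # this byte is unassigned in the codec's table

print(same_upper, len(assigned))  # True 27
```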
>
>You can shut up the warnings if you register windows-1252 with
>CHARSETCONVERTERS.  Using an existing converter may work, or if you
>have information on the windows-1252 charset, a specific converter can
>be created.
>
>       --ewh
> 
