Re: Converting special characters into entities.

Earl Hood Thu, 12 Aug 1999 11:53:11 -0700
On August 12, 1999 at 18:18, "Peter Seitz jun." wrote:

> I am archiving a german language discussion group and there are lots 
> of umlauts in these mails.
> 
> I'd like to convert the umlauts into entities so these umlauts can be 
> read on various platforms (windows, Macintosh) correctly. I was not 
> able to find out what I have to put into my resource files.
> 
> Can someone please help?

Sure.  The answer differs depending on whether you are dealing
with message header data or message body data.

Header:
     CHARSETCONVERTERS are invoked when non-ASCII extension encoding
     is encountered in message headers.  That is the =?...?.?...?=
     stuff.  If the umlauts are encoded that way, you can
     get the effect you want.

     By default MHonArc converts 8-bit characters into entity
     references, with the exception of iso-8859-1 character data.
     The reason is that most browsers default to iso-8859-1.
     To change this, put something like the following in your
     resource file:

     <CharsetConverters>
     iso-8859-1;     iso_8859::str2sgml;     iso8859.pl
     </CharsetConverters>

     If you have a non-encoded/raw 8-bit character in the message
     header, MHonArc keeps it as-is.  Forcing a conversion to
     an entity reference would require code changes to MHonArc
     itself.
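
     To make the header case concrete, here is a rough Python sketch
     of what "decode the =?...?.?...?= data and emit entity
     references" amounts to.  It is not MHonArc's code (MHonArc is
     Perl, and iso_8859::str2sgml works differently); the function
     names to_entities and header_to_entities are made up for this
     illustration, and the iso-8859-1 fallback is an assumption.

```python
from email.header import decode_header
from html.entities import codepoint2name


def to_entities(text):
    """Replace each non-ASCII character with an entity reference.

    A named entity is used when HTML defines one, otherwise a numeric
    reference. ASCII characters are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


def header_to_entities(raw_header, default_charset="iso-8859-1"):
    """Decode encoded words in a header value, then convert to entities.

    default_charset is an assumption for chunks that carry no charset.
    """
    parts = []
    for chunk, charset in decode_header(raw_header):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or default_charset, errors="replace")
        parts.append(chunk)
    return to_entities("".join(parts))


if __name__ == "__main__":
    print(header_to_entities("=?iso-8859-1?Q?Gr=FC=DFe_aus_M=FCnchen?="))
```

     The example header decodes to a subject line with three
     non-ASCII characters, each of which comes out as a named entity.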

Body:
     You'll have to tweak the text/plain filter to call
     iso_8859::str2sgml when iso-8859-1 character data is
     specified (it is already invoked for iso-8859-[2-10]).  You
     will probably also want it to call iso_8859::str2sgml by
     default if you know there are messages that contain 8-bit
     characters but do not specify a charset parameter in the
     Content-Type field.
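
     The logic described above (use the declared charset, fall back
     to a default when none is given, then convert 8-bit characters
     to entities) can be sketched as follows.  This is Python for
     illustration only, not a patch to the Perl text/plain filter;
     charset_of and body_to_entities are hypothetical names, and
     treating undeclared data as iso-8859-1 is an assumption that
     only makes sense if you know your archive's content.

```python
from email.message import Message
from html.entities import codepoint2name


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value.

    Falls back to `default` (an assumption) when the header is
    missing or has no charset parameter.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def body_to_entities(body_bytes, content_type=None):
    """Decode raw body bytes and replace non-ASCII characters with entities."""
    text = body_bytes.decode(charset_of(content_type), errors="replace")
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                            # ASCII passes through
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # named entity
        else:
            out.append("&#%d;" % cp)                  # numeric reference
    return "".join(out)


if __name__ == "__main__":
    # No Content-Type at all: the iso-8859-1 default is applied.
    print(body_to_entities(b"Sch\xf6ne Gr\xfc\xdfe"))
    # Declared charset is honored instead of the default.
    print(body_to_entities(b"\xb3", "text/plain; charset=iso-8859-2"))
```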

     I should probably modify the text/plain filter to use
     the functions specified in CHARSETCONVERTERS instead
     of having a hard-coded mapping.  Currently the filter checks
     CHARSETCONVERTERS only for "-decode-" settings.

     Note that the use of iso_8859::str2sgml does incur a
     performance penalty.  See
     <http://www.xray.mpe.mpg.de/mailing-lists/mhonarc/1998-02/msg00083.html>
     (message-id <[EMAIL PROTECTED]>) for
     more information.

--ewh