On July 19, 2006 at 18:44, Andrew Shirrayev wrote:

> one BIG letter
> $ ls -l
> -rw-r--r--  1 andrews andrews 29197818 Jul 19 18:42 mbox.200410.one
> $ wc mbox.200410.one
>  1199719  2958850 29197818 mbox.200410.one
> 
> 1st way:
> 
> <TextEncode>
> utf-8; MHonArc::UTF8::to_utf8; MHonArc/UTF8.pm
> </TextEncode>

Did you also set:

  <!-- With data translated to UTF-8, it simplifies CHARSETCONVERTERS -->
  <CharsetConverters override>
  default; mhonarc::htmlize
  </CharsetConverters>

  <!-- Need to also register UTF-8-aware text clipping function -->
  <TextClipFunc>
  MHonArc::UTF8::clip; MHonArc/UTF8.pm
  </TextClipFunc>
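Why the clipping function has to be UTF-8-aware: once the text is
UTF-8, a character may span several bytes, and clipping at an
arbitrary byte offset can cut one in half. The Python sketch below
only illustrates that problem and one way around it. It is NOT the
implementation of MHonArc::UTF8::clip; both function names are
hypothetical.

```python
def clip_bytes_naive(text: str, n: int) -> bytes:
    """Hypothetical: clip the UTF-8 encoding of text to n bytes.
    This can split a multi-byte character."""
    return text.encode("utf-8")[:n]


def clip_utf8_bytes(data: bytes, n: int) -> bytes:
    """Hypothetical: clip UTF-8 bytes to at most n bytes, backing off
    to the nearest character boundary so no character is split."""
    if n >= len(data):
        return data
    end = n
    # data[end] is the first byte left out; if it is a continuation
    # byte (bit pattern 10xxxxxx), the cut falls inside a character,
    # so move the cut left until it lands on a character start.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


word = "Привет"  # 6 Cyrillic letters, 2 bytes each in UTF-8
print(clip_utf8_bytes(word.encode("utf-8"), 5).decode("utf-8"))  # Пр
```

Decoding the result of clip_bytes_naive(word, 5) fails with
UnicodeDecodeError, because the fifth byte is only the first half of
"и".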

If you use TEXTENCODE together with the CHARSETCONVERTERS setting
above, you can avoid dealing with MHonArc::CharEnt entirely.  Without
that setting, MHonArc will convert all non-ASCII UTF-8 sequences into
entity references.
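To show what "converted into entity references" means, here is a
rough Python sketch that replaces every non-ASCII character with a
hexadecimal numeric character reference. It is an illustration only,
not MHonArc::CharEnt: the exact reference form MHonArc emits
(named, decimal, or hex) is an assumption here, and escaping of &, <
and > is left out.

```python
def to_entity_refs(text: str) -> str:
    """Hypothetical: keep ASCII as-is, turn each non-ASCII character
    into a &#xHEX; numeric character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


print(to_entity_refs("Привет"))
```

With default; mhonarc::htmlize applied to data that TEXTENCODE has
already converted to UTF-8, the characters pass through unchanged
instead, which keeps the generated pages much smaller for non-Latin
mail.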

In general, if you use TEXTENCODE, you should also redefine
CHARSETCONVERTERS appropriately.

--ewh

---------------------------------------------------------------------
To sign-off this list, send email to [EMAIL PROTECTED] with the
message text UNSUBSCRIBE MHONARC-DEV