On Tue, May 07, 2002 at 11:13:43AM -0400, John Siracusa wrote:
> On 5/7/02 10:58 AM, Paul Lindner wrote:
> > The output from your example looks like UTF-8 data (Ã is a
> > commonly seen UTF-8 escape sequence). XML::Parser converts all
> > incoming text into UTF-8. You will need to convert it back to
> > iso-8859-1.
> >
> > My favorite is Text::Iconv
> >
> >   use Text::Iconv;
> >   my $converter = Text::Iconv->new("UTF-8", "ISO8859-1");
> >
> >   my $buffer_latin1 = $converter->convert($buffer);
>
> So HTML::Entities only works with ISO8859-1 (or ASCII, presumably)? What if
> I have actual UTF-8 data? Won't conversion to ISO8859-1 in service of
> HTML::Entities result in data loss?

Yes, HTML::Entities is based on ISO8859-1 input only.

BTW, for better performance in mod_perl consider using
Apache::Util::escape_html():

  escape_html
      This routine replaces unsafe characters in $string with their
      entity representation.

          my $esc = Apache::Util::escape_html($html);

Anyway, back to character entities. Text::Iconv will fail if you try to
convert unconvertible text, so at least you can test for that condition
(and adjust accordingly).

BasisTech sells a comprehensive Unicode library called Rosette that
knows how to automatically convert to a target character set while
incorporating SGML entities for any character set. Perhaps it's time
for an open implementation of that.

Also see http://rf.net/~james/perli18n.html for a Perl i18n FAQ.

-- 
Paul Lindner    [EMAIL PROTECTED]

mod_perl Developer's Cookbook   http://www.modperlcookbook.org/
Human Rights Declaration        http://www.unhchr.ch/udhr/
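
P.S. A rough sketch of the "test for that condition and adjust
accordingly" idea. This is not Text::Iconv: it swaps in the Encode
module (bundled with Perl 5.8 and later), because Encode can both
signal a failed conversion and substitute numeric character references.
The helper name utf8_to_latin1 is made up for illustration, and the
sketch assumes the input is a byte string holding UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the thread): convert UTF-8 bytes to
# ISO-8859-1 bytes. Returns (converted_bytes, lossless_flag).
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;

    # Decode strictly; malformed UTF-8 input dies here.
    my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK());

    # Try a strict conversion first. It croaks if any character has
    # no ISO-8859-1 equivalent, so eval leaves $latin1 undefined.
    # encode() with a CHECK value may modify its argument, hence the copy.
    my $copy   = $chars;
    my $latin1 = eval { encode('ISO-8859-1', $copy, Encode::FB_CROAK()) };
    return ($latin1, 1) if defined $latin1;

    # Adjust accordingly: replace each unconvertible character with a
    # decimal numeric character reference (&#NNN;).
    $copy = $chars;
    return (encode('ISO-8859-1', $copy, Encode::FB_HTMLCREF()), 0);
}

# "café" followed by U+263A, both given as raw UTF-8 bytes.
my ($out, $lossless) = utf8_to_latin1("caf\xC3\xA9 \xE2\x98\xBA");
print $lossless ? "converted cleanly\n" : "used character references\n";
```

Note that this only covers characters that do not fit in ISO-8859-1. It
does not escape <, > or &, so HTML::Entities or
Apache::Util::escape_html() would still be needed for that step.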