I don't know where else to post this question.

I'm already using LWP::UserAgent and HTML::Parser, and I can fetch and parse documents without problems. However, I would like my code to handle documents in any encoding. I'm using Perl 5.8.3 with the latest HTML::Parser as of today.

Sometimes when fetching a document you know its encoding, and sometimes you have no idea what it is. What I want to know is: how do I convert the incoming Web page to UTF-8 regardless of its original encoding, and also encode entities to something like Aacute (for keyword matching)?
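
To make the question concrete, here is a minimal sketch of the kind of conversion being asked about, assuming the Encode module bundled with Perl 5.8 and HTML::Entities (distributed with HTML::Parser). The helper name to_utf8_and_entities and the ISO-8859-1 fallback are assumptions made for illustration only, and working out the charset from the Content-Type header or a meta tag is left out.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use HTML::Entities qw(encode_entities);

# Hypothetical helper, not part of any library.
# $bytes   : raw document body as fetched (e.g. $response->content)
# $charset : declared charset name, or undef when unknown
sub to_utf8_and_entities {
    my ($bytes, $charset) = @_;

    # Assumption: fall back to ISO-8859-1 when the charset is missing
    # or is not a name that Encode recognises.
    $charset = 'ISO-8859-1'
        unless defined $charset && Encode::find_encoding($charset);

    # Bytes in the source encoding -> Perl character string.
    my $text = decode($charset, $bytes);

    # Character string -> UTF-8 encoded bytes.
    my $utf8 = encode('UTF-8', $text);

    # Replace everything except tab, newline and printable ASCII with
    # entities; characters that have a name become e.g. &Aacute;, the
    # rest become numeric references. Markup characters are left alone.
    my $named = encode_entities($text, '^\t\n\x20-\x7e');

    return ($utf8, $named);
}
```

Called as to_utf8_and_entities($response->content, $charset), this would return both forms; which of the two to use for keyword matching depends on how the keywords themselves are stored.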

Maybe I'm missing something obvious. I've tried everything I can think of, including following some examples I've found, and no matter what I do, it just doesn't work.

Any help would be appreciated.

Thanks,
John
