LWP seems to have issues with fetching pages that are utf-8 encoded.

Using a simple script like

    use LWP::UserAgent;
    use Encode;

    my $ua = LWP::UserAgent->new();
    my $resp = $ua->get("http://bild.de");

    if(Encode::is_utf8($resp->content)) {
        print "utf8\n";
    } else {
        print "no utf8\n";
    }

shows

    "no utf8"

(meaning that although the page is UTF-8 encoded, the resulting Perl string
isn't) and it prints the warning

    Parsing of undecoded UTF-8 will give garbage when decoding entities
    at .../LWP/Protocol.pm line 114.

which seems to be related to a message I posted last year:

    http://www.nntp.perl.org/group/perl.libwww/2006/08/msg6801.html

although there were no responses at the time.

Verified with perl 5.8.5, HTML::Parser 3.56 and libwww 5.805.

Are there any known workarounds or fixes?
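
One candidate is sketched below. It has not been verified against the versions listed above. As far as I understand it, content() returns the raw octets as received, while decoded_content() in HTTP::Message decodes them according to the charset in the Content-Type header. The hand-built response and the sample string are hypothetical stand-ins for the real page, so the sketch runs without network access:

```perl
use strict;
use warnings;
use HTTP::Response;
use Encode ();

# Hypothetical stand-in for a fetched page: raw UTF-8 octets plus a
# Content-Type header that declares the charset.
my $bytes = Encode::encode('UTF-8', "Gr\x{fc}\x{df}e");   # 5 characters, 7 octets

my $resp = HTTP::Response->new(200, 'OK');
$resp->header('Content-Type' => 'text/html; charset=UTF-8');
$resp->content($bytes);

my $raw  = $resp->content;           # octets, exactly as stored in the message
my $text = $resp->decoded_content;   # character string, decoded per the charset

print Encode::is_utf8($raw)  ? "utf8\n" : "no utf8\n";
print Encode::is_utf8($text) ? "utf8\n" : "no utf8\n";
print length($raw), " octets, ", length($text), " characters\n";
```

If that holds for a real fetch too, using $resp->decoded_content in place of $resp->content in the script above might be enough to get a character string. The warning itself appears to come from the header parsing LWP does while reading the response, so calling $ua->parse_head(0) before the request may silence it, at the cost of no longer picking up headers from the HTML head section.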

-- Mike

Mike Schilli
[EMAIL PROTECTED]
