The parser has done what it's supposed to. I don't know whether you can
change the encoding in the parser itself; maybe you can, and that (encoding
or character set) is what you're looking for. I'd first try binmode with
UTF-8, but you'll probably just end up handling this with a regex.
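
Something along these lines is what I have in mind. It's only a sketch and
untested against the real page: it assumes the content arrives as UTF-8
bytes, and clean_text() is a hypothetical helper name, not part of
HTML::TokeParser. The sample string stands in for what get_text() would
hand back.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Encode characters as UTF-8 on output instead of writing raw Latin-1 bytes.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: tidy one chunk of text before printing it.
sub clean_text {
    my ($text) = @_;
    $text =~ s/\x{00A0}/ /g;    # non-breaking space (a decoded &nbsp;) -> plain space
    $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $text;
}

# Assumption: the page content is UTF-8 bytes. This literal is a stand-in
# for the fetched data, with an nbsp (C2 A0) and a degree sign (C2 B0).
my $bytes = "Dew\xC2\xA0point 2.3\xC2\xB0C";
my $chars = decode('UTF-8', $bytes);    # bytes -> Perl characters

print clean_text($chars), "\n";
```

In your script that would mean decoding $weather before handing it to the
parser (if it really is undecoded bytes) and printing clean_text($tag)
instead of $tag.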

"Lars Noodén" <lars.noo...@gmail.com> wrote:
>If there is a better list for discussing HTML::TokeParser, I can post
>there.  I have a code snippet which successfully extracts a piece of a
>web page.  However, something goes south with the conversion to text.
>What should come out as the following
>
>       Temperature 3.2°C
>       Humidity 94%
>       Dew point 2.3°C
>
>actually comes out as this
>
>       Temperature 3.2��C
>       Humidity 94%
>       Dew�point 2.3��C
>
>and it chokes conky, which is what is calling the script.  How do I get
>TokeParser to translate &nbsp; to space and use the correct degree
>symbol?
>
>The script is below and the target URL is
>
>       http://en.ilmatieteenlaitos.fi/weather/rovaniemi
>
>Regards,
>/Lars
>
>#!/usr/bin/perl
>
>use warnings;
>use strict;
>use HTML::TokeParser;
>use LWP::Simple;
>
>my $url = shift || '-';
>
>my $weather = get( $url );
>
>my $p = HTML::TokeParser->new(\$weather) or
>    die "Can't open: $!";
>
>$p->empty_element_tags(1);  # configure its behaviour
>
>
>while ( my $token = $p->get_tag('table') ) {
>    next unless $token->[1]{class} eq 'observation-text';
>    while ( my $token = $p->get_tag('td') ) {
>        my $tag = $p->get_text('/td');
>        print qq($tag\n);
>    }
>    last;
>}


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

