Hi All!

I'm using LWP to fetch pages and HTML::TreeBuilder to extract
the info I need.

Here is the basic scheme:

use LWP::UserAgent;
use Encode qw(decode);

$ua = LWP::UserAgent->new;
$r = $ua->get($url);
$html = decode('web_page encoding', $r->content);  # placeholder for the page's charset

At this point I have the decoded (UTF-8) content in $html.

$r = HTML::TreeBuilder->new_from_content($html);
$r->dump;

In the output I see lots of HTML-encoded entities:

&#x421;&#x441;&#x44B;&#x43B;&#x43A;&#x430; &#x43D;

All of them are valid hex codes of Unicode characters.

What do I need to do to get everything working as expected?
I mean that I want to see the characters as they are, not as
HTML-encoded entities.
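
For reference, a minimal self-contained sketch of the same scheme. It uses a
hypothetical inline sample string instead of a fetched page, so the sample
text and variable names are assumptions, not the real data. It also assumes
the entities appear at the serialization step: as_HTML encodes non-ASCII
characters by default, and passing '<>&' limits encoding to those three
characters. If the entities are already present in a string,
HTML::Entities::decode_entities can turn them back into characters. This may
not be the cause for the pages in question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(decode_entities);

# Encode characters as UTF-8 on output instead of emitting "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for the decoded page content (the Cyrillic word from the dump).
my $word = "\x{421}\x{441}\x{44B}\x{43B}\x{43A}\x{430}";
my $html = "<html><body><p>$word</p></body></html>";

my $tree = HTML::TreeBuilder->new_from_content($html);

# Text nodes come back as plain characters.
my $text = $tree->as_text;

# Default serialization encodes non-ASCII characters as numeric entities...
my $encoded_markup = $tree->as_HTML;

# ...while restricting the entity set to '<>&' leaves them as characters.
my $markup = $tree->as_HTML('<>&');

# Entities that are already in a string can be decoded explicitly.
my $decoded = decode_entities('&#x421;&#x441;&#x44B;&#x43B;&#x43A;&#x430;');

print "$text\n$markup\n$decoded\n";

# Free the tree explicitly, as the HTML::TreeBuilder documentation recommends.
$tree->delete;
```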

Thanks.

PS: The scheme described above works well for most web pages, but
with some of them I have this problem.

-- 
If you think of MS-DOS as mono, and Windows as stereo,
 then Linux is Dolby Digital and all the music is free...
