Hello -
How do I instruct HTML::TreeBuilder to ignore a particular named character 
reference?

I have some XHTML in UTF-8 that includes the named character reference &nbsp;
for non-breaking spaces.

I am processing this using HTML::TreeBuilder, but it is outputting these as
non-printable characters. I'd prefer to have the original &nbsp; references retained.

Here's a simplified version of my code:
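#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Three-argument open with a lexical handle; low-precedence "or" so the
# die actually runs if the open fails
open my $out, '>', 'output.htm' or die "Can't open output file: $!\n";

my $root = HTML::TreeBuilder->new;
$root->parse_file('input.htm') or die "Can't read input.htm: $!\n";

# as_HTML(entities, indent, optional end tags): '' encodes nothing as an
# entity, '' means no indentation, {} prints every end tag
print $root->as_HTML('', '', {});          # print to screen
print {$out} $root->as_HTML('', '', {});   # print to file

close $out or die "Can't close output file: $!\n";
$root->delete();    # free the tree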

The output replaces each &nbsp; with a non-printable character that various
user agents render as a box or as a question mark inside a diamond.
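Boxes and diamond question marks are what you would expect if a lone 0xA0 byte ends up in a file that is read as UTF-8. This small check uses only the core Encode module and assumes the tree holds no characters above U+00FF (if it did, Perl would instead warn about a "Wide character" and write UTF-8). It shows the single byte Perl writes for the decoded &nbsp; through a plain filehandle, next to the UTF-8 form of the same character:

```perl
use strict;
use warnings;
use Encode qw(encode);

# What &nbsp; becomes once it is decoded: one character, U+00A0.
my $nbsp = "\x{A0}";

# Through a filehandle with no encoding layer, Perl writes that character
# as the single byte 0xA0, which is not valid UTF-8 on its own.
printf "raw byte: %vX\n", $nbsp;   # prints "raw byte: A0"

# Properly UTF-8-encoded, the same character is the two bytes C2 A0.
my $utf8 = encode('UTF-8', $nbsp);
printf "as UTF-8: %vX\n", $utf8;   # prints "as UTF-8: C2.A0"
```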

I've tried adding an explicit decode/encode step and using HTML::Entities, but
I've had no luck. Basically, I just want TreeBuilder to ignore the &nbsp;
references and pass them through.

Any ideas?
Thanks
Webley
