In a CGI app I'm using HTML::Parser to rewrite web pages before further
processing. It appears that when a document contains an ASCII NUL byte (in
this case, as the last byte of the file), the NUL byte is gone from the
output after the document has passed through HTML::Parser. The offending
code is below, and the version of HTML::Parser is:
% perl -MHTML::Parser -e 'print $HTML::Parser::VERSION'
3.26
code:
sub supress_doctype {
    no strict 'vars';
    my $file = shift; # $file = \@file
    local $HTML = '';
    HTML::Parser->new(
        default_h     => [sub {$HTML .= shift}, 'text'],
        declaration_h => [sub {$HTML .= '<!-- ' . $_[0] . ' -->'}, 'text']
    )->parse(join "\n", @{$file});
    return [split /\n/, $HTML];
}
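
One way to narrow this down is a standalone check that feeds a short string ending in NUL through the parser and compares input with output. This is only a sketch: the roundtrip helper and the sample document are hypothetical, and unlike the code above it calls eof after parse, which the HTML::Parser documentation describes as flushing any remaining buffered text. Whether the NUL survives may depend on the installed version.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: pass a string through HTML::Parser unchanged,
# collecting the original source text of every event.
sub roundtrip {
    my ($input) = @_;
    my $out = '';
    HTML::Parser->new(
        api_version => 3,
        default_h   => [ sub { $out .= shift }, 'text' ],
    )->parse($input)->eof;    # eof flushes any buffered text
    return $out;
}

# Assumed sample input: a tiny document whose last byte is NUL.
my $input  = "<html><body>text</body></html>\0";
my $output = roundtrip($input);

printf "input:  %d bytes\n", length $input;
printf "output: %d bytes\n", length $output;
print $output =~ /\0\z/ ? "trailing NUL kept\n" : "trailing NUL lost\n";
```

If the NUL is kept here but lost in supress_doctype, the missing eof call would be the first thing to look at; if it is lost here too, the parser itself is the more likely culprit.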
--
I have lobbied for the update and improvement of SGML. I've done it for years.
I consider it the jewel for which XML is a setting. It does deserve a bit of
polishing now and then. -- Len Bullard