In a CGI app I'm using HTML::Parser to rewrite web pages before further
processing. It appears that when a document contains an ASCII NUL byte (in
this case, as the last byte of the file), the NUL byte is missing from the
output after processing with HTML::Parser. The offending code is below, and
the version of HTML::Parser is:

% perl -MHTML::Parser -e 'print $HTML::Parser::VERSION'
3.26


code:


sub supress_doctype {
  no strict 'vars';
  my $file = shift; # $file = \@file
  local $HTML = '';

  HTML::Parser->new(
    default_h => [sub {$HTML .= shift}, 'text'],
    declaration_h => [sub {$HTML .= '<!-- ' . $_[0] . ' -->'}, 'text']
  )->parse(join "\n", @{$file});

  return [split /\n/, $HTML];
}
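
For comparison, here is a minimal sketch of the same routine with an explicit
eof() call. The HTML::Parser documentation says parse() may hold back trailing
text and that eof() flushes whatever is still buffered. Whether that accounts
for the missing NUL has not been verified here. The sub name
suppress_doctype_with_eof and the lexical $html are made up for this sketch,
and the -1 limit on split is an extra assumption, added so that trailing empty
lines are not dropped.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical variant of the sub above, not the original code.
sub suppress_doctype_with_eof {
    my $file = shift;    # reference to an array of lines
    my $html = '';       # lexical accumulator instead of a package variable

    my $p = HTML::Parser->new(
        api_version   => 3,
        # Everything that is not a declaration is copied through unchanged.
        default_h     => [ sub { $html .= shift }, 'text' ],
        # Declarations (e.g. DOCTYPE) are wrapped in an HTML comment.
        declaration_h => [ sub { $html .= '<!-- ' . shift() . ' -->' }, 'text' ],
    );

    $p->parse( join "\n", @{$file} );
    $p->eof;             # flush any text the parser is still holding back

    # A limit of -1 keeps trailing empty fields that split would otherwise drop.
    return [ split /\n/, $html, -1 ];
}
```

Calling it the same way as the original, e.g.
suppress_doctype_with_eof(\@file), and comparing the last element of the
result with the output of the original sub should show whether the missing
eof() plays any part in the problem.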



-- 
I have lobbied for the update and improvement of SGML. I've done it for years.
I consider it the jewel for which XML is a setting.  It does deserve a bit of
polishing now and then.                                        -- Len Bullard
