Eric Gracyalny <[EMAIL PROTECTED]> writes:

> So, the problem:  I have a snippet of HTML in a variable and I call
> HTML::Parser like so:
> 
>       my $p = HTML::Parser->new(
>                                 api_version   => 3,
>                                 start_h       => [
>                                                   \&tag, 
>                                                   "self, tagname, attr, text",
>                                                   ],
>                                 );
>       $p->parse($variable_with_html);

You probably also want to call $p->eof to make sure everything in
$variable_with_html is parsed.

> How do I replace HTML tags within that variable ($variable_with_html)
> with some other bit of text?  Example:
> 
> Before parse variable_with_html contains '<SOMEHTMLTAG OPTION1=""
> OPTION2="">'
> After parse variable_with_html contains 'Something entirely different'
> 
> Now I can do everything I need to do, except alter $variable_with_html
> in the example above.  How do I do that?  Is it possible?

Sure.  I would normally recommend code like that found in 'hrefsub':
replace the print with accumulation into some new variable, and at
the end assign the result back to $variable_with_html.
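
A minimal sketch of that accumulation approach follows.  It is not
the hrefsub script itself; the names $result and start_tag are made up
for illustration, and the alt-text replacement is borrowed from the
example further down.  Everything that is not a start tag is copied
through unchanged by a default handler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3 ();

my $variable_with_html = <<'EOT';
<A HREF="http://www.perl.com">perl</A>
<IMG SRC="foo"> and <IMG SRC="bar" alt="bar">
EOT

my $result = "";   # hypothetical accumulator, used instead of print

my $p = HTML::Parser->new(
    api_version => 3,
    # start tags go through start_tag(), which may rewrite them
    start_h     => [ \&start_tag, "tagname, attr, text" ],
    # every other event: append the original source text untouched
    default_h   => [ sub { $result .= shift }, "text" ],
);

$p->parse($variable_with_html)->eof;
$variable_with_html = $result;   # assign the rewritten text back

sub start_tag
{
    my($tagname, $attr, $text) = @_;
    if ($tagname eq "img") {
        # replace the whole tag with its alt text or a placeholder
        $result .= $attr->{alt} || "[IMAGE REMOVED]";
    }
    else {
        $result .= $text;        # keep all other tags as they were
    }
}
```

Since the original string is never modified while it is being parsed,
no offset bookkeeping is needed with this approach.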

If you insist on inline editing, then something like this should work.
First we parse the string completely to figure out what we want to
fix.  Then we patch it from the end.  This is example code:

-----------------------------------------------------------------------
#!/usr/bin/perl

use strict;
use warnings;
use HTML::Parser 3 ();
use Data::Dump qw(dump);

my $variable_with_html = <<'EOT';
<A HREF="http://www.perl.com">perl</A>
<IMG SRC="foo"> and <IMG SRC="bar" alt="bar">
<IMG SRC="FOO" alt=>
EOT

my $p = HTML::Parser->new(api_version => 3,
                          start_h => [ \&kill_images,
                                       "tagname,attr,offset,length",
                                     ],
                         );

# File-scoped list of pending edits, filled in by kill_images() as
# flat (offset, length, replacement) triples.
my @patch;

$p->parse($variable_with_html)->eof;
dump(@patch);

patch_it($variable_with_html);
print $variable_with_html;

sub patch_it
{
    while (@patch) {
        my($offset, $len, $repl) = splice(@patch, -3, 3);
        substr($_[0], $offset, $len) = $repl;
    }
}

sub kill_images
{
    my $tag = shift;
    return unless $tag eq "img";
    my($attr, $offset, $textlen) = @_;
    # Use the alt text if there is any, otherwise a placeholder.
    my $replacement = $attr->{alt} || "[IMAGE REMOVED]";
    push(@patch, $offset, $textlen, $replacement);
}
-----------------------------------------------------------------------

If you want to be clever you might be tempted to edit the string
during parsing (directly in &kill_images), but then you need to
remember to compensate for offset skew: every replacement that changes
the length of the string shifts the offsets of everything after it.
You would also need to pass in the string as:

   $p->parse("$variable_with_html")->eof

to make sure the parser gets a copy to work on.  Modifying the parse()
argument before parse() returns is unsafe.
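
An untested-in-production sketch of that variant, for illustration
only: the counter $delta is a made-up name that tracks how much the
string has grown or shrunk so far, and it is added to each offset the
parser reports.  This relies on the handler being called in document
order and on the parser working on a copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3 ();

my $variable_with_html = qq{<IMG SRC="foo"> and <IMG SRC="bar" alt="bar">\n};
my $delta = 0;   # hypothetical: accumulated length change of the string

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub {
                         my($tag, $attr, $offset, $len) = @_;
                         return unless $tag eq "img";
                         my $repl = $attr->{alt} || "[IMAGE REMOVED]";
                         # offsets refer to the unmodified copy, so
                         # shift them by the edits made so far
                         substr($variable_with_html, $offset + $delta, $len) = $repl;
                         $delta += length($repl) - $len;
                     },
                     "tagname,attr,offset,length",
                   ],
);

# The quotes make the parser work on a copy of the string.
$p->parse("$variable_with_html")->eof;
```

Patching from the end, as in the example above, avoids this
bookkeeping entirely, which is why I would prefer it.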

> Every example I've seen (like the hrefsub script) prints the change out
> to standard output.  I need to keep this in a variable, whether the
> original or a copy, it doesn't matter, but I would like to avoid the use
> of a global variable.

We could have avoided the global @patch above by maintaining it as
$p->{patch}.
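
A rough sketch of what that could look like, assuming the parser
object is an ordinary blessed hash with room for your own key.  The
"self" argspec hands the parser object to the handler; storing each
edit as an [offset, length, replacement] triple is just one possible
layout, not the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3 ();

my $variable_with_html = <<'EOT';
<IMG SRC="foo"> and <IMG SRC="bar" alt="bar">
EOT

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&kill_images, "self,tagname,attr,offset,length" ],
);
$p->{patch} = [];   # pending edits live on the parser, not in a global

$p->parse($variable_with_html)->eof;
patch_it($variable_with_html, $p->{patch});

sub kill_images
{
    my($self, $tag, $attr, $offset, $len) = @_;
    return unless $tag eq "img";
    push(@{$self->{patch}},
         [$offset, $len, $attr->{alt} || "[IMAGE REMOVED]"]);
}

sub patch_it
{
    my $patches = $_[1];
    # Apply the edits back to front so earlier offsets stay valid.
    # $_[0] is an alias for the caller's variable, so it is edited
    # in place.
    for my $edit (reverse @$patches) {
        my($offset, $len, $repl) = @$edit;
        substr($_[0], $offset, $len) = $repl;
    }
}
```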

Hope this gave you some ideas to work on!

Regards,
Gisle
