Using the excellent example in an earlier post from David:
RE: Removing HTML Tags
I came up with this slightly modified version, based on that post and some CPAN
documentation, and it works.
It just brought up a few more questions.
Basically I'm just trying to grab the body contents without comments or script stuff.
So far this module is really cool and handy!!
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
my $text = <<HTML;
<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT>
i DON'T WANT THIS SCRIPT EITHER
</SCRIPT>
</BODY>
</HTMl>
HTML
my $html = HTML::Parser->new(
    api_version => 3,
    text_h  => [sub { print shift; }, 'dtext'],  # text with entities decoded
    start_h => [sub { print shift; }, 'text'],   # start tags as written
    end_h   => [sub { print shift; }, 'text'],   # end tags as written
);
# Q) Before I kill the head section and body tags below, how do I grab these parts?
#   1 - my $title = ????            i.e. the text between the title tags
#   2 - my $body_attributes = ????  i.e. the body tag attributes; in this example
#       it would be 'bodytag=attributes'
$html->ignore_elements(qw(head script));
$html->ignore_tags(qw(html body));
$html->parse($text);
$html->eof;
####
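On the two questions in the comments above, one possible approach is a separate pass with a second parser that ignores nothing, since ignore_elements(head) hides everything inside the head, title included. This is only a sketch under that assumption. The names $title, %body_attr, $in_title and $info are made up for illustration, and the sample document is shortened.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $text = <<'HTML';
<html><head>
<title> HI Title </title>
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
</BODY>
</HTMl>
HTML

my $title     = '';   # text found between <title> and </title>
my %body_attr = ();   # attributes of the <body> tag
my $in_title  = 0;    # true while the parser is inside <title>

my $info = HTML::Parser->new(
    api_version => 3,
    # 'tagname' arrives lowercased; 'attr' is a hash ref of the tag's attributes
    start_h => [sub {
        my ($tag, $attr) = @_;
        $in_title  = 1      if $tag eq 'title';
        %body_attr = %$attr if $tag eq 'body';
    }, 'tagname, attr'],
    end_h  => [sub { $in_title = 0 if shift eq 'title' }, 'tagname'],
    text_h => [sub { $title .= shift if $in_title }, 'dtext'],
);
$info->parse($text);
$info->eof;

$title =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
print "title: $title\n";
print "body attribute: $_=$body_attr{$_}\n" for sort keys %body_attr;
```

The attributes arrive as a hash rather than as the raw string 'bodytag=attributes', so the original text would have to be rebuilt from the keys and values if the exact string is needed.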
It automatically prints the modified version of $text without any print statement.
Q) Why is that?
Q) How can I save the new version of $text to a new variable instead of automatically
printing it to the screen?
( so I can remove empty lines and have my way with it )
Q) I wanted any comments removed too, but I didn't do anything special for that and
they are gone anyway. Are comments removed automatically, then?
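A possible way to look at the questions above: the output appears because each of the three handlers calls print on what the parser hands it, and the comments disappear because no comment handler (comment_h) or default handler (default_h) is registered, so the parser has nowhere to report them. The sketch below rests on that reading and appends to a variable instead of printing. The name $out is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $text = <<'HTML';
<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT>
i DON'T WANT THIS SCRIPT EITHER
</SCRIPT>
</BODY>
</HTMl>
HTML

my $out = '';   # collects what the handlers would otherwise print

my $html = HTML::Parser->new(
    api_version => 3,
    text_h  => [sub { $out .= shift }, 'dtext'],
    start_h => [sub { $out .= shift }, 'text'],
    end_h   => [sub { $out .= shift }, 'text'],
);
# No comment_h or default_h is set, so comments never reach $out.
$html->ignore_elements(qw(head script));
$html->ignore_tags(qw(html body));
$html->parse($text);
$html->eof;

$out =~ s/^\s*\n//mg;   # remove empty lines
print $out;
```

Once the parse has finished, $out is an ordinary string, so any further cleanup can be done with regular expressions before it is printed or stored.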
OUTPUT ::
(dmuey@q42(~):21)$ ./html.pl
hI HERE'S CONTENT i WANT
(dmuey@q42(~):22)$
Thanks
Dan