Almost perfect!!!!
Instead of having an array of tags used in the body. I'd like to keep the tags in the
body.
IE
print "title is: @title\n"; # perfect
print "body text: @body\n"; # this needs to keep the tags were they are**
print "body attr.:\n"; # perfect
while(my($k,$v) = each %body_attr){
print "$k=$v\n";
}
print "Other tag inside body: @tags\n"; # don't really need this
** if
my $text = <<HTML;
<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
<i> keep the I tag </i>
hI HERE'S CONTENT i WANT
<img src=""> IMaGE
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT>
i DON'T WANT THIS SCRIPT EITHER
</SCRIPT>
<font>Hello world</font>
</BODY>
</HTMl>
HTML
Then out put for body should be ::
<i> keep the I tag </i>
hI HERE'S CONTENT i WANT
<img src=""> IMaGE
But it is currently :
keep the I tag
hI HERE'S CONTENT i WANT
IMaGE
Other than that it is perfect! I really appreciate your help on this one.
Dan
I'll check back in tomorrow.
> Dan Muey wrote:
>
> >
> > Very nice, although I'd like to keep html tags that are between the
> > body tags as well except script & comment.
> >
> > Also @body contains the attributes of the body tag as well
> as all of
> > the text in the body :
> >
> > my $new_title = join '', @title;
> > my $new_body_atts = join(//,@body);
> >
> > print "TITLE -$new_title- \n BODY ATTRIBUTES -$new_body_atts- \n";
> >
> > Any ideas?
> >
>
> so you want to:
> 1. get title
> 2. get body but without comment and script
> 3. all other tags except comment and script should be
> included 4. attribute from body should not be part of body
>
> #!/usr/bin/perl -w
> use strict;
>
> use HTML::Parser;
>
> my $text = <<HTML;
> <html><head>
> <title> HI Title </title>
> heaD STUFF
> </head>
> <body bodytag=attributes>
> hI HERE'S CONTENT i WANT
> <!-- i WANT TO STRIP COMMENTS OUT -->
> <SCRIPT>
>
> i DON'T WANT THIS SCRIPT EITHER
> </SCRIPT>
> <font>Hello world</font>
>
> </BODY>
> </HTMl>
> HTML
>
> my $body = 0;
> my $title = 0;
> my @body;
> my @title;
> my @tags;
> my %body_attr;
>
> my $html = HTML::Parser->new(api_version => 3,
> text_h => [\&text,'dtext'],
> start_h =>
> [\&open_tag,'tagname,attr'],
> end_h => [\&close_tag,'tagname']);
>
> $html->ignore_elements(qw(script comment));
> $html->parse($text); $html->eof;
>
> print "title is: @title\n";
> print "body text: @body\n";
> print "body attr.:\n";
> while(my($k,$v) = each %body_attr){
> print "$k=$v\n";
> }
> print "Other tag inside body: @tags\n";
>
> #-- DONE --#
>
> sub text{
> my $text = shift;
>
> return unless($text =~ /\w/);
>
> if($title){
> push(@title,$text);
> }elsif($body){
> push(@body,$text);
> }
> }
>
> sub open_tag{
>
> my $tagname = shift;
> my $attr = shift;
>
> $title = 1 if($tagname eq 'title');
>
> if($tagname eq 'body'){
> $body = 1;
> while(my($key,$value) = each %{$attr}){
> $body_attr{$key} = "'$value'";
> }
> }elsif($body){
> push(@tags,"<$tagname>");
> }
> }
>
> sub close_tag{
>
> my $tagname = shift;
>
> $title = 0 if($tagname eq 'title');
> $body = 0 if($tagname eq 'body');
>
> push(@tags,"</$tagname>") if($body);
> }
>
> __END__
>
> prints:
>
> title is: HI Title
> body text:
> hI HERE'S CONTENT i WANT
> Hello world
> body attr.:
> bodytag='attributes'
> Other tag inside body: <font> </font>
>
> imagine you have to do the same in reg. expr.
>
> david
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]