Re: Return HTML (not text) between tags with HTML::Parser?

Daniel Leonard Wed, 23 Feb 2005 11:25:07 -0800

Hi.  I had to parse a rather large number of web pages
and I needed to do exactly the same thing.  What I did
was to set:


 $/ = "<tag>";  

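This sets the input record separator to whatever is
between the quotes, so each read returns everything up
to and including the next occurrence of that string
instead of a single line.  In your case that means you
move from one <A> tag to the next.  Note that $/ is
matched as a literal string, not as a pattern.  Just be
sure to set $/ back to "\n" when you finish parsing the
file.  It can play havoc with the rest of your program
if you forget.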
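
A minimal sketch of the idea, in case it helps.  The helper
name read_sections, the in-memory sample data and the
separator '<A ' are my own assumptions for illustration,
not anything from your file.  Using "local $/" inside a sub
restores the old value automatically, which avoids the
forgetting problem.  The match is case sensitive, so a file
that uses lowercase <a ...> tags would need '<a ' instead.

```perl
use strict;
use warnings;

# Hypothetical helper: split the contents of a filehandle into
# chunks, each beginning at an occurrence of $separator.
sub read_sections {
    my ($fh, $separator) = @_;
    local $/ = $separator;    # old value is restored when the sub returns
    my @sections;
    while (my $record = <$fh>) {
        chomp $record;                # chomp strips a trailing $/ string
        next unless $record =~ /\S/;  # skip the empty chunk before the first tag
        push @sections, $separator . $record;  # put the separator back in front
    }
    return @sections;
}

# Sample data standing in for the real file (an assumption).
my $html = "<A NAME='anchor1'></A><p>One</p><BR>\n"
         . "<A NAME='anchor2'></A><p>Two</p><BR>\n";

# Open the string as an in-memory file so the sketch is self-contained.
open my $fh, '<', \$html or die "Can't open in-memory file: $!";
my @sections = read_sections($fh, '<A ');
close $fh;

print "$_---\n" for @sections;
```

Each element of @sections keeps its markup, since nothing
is parsed or stripped; the text is only cut at the separator.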
-Daniel

--- Maqo <[EMAIL PROTECTED]> wrote:

> Is it possible to use HTML::TokeParser to return the
> raw HTML between
> two start tags (from <A> to <A>, not <A> to </A>),
> as opposed to just 
> the text?  My source file contains several blocks of
> code--containing 
> anchor links for each--that I'm trying to extract by
> section while 
> maintaining formatting.
> 
> Code:
> 
> my $p = HTML::TokeParser->new("file.txt")
>     || die "Can't open file.";
> while (my $t = $p->get_tag("a")) {
>     my $name = $t->[1]{name};
>     next unless $name && ($name eq "anchor");
>     print "$name : " . $p->get_text("a");
> }
> 
> Example HTML source:
> 
> <A NAME='anchor1'></A><p>Some text and HTML
> formatting</p><BR>
> <A NAME='anchor2'></A><p>Some text and HTML
> formatting</p><BR>
> ...
> <A NAME='anchor10'></A><p>Some text and HTML
> formatting</p><BR>
> 
> The above code returns the "Some text and HTML
> formatting" portions 
> nicely, albeit only as text.  Is there an easy way
> to do this using
> HTML::Parser to return the desired portion, with
> HTML markup included?
> 
>
