I finally got around to finishing HTML::TokeParser::Simple (was:
HTML::TokeParser::Easy) and released it to the CPAN:
http://theoryx5.uwinnipeg.ca/mod_perl/cpan-search?search=HTML%3A%3ATokeParser%3A%3ASimple
This is a subclass of HTML::TokeParser that blesses the returned tokens so
you can call methods on them. The original tokens are unchanged, so you
should be able to use this as a drop-in replacement. Basically, you get
convenient methods instead of having to memorize array indices. You can do this:
    $token->is_start_tag( 'form' )
instead of:
    $token->[0] eq 'S' and $token->[1] eq 'form'
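To make the "blessed but unchanged" idea concrete, here is a minimal sketch of the technique in plain Perl. It is not the module's actual source: the package name My::Token and the hand-built token are made up for illustration, and only the array layout is taken from HTML::TokeParser's documented start-tag token.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not the module's real internals.
package My::Token;

# True for a start tag, optionally restricted to one tag name.
sub is_start_tag {
    my ( $self, $name ) = @_;
    return 0 unless $self->[0] eq 'S';
    return 1 unless defined $name;
    return $self->[1] eq lc $name ? 1 : 0;
}

package main;

# A start-tag token shaped the way HTML::TokeParser returns it:
# [ 'S', $tag, $attr, $attrseq, $text ]
my $token = [
    'S', 'form',
    { action => '/search' },
    ['action'],
    '<form action="/search">',
];

# Blessing attaches a class to the array reference without altering its contents.
bless $token, 'My::Token';

# Both styles work on the same token.
print "method call: form start tag\n"  if $token->is_start_tag('form');
print "array access: form start tag\n" if $token->[0] eq 'S' and $token->[1] eq 'form';
```

Because bless only tags the reference with a package name, code that still indexes into the array keeps working, which is what makes a drop-in replacement plausible.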
A pathetic, but easy to read, HTML to Text converter:
    while ( my $token = $parser->get_token ) {
        next if ! $token->is_text;
        print $token->return_text;
    }
Printing all comments:
    while ( my $token = $parser->get_token ) {
        next if ! $token->is_comment;
        # PHB is assumed to be a filehandle you have already opened
        print PHB $token->return_text, "\n";
    }
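The loops above assume a parser object already exists. A minimal end-to-end sketch, assuming the module is installed and that the constructor accepts the same arguments as HTML::TokeParser (a filename, or a reference to a string of HTML), could look like the following. The sample HTML is invented, and the method names are the ones used in this post, so check the POD of your installed release in case they have changed.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Sample document, invented for this example.
my $html = '<html><!-- first --><body><p>Hi</p><!-- second --></body></html>';

# A reference to a string is parsed as literal HTML rather than as a filename.
my $parser = HTML::TokeParser::Simple->new( \$html );

# Collect the text of every comment token.
my @comments;
while ( my $token = $parser->get_token ) {
    next if ! $token->is_comment;
    push @comments, $token->return_text;
}

print "$_\n" for @comments;
```

Passing a filename instead of \$html should parse a file on disk in the same way.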
You get the idea. There are a couple of goofs in the POD (white noise,
basically; no errors that I am aware of), but the tests are fairly solid.
You can use either get_token or get_tag; just be sure to read the POD for a
couple of caveats.
--
Cheers,
Curtis Poe
Senior Programmer
ONSITE! Technology, Inc.
www.onsitetech.com
503-233-1418
Taking e-Business and Internet Technology To The Extreme!