----- Original Message -----
From: "Gisle Aas" <[EMAIL PROTECTED]>
> > Want to print all comments in an HTML doc?
> >
> > my $p = HTML::TokeParser::Easy->new( $doc );
> > while ( my $token = $p->get_token )
> > {
> > next if ! $p->is_comment( $token );
> > print PHB $p->return_text( $token ), "\n";
> > }
>
> How is actually 'return_text' here different from the old 'get_text'
> that was already provided?
In HTML::TokeParser, you have the following attributes for token types:
["S", $tag, $attr, $attrseq, $text]
["E", $tag, $text]
["T", $text, $is_data]
["C", $text]
["D", $text]
["PI", $token0, $text]
The third in the list above is the "text" returned by "get_text". In
other words, IIRC, this is text that is visible on the Web page. However,
every token type has a "$text" element, which is the exact source text of
the returned token. In Easy.pm, what I did was take the above information,
stuff it into a hash with a bit of identifying information and add an
AUTOLOAD sub that generates the appropriate methods on the fly. Thus, $text
is what "return_text()" returns. I would have preferred names like
get_attr() and get_text(), but that would have overridden the original
get_text() method :(
To keep things clear, I used the exact names from the above list. For
example, here's one key in the hash:
S => {
_name => 'START_TAG',
tag => 1,
attr => 2,
attrseq => 3,
text => 4
}
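
To make the idea concrete, here is a rough sketch of how a table like that
could drive an AUTOLOAD sub. This is not the code from Easy.pm: the package
name My::TokenSketch is made up, every entry other than S (including the
_name values) is my guess based on the token layouts listed earlier, and the
tokens are built by hand so it runs without HTML::TokeParser installed.

```perl
use strict;
use warnings;

package My::TokenSketch;    # hypothetical name, not the real Easy.pm

our $AUTOLOAD;

# Index of each named field within a token array, keyed by token type.
# Only the S entry comes from the message; the others are assumptions.
my %token = (
    S  => { _name => 'START_TAG', tag => 1, attr => 2, attrseq => 3, text => 4 },
    E  => { _name => 'END_TAG',     tag    => 1, text    => 2 },
    T  => { _name => 'TEXT',        text   => 1, is_data => 2 },
    C  => { _name => 'COMMENT',     text   => 1 },
    D  => { _name => 'DECLARATION', text   => 1 },
    PI => { _name => 'PROCESS_INSTRUCTION', token0 => 1, text => 2 },
);

sub new { return bless {}, shift }

sub AUTOLOAD {
    my ( $self, $tok ) = @_;
    ( my $method = $AUTOLOAD ) =~ s/.*:://;    # strip the package name
    return if $method eq 'DESTROY';

    # return_<field>: look up the field's index for this token type
    if ( $method =~ /^return_(\w+)$/ ) {
        my $field = $1;
        my $index = $token{ $tok->[0] }{$field};
        die "No '$field' field in a $tok->[0] token\n" unless defined $index;
        return $tok->[$index];
    }
    die "Unknown method '$method'\n";
}

package main;

my $p       = My::TokenSketch->new;
my $comment = [ 'C', '<!-- hello -->' ];
my $start   = [ 'S', 'a', { href => '/' }, ['href'], '<a href="/">' ];

print $p->return_text($comment), "\n";    # text lives at index 1 for C
print $p->return_tag($start),    "\n";    # tag lives at index 1 for S
```

The point of the table is that the caller never needs to know that "text" is
element 4 of a start tag but element 1 of a comment.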
> > next if ! $p->is_comment( $token );
> > print PHB $p->return_text( $token );
Part of the reason why I like this interface is that without it, the
above two lines were originally:
next if $token->[ 0 ] ne 'C';
print PHB $token->[ 1 ];
Since I am a huge fan of trying to make "intuitive" interfaces, I just
didn't care to try and remember what all of the array elements were.
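
For comparison, here is the raw-index style run over a few tokens. The
tokens are made up by hand for illustration, following the layouts listed
earlier; they are not the output of a real parse.

```perl
use strict;
use warnings;

# Hand-built tokens in the HTML::TokeParser array layouts.
my @tokens = (
    [ 'S', 'p', {}, [], '<p>' ],
    [ 'T', 'Hello', '' ],
    [ 'C', '<!-- first -->' ],
    [ 'E', 'p', '</p>' ],
    [ 'C', '<!-- second -->' ],
);

my @comments;
for my $token (@tokens) {
    next unless $token->[0] eq 'C';    # element 0 is the token type
    push @comments, $token->[1];       # element 1 is the text of a comment
}

print "$_\n" for @comments;
```

Nothing in the loop says what elements 0 and 1 mean unless you add the
comments yourself.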
> I think blessing of the tokens might have merit. I also think that
> HTML::TokeParser (and HTML::PullParser) should have some kind of
> support for this.
[snip]
Well, rather than traipsing too far down this road, perhaps just offering up
the module for inspection is better. The distribution is at
http://www.easystreet.com/~ovid/cgi_course/downloads/HTML-TokeParser-Easy-1.0.tar.gz
I haven't added any tests as I wasn't really sure whether I would be wasting
my time. However, there is complete POD, so understanding what I did and
why should be fairly clear. It also has some sample programs in the POD. I
don't think that Easy.pm is appropriate for all TokeParser programs, but it
really makes things clearer for those in which it is a good fit -- ugh, was
that an awkward sentence, or what? :)
--
Curtis "Ovid" Poe, Senior Programmer, ONSITE! Technology
Someone asked me how to count to 10 in Perl:
push @A, $_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//;
shift @a;shift @a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A