"Curtis Poe" <[EMAIL PROTECTED]> writes:
> I use HTML::TokeParser quite extensively, but I've never been wild about the
> interface. As a result, I created a subclass that is more
> "self-documenting". I call it HTML::TokeParser::Easy. It implements
> is_foo() and return_foo() methods for all tokens.
>
> For instance, if you want to know if something is a start tag:
>
> $parser->is_start_tag( $token );
>
> Is it a comment?
>
> $parser->is_comment( $token );
>
> Want to print all comments in an HTML doc?
>
> my $p = HTML::TokeParser::Easy->new( $doc );
> while ( my $token = $p->get_token )
> {
>     next if ! $p->is_comment( $token );
>     print PHB $p->return_text( $token ), "\n";
> }

How does 'return_text' here actually differ from the old 'get_text'
method that was already provided?

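To make the question concrete, here is a minimal sketch of what parser-level accessors such as is_comment( $token ) and return_text( $token ) could look like. It is not the actual HTML::TokeParser::Easy code: the class name TokenHelpers::Sketch and the %TEXT_INDEX table are hypothetical, and the sketch assumes the token layout documented for HTML::TokeParser. Note that get_text, as documented for HTML::TokeParser, reads forward through the parser's token stream, whereas a helper like this only inspects a token the caller already holds.

```perl
use strict;
use warnings;

# Hypothetical parser-level helpers in the spirit of HTML::TokeParser::Easy
# (not its actual code). They assume the HTML::TokeParser token layout:
#   ["S",  $tag, $attr, $attrseq, $text]   start tag
#   ["E",  $tag, $text]                    end tag
#   ["T",  $text, $is_data]                text
#   ["C",  $text]                          comment
#   ["D",  $text]                          declaration
#   ["PI", $token0, $text]                 processing instruction
package TokenHelpers::Sketch;

# Index of the original markup text for each token type.
my %TEXT_INDEX = ( S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2 );

sub new { my ($class) = @_; return bless {}, $class }

sub is_start_tag { my ( $self, $token ) = @_; return $token->[0] eq 'S' }
sub is_comment   { my ( $self, $token ) = @_; return $token->[0] eq 'C' }

# Look the text up by type so callers never need to remember indexes.
sub return_text {
    my ( $self, $token ) = @_;
    return $token->[ $TEXT_INDEX{ $token->[0] } ];
}

package main;

my $p      = TokenHelpers::Sketch->new;
my @tokens = (                            # what get_token might return
    [ 'S', 'p', {}, [], '<p>' ],
    [ 'C', '<!-- note -->' ],
    [ 'E', 'p', '</p>' ],
);
for my $token (@tokens) {
    next if !$p->is_comment($token);
    print $p->return_text($token), "\n";    # prints "<!-- note -->"
}
```

With a real parser, the hand-built @tokens list would be replaced by a while ( my $token = $parser->get_token ) loop.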
> I also considered blessing the tokens directly:
>
> my $p = HTML::TokeParser::Easy->new( $doc );
> while ( my $token = $p->get_token )
> {
>     next if ! $token->is_comment;
>     print PHB $token->return_text, "\n";
> }
>
> That seems cleaner to read, and since a token is an array ref, blessing it
> would be transparent, but I decided against it because I dislike the idea
> of getting into the habit of routinely accessing object internals.
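A minimal sketch of the blessed-token alternative, assuming the same HTML::TokeParser token layout; the class name MyToken::Sketch is hypothetical. Keeping every index lookup inside the class's own methods is one way to get the cleaner call syntax without callers routinely reaching into object internals.

```perl
use strict;
use warnings;

# Hypothetical token class: HTML::TokeParser-style array refs are blessed
# into it, so the methods are called on the token itself.
package MyToken::Sketch;

# Index of the original markup text for each token type.
my %TEXT_INDEX = ( S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2 );

# Only these methods need to know the array layout.
sub type         { return $_[0][0] }
sub is_start_tag { return $_[0]->type eq 'S' }
sub is_comment   { return $_[0]->type eq 'C' }
sub return_text  { my ($self) = @_; return $self->[ $TEXT_INDEX{ $self->type } ] }

package main;

# Blessing does not copy or wrap the array, so old-style index access
# still works on the same token.
my $token = bless [ 'C', '<!-- note -->' ], 'MyToken::Sketch';
print $token->return_text, "\n" if $token->is_comment;    # prints "<!-- note -->"
print $token->[1], "\n";                                   # prints the same text
```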

I think blessing the tokens might have merit. I also think that
HTML::TokeParser (and HTML::PullParser) should provide some kind of
support for this.

This could, for instance, be provided by an interface like:

  my $p = HTML::PullParser->new(file        => "index.html",
                                start       => 'event, tagname, @attr',
                                end         => 'event, tagname',
                                token_class => "MyToken",
                               );

where the array returned by $p->get_token would be blessed into the
named class whenever the 'token_class' attribute is set.
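The token_class option above is only a proposal. Until something like it exists, a similar effect can be approximated from the outside with a small wrapper that blesses whatever get_token returns. BlessingParser::Sketch, MyToken and FakeParser are all hypothetical names; FakeParser only exists so the sketch runs without HTML::Parser installed.

```perl
use strict;
use warnings;

# Hypothetical wrapper approximating the proposed 'token_class' option.
# It delegates to any object with a get_token method (such as an
# HTML::PullParser or HTML::TokeParser instance) and blesses each array
# ref that comes back into the requested class.
package BlessingParser::Sketch;

sub new {
    my ( $class, $parser, $token_class ) = @_;
    return bless { parser => $parser, token_class => $token_class }, $class;
}

sub get_token {
    my ($self) = @_;
    my $token = $self->{parser}->get_token;
    return unless defined $token;    # end of input: pass "no token" through
    return bless $token, $self->{token_class};
}

# Stand-in parser so the sketch runs without HTML::Parser installed;
# it hands out a fixed list of PullParser-style array refs.
package FakeParser;

sub new       { my ( $class, @tokens ) = @_; return bless [@tokens], $class }
sub get_token { my ($self) = @_; return shift @$self }

# Hypothetical token class with a single accessor.
package MyToken;

sub event { return $_[0][0] }

package main;

my $p = BlessingParser::Sketch->new(
    FakeParser->new( [ 'start', 'p' ], [ 'end', 'p' ] ),
    'MyToken',
);
while ( my $token = $p->get_token ) {
    print ref($token), ' ', $token->event, "\n";    # "MyToken start", then "MyToken end"
}
```

With HTML::Parser installed, the stand-in would be replaced by a real parser, e.g. BlessingParser::Sketch->new( HTML::PullParser->new( file => "index.html", start => 'event, tagname, @attr', end => 'event, tagname' ), 'MyToken' ). The wrapper costs an extra method call per token, which is part of why doing the blessing inside the parser itself looks attractive.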

But then you might want tokens from different kinds of handlers to end
up in different classes, and to make this efficient it might be a good
idea to push this all the way up to HTML::Parser, where we could use
the 'argspecs' to specify class names. If we do it like this, I think
it would be cheap enough for HTML::TokeParser to provide these methods.
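The per-handler idea can be illustrated in plain Perl by choosing the class from the token's type. This is only an emulation: the proposal above would have HTML::Parser do the blessing from the argspecs, which this code does not attempt. The MyToken class hierarchy, the %CLASS_FOR table and bless_token are hypothetical, and the token layout is the one documented for HTML::TokeParser.

```perl
use strict;
use warnings;

# Hypothetical base class; the defaults answer "no" for every type test.
package MyToken;
sub is_start_tag { return 0 }
sub is_comment   { return 0 }

# In S, E, C, D and PI tokens the original text is the last element.
sub return_text { return $_[0][-1] }

# One subclass per token type overrides only what differs.
package MyToken::Start;
use parent -norequire, 'MyToken';
sub is_start_tag { return 1 }

package MyToken::End;
use parent -norequire, 'MyToken';

package MyToken::Text;
use parent -norequire, 'MyToken';
sub return_text { return $_[0][1] }    # T tokens are ["T", $text, $is_data]

package MyToken::Comment;
use parent -norequire, 'MyToken';
sub is_comment { return 1 }

package main;

# Token type => class; this plays the role the proposal gives to argspecs.
my %CLASS_FOR = (
    S  => 'MyToken::Start',
    E  => 'MyToken::End',
    T  => 'MyToken::Text',
    C  => 'MyToken::Comment',
    D  => 'MyToken',
    PI => 'MyToken',
);

sub bless_token {
    my ($token) = @_;
    return bless $token, $CLASS_FOR{ $token->[0] };
}

my @tokens = (
    [ 'S', 'p', {}, [], '<p>' ],
    [ 'T', 'hello', 0 ],
    [ 'C', '<!-- note -->' ],
);
bless_token($_) for @tokens;    # bless acts on the referent, so @tokens is updated

for my $token (@tokens) {
    next if !$token->is_comment;
    print $token->return_text, "\n";
}
```

Putting is_comment and friends in a base class with per-type overrides means each test is a plain inherited method rather than a comparison on the type field at every call.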
Regards,
Gisle