Re: New HTML::TokeParser Interface

Gisle Aas Fri, 01 Feb 2002 23:12:46 -0800

"Curtis Poe" <[EMAIL PROTECTED]> writes:

> I use HTML::TokeParser quite extensively, but I've never been wild about the
> interface.  As a result, I created a subclass that is more
> "self-documenting".  I call it HTML::TokeParser::Easy.  It implements
> get_foo() and return_foo() methods for all tokens.
> 
> For instance, if you want to know if something is a start tag:
> 
>     $parser->is_start_tag( $token );
> 
> Is it a comment?
> 
>     $parser->is_comment( $token );
> 
> Want to print all comments in an HTML doc?
> 
>      my $p = HTML::TokeParser::Easy->new( $doc );
>      while ( my $token = $p->get_token )
>      {
>          next if ! $p->is_comment( $token );
>          print PHB $p->return_text( $token ), "\n";
>      }


How is 'return_text' here actually different from the old 'get_text'
that was already provided?

> I also considered blessing the tokens directly:
> 
>      my $p = HTML::TokeParser::Easy->new( $doc );
>      while ( my $token = $p->get_token )
>      {
>          next if ! $token->is_comment;
>          print PHB $token->return_text, "\n";
>      }
> 
> That seems cleaner to read, and since the token is an array ref, it would be
> transparent, but I decided against it because I dislike the idea of getting
> in the habit of routinely accessing object internals.

I think blessing the tokens might have merit.  I also think that
HTML::TokeParser (and HTML::PullParser) should have some kind of
support for this.
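To make the idea concrete, here is a minimal sketch of a blessed token class, assuming the array layouts that HTML::TokeParser documents for get_token (["S", $tag, $attr, $attrseq, $text], ["E", $tag, $text], ["T", $text, $is_data], ["C", $text], ["D", $text], ["PI", $token0, $text]).  The class name MyToken and its method names are hypothetical.  The tokens are built by hand so the sketch runs without a parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical token class: blessing the array refs that HTML::TokeParser
# returns lets callers ask the token itself what it is.
package MyToken;

# Assumed layouts (as documented for HTML::TokeParser::get_token):
#   ["S",  $tag, $attr, $attrseq, $text]
#   ["E",  $tag, $text]
#   ["T",  $text, $is_data]
#   ["C",  $text]
#   ["D",  $text]
#   ["PI", $token0, $text]
sub type         { $_[0][0] }
sub is_start_tag { $_[0]->type eq 'S' }
sub is_end_tag   { $_[0]->type eq 'E' }
sub is_text      { $_[0]->type eq 'T' }
sub is_comment   { $_[0]->type eq 'C' }

# Tag name for start/end tags, undef for everything else.
sub tag {
    my $self = shift;
    return ($self->is_start_tag || $self->is_end_tag) ? $self->[1] : undef;
}

# Source text of the token; which slot holds it depends on the type.
sub text {
    my $self  = shift;
    my %index = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);
    return $self->[ $index{ $self->type } ];
}

package main;

# Hand-built tokens stand in for $p->get_token. With a real parser:
#   while (my $t = $p->get_token) { bless $t, 'MyToken'; ... }
my @tokens = map { bless $_, 'MyToken' } (
    ['S', 'p', {}, [], '<p>'],
    ['T', 'Hello', 0],
    ['C', '<!-- a comment -->'],
    ['E', 'p', '</p>'],
);

for my $token (@tokens) {
    next unless $token->is_comment;
    print $token->text, "\n";    # prints "<!-- a comment -->"
}
```

Only the class knows which index holds what, so callers never reach into the array themselves, which speaks to the worry about routinely touching object internals.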

This could, for instance, be provided by an interface like:

    my $p = HTML::PullParser->new(file => "index.html",
                                  start => "event, tagname, @attr",
                                  end   => "event, tagname",
                                  token_class => "MyToken",
                                 );

where the array returned by $p->get_token would be blessed into that
class if the 'token_class' attribute was set.
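Until something like 'token_class' exists, the behaviour can be approximated from the outside.  This is a sketch under stated assumptions: BlessingParser, MyToken and ListSource are hypothetical names, 'token_class' here is only the option proposed above (not an existing HTML::PullParser argument), and ListSource stands in for a real HTML::PullParser object so the code runs without HTML::Parser installed.  The tokens mimic what the "event, tagname" argspec would produce.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper approximating the proposed 'token_class' option:
# it forwards get_token to a parser object and blesses each token
# (an array ref) into the configured class before returning it.
package BlessingParser;

sub new {
    my ($class, %args) = @_;
    return bless { parser      => $args{parser},
                   token_class => $args{token_class} }, $class;
}

sub get_token {
    my $self  = shift;
    my $token = $self->{parser}->get_token;
    # Bless only when a class is configured and a token came back, so the
    # undef returned at end of document still ends the caller's loop.
    bless $token, $self->{token_class} if $token && $self->{token_class};
    return $token;
}

# Minimal token class for the demo.
package MyToken;
sub event { $_[0][0] }

# Stand-in for an HTML::PullParser object: hands out pre-built tokens.
package ListSource;
sub new       { my ($class, @tokens) = @_; return bless [@tokens], $class }
sub get_token { shift @{ $_[0] } }

package main;

my $p = BlessingParser->new(
    parser      => ListSource->new(['start', 'p'], ['end', 'p']),
    token_class => 'MyToken',
);

my @events;
while (my $token = $p->get_token) {
    push @events, $token->event;
}
print "@events\n";    # prints "start end"
```

Doing this inside HTML::PullParser itself would avoid the extra method call per token, which is presumably part of why the option is proposed there rather than as a wrapper.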

But then you might want tokens from different kinds of handlers to end
up in different classes, and to make this efficient it might be a
good idea to push this all the way up to HTML::Parser, where we could
use the 'argspecs' to specify class names.  If we do it like this, I
think it would be cheap enough for HTML::TokeParser to provide
these methods.
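What "different classes for different handlers" buys can be shown with per-type classes.  The sketch below is hypothetical throughout (the class names, the %class_for table and bless_token are illustrative), and it blesses tokens after get_token returns them; it does not model the proposed argspec support inside HTML::Parser.  It assumes HTML::TokeParser's type codes S, E, T and C.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-type token classes: behaviour is picked by method
# dispatch instead of comparing $token->[0] against "S", "E", ...
package MyToken;
sub is_start_tag { 0 }
sub is_end_tag   { 0 }
sub is_text      { 0 }
sub is_comment   { 0 }

package MyToken::Start;
use parent -norequire, 'MyToken';
sub is_start_tag { 1 }
sub tag  { $_[0][1] }
sub text { $_[0][4] }

package MyToken::End;
use parent -norequire, 'MyToken';
sub is_end_tag { 1 }
sub tag  { $_[0][1] }
sub text { $_[0][2] }

package MyToken::Text;
use parent -norequire, 'MyToken';
sub is_text { 1 }
sub text    { $_[0][1] }

package MyToken::Comment;
use parent -norequire, 'MyToken';
sub is_comment { 1 }
sub text       { $_[0][1] }

package main;

# Type code -> class. Per-handler class names in the argspecs would let
# the parser pick the class up front instead of using a table like this.
my %class_for = (
    S => 'MyToken::Start',
    E => 'MyToken::End',
    T => 'MyToken::Text',
    C => 'MyToken::Comment',
);

# Bless a raw token into the class for its type (base class if unknown).
sub bless_token {
    my $token = shift;
    return bless $token, ($class_for{ $token->[0] } || 'MyToken');
}

my @tokens = map { bless_token($_) } (
    ['S', 'a', { href => 'x.html' }, ['href'], '<a href="x.html">'],
    ['T', 'link', 0],
    ['C', '<!-- note -->'],
    ['E', 'a', '</a>'],
);

print $_->text, "\n" for grep { $_->is_comment } @tokens;    # prints "<!-- note -->"
```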

Regards,
Gisle
