martin langhoff <[EMAIL PROTECTED]> writes:
> I've found and read the list archive, and Tim's post. The man pages for
> get_tag read:
>
> $p->get_tag( [$tag] )
> This method returns the next start or end tag
> (skipping any other tokens), or undef if there are no
> more tags in the document. If an argument is given,
> then we skip tokens until the specified tag type is
> found. The tag is returned as an array reference in
> the same form as for $p->get_token above, but the type
> code (first element) is missing and the name of end
> tags are prefixed with "/". This means that the tags
> returned look like this:
>
> [$tag, %$attr, @$attrseq, $text]
> ["/$tag", $text]
>
> I read the very first sentence, and the second one, and they mean to me
> that if I call $p->get_tag() it'll get the next tag, opening or closing,
> and will tell me what it is. If I call $p->get_tag('p') it'll find the
> next 'p' tag and tell me whether it's an opening or a closing tag.
I agree that this is confusing. This is the patch I now propose. It
allows you to specify multiple tags to match in the $p->get_tag() call.
I hope the documentation is clearer too.
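
As a rough illustration of the matching behaviour the patch introduces, here
is a minimal sketch. It is an assumption-laden stand-in, not the module
itself: the @tokens queue and the get_tag_sketch function are made up for
this example so that it runs without HTML::Parser installed. Only the loop
body mirrors the patched get_tag.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser: a queue of tokens in the
# $p->get_token format (type code first, then the fields for that type).
my @tokens = (
    ["T", "Hello ", 0],
    ["S", "p", {}, [], "<p>"],
    ["S", "font", { size => "2" }, ["size"], '<font size="2">'],
    ["T", "world", 0],
    ["E", "font", "</font>"],
    ["E", "p", "</p>"],
);

# Same logic as the patched get_tag, but reading from @tokens instead
# of calling $self->get_token. The name is illustrative only.
sub get_tag_sketch {
    my @wanted = @_;
    while (my $token = shift @tokens) {
        my $type = shift @$token;                   # drop the type code
        next unless $type eq "S" || $type eq "E";   # skip non-tag tokens
        $token->[0] = "/$token->[0]" if $type eq "E";
        return $token unless @wanted;               # no filter: any tag matches
        for my $name (@wanted) {
            return $token if $token->[0] eq $name;
        }
    }
    return undef;                                   # no more tags
}

# Collect every start or end tag of a font element.
my @found;
while (my $tag = get_tag_sketch("font", "/font")) {
    push @found, $tag->[0];
}
print join(" ", @found), "\n";
```

With the real module the corresponding call would be
$p->get_tag("font", "/font"), and the leading "/" on the returned name is
how you tell an end tag from a start tag.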
Regards,
Gisle
Index: lib/HTML/TokeParser.pm
===================================================================
RCS file: /cvsroot/libwww-perl/html-parser/lib/HTML/TokeParser.pm,v
retrieving revision 2.19
retrieving revision 2.20
diff -u -p -u -r2.19 -r2.20
--- lib/HTML/TokeParser.pm 2000/06/09 06:44:41 2.19
+++ lib/HTML/TokeParser.pm 2000/12/04 18:12:49 2.20
@@ -1,10 +1,10 @@
package HTML::TokeParser;
-# $Id: TokeParser.pm,v 2.19 2000/06/09 06:44:41 gisle Exp $
+# $Id: TokeParser.pm,v 2.20 2000/12/04 18:12:49 gisle Exp $
require HTML::Parser;
@ISA=qw(HTML::Parser);
-$VERSION = sprintf("%d.%02d", q$Revision: 2.19 $ =~ /(\d+)\.(\d+)/);
+$VERSION = sprintf("%d.%02d", q$Revision: 2.20 $ =~ /(\d+)\.(\d+)/);
use strict;
use Carp ();
@@ -104,19 +104,17 @@ sub unget_token
sub get_tag
{
my $self = shift;
- my $wanted = shift;
my $token;
- GET_TOKEN:
- {
- $token = $self->get_token;
- if ($token) {
- my $type = shift @$token;
- redo GET_TOKEN if $type !~ /^[SE]$/;
- substr($token->[0], 0, 0) = "/" if $type eq "E";
- redo GET_TOKEN if defined($wanted) && $token->[0] ne $wanted;
+ while (1) {
+ $token = $self->get_token || return undef;
+ my $type = shift @$token;
+ next unless $type eq "S" || $type eq "E";
+ substr($token->[0], 0, 0) = "/" if $type eq "E";
+ return $token unless @_;
+ for (@_) {
+ return $token if $token->[0] eq $_;
}
}
- $token;
}
@@ -226,29 +224,41 @@ is the same as the arguments passed to t
v2 compatible callbacks (see L<HTML::Parser>). In summary, returned
tokens look like this:
- ["S", $tag, %$attr, @$attrseq, $text]
+ ["S", $tag, $attr, $attrseq, $text]
["E", $tag, $text]
["T", $text, $is_data]
["C", $text]
["D", $text]
["PI", $token0, $text]
+where $attr is a hash reference, $attrseq is an array reference and
+the rest is plain scalars.
+
=item $p->unget_token($token,...)
If you find out you have read too many tokens you can push them back,
so that they are returned the next time $p->get_token is called.
-=item $p->get_tag( [$tag] )
+=item $p->get_tag( [$tag, ...] )
This method returns the next start or end tag (skipping any other
-tokens), or C<undef> if there are no more tags in the document. If an
-argument is given, then we skip tokens until the specified tag type is
-found. The tag is returned as an array reference in the same form as
-for $p->get_token above, but the type code (first element) is missing
-and the name of end tags are prefixed with "/". This means that the
-tags returned look like this:
+tokens), or C<undef> if there are no more tags in the document. If
+one or more arguments are given, then we skip tokens until one of the
+specified tag types is found. For example:
+
+ $p->get_tag("font", "/font");
+
+will find the next start or end tag for a font-element.
+
+The tag information is returned as an array reference in the same form
+as for $p->get_token above, but the type code (first element) is
+missing. A start tag will be returned like this:
+
+ [$tag, $attr, $attrseq, $text]
+
+The tagname of end tags are prefixed with "/", i.e. end tag is
+returned like this:
- [$tag, %$attr, @$attrseq, $text]
["/$tag", $text]
=item $p->get_text( [$endtag] )