From the "Changes" entry:
2000-06-12 Sean M. Burke <[EMAIL PROTECTED]>
Release 0.67. Just changes to HTML::Element...
Introduced look_up and look_down. Thanks to the folks on the
libwww list for helping me find the right form for that idea.
Deprecated find_by_attribute
Doc typo fixed: at one point in the discussion of "consolidating
text", I said push_content('Skronk') when I meant
unshift_content('Skronk'). Thanks to Richard Y. Kim ([EMAIL PROTECTED])
for pointing this out.
Added left() and right() methods.
Made address([address]) accept relative addresses (".3.0.1")
Added content_array_ref and content_refs_list.
Added a bit more clarification to bits of the Element docs here and
there.
Made find_by_tag_name work iteratively now, for speed.
-*-
Anyone taking a historical interest in the direction of development of
HTML::Element's interface might be interested in my rationale for
adding look_down: I think that the traverse() method makes hard things
possible, but not easy things easy. Notably, scanning a tree for
nodes matching some criteria should be an extremely easy task (well,
proportional to how easily you can specify what criteria you want
matching to be based on). So making scanning easy is what look_down
is supposed to be for.
look_up was added just by analogy -- instead of "traversing down" it
"traverses up", to abuse a term or two. The code for look_down is not
terribly opaque, either, compared to the hurtful mess that I turned
traverse() into. Both, incidentally, are iteratively implemented
traversers. The main reason traverse() isn't nice and pretty like
look_down is that it has to be able to do post-order callbacks, which
means you can no longer traverse with /just/ this familiar approach:
    my @stack = ($start);
    my $this;
    while (@stack) {
      $this = shift @stack;
      # ...visit $this, lasting out if you want to abort, nexting if
      # you want to prune...
      unshift @stack, grep ref($_), $this->content_list;
    }
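To make the loop above concrete, here is a minimal runnable sketch of the same stack-based pre-order scan. TinyNode is a hypothetical stand-in class invented for this illustration (it is not part of HTML::Element); it offers only tag() and content_list(), with text children stored as plain strings. The tag names used for aborting and pruning are likewise arbitrary choices.

```perl
use strict;
use warnings;

# TinyNode: hypothetical stand-in for HTML::Element, for illustration only.
package TinyNode;
sub new {
    my ($class, $tag, @kids) = @_;
    return bless { tag => $tag, kids => [@kids] }, $class;
}
sub tag          { $_[0]{tag} }
sub content_list { @{ $_[0]{kids} } }

package main;

my $tree = TinyNode->new('body',
    TinyNode->new('p', 'some text', TinyNode->new('b', 'bold')),
    TinyNode->new('pre', TinyNode->new('i', 'skipped')),
    TinyNode->new('hr'),
    TinyNode->new('p', 'never reached'),
);

my @visited;
my @stack = ($tree);
while (@stack) {
    my $this = shift @stack;
    push @visited, $this->tag;       # "visit" the node
    last if $this->tag eq 'hr';      # abort the whole traversal
    next if $this->tag eq 'pre';     # prune: skip this node's descendants
    # Element children go on the front of the stack, so they are visited
    # before this node's following siblings; text (non-ref) is dropped.
    unshift @stack, grep ref($_), $this->content_list;
}

print join(' ', @visited), "\n";     # prints "body p b pre hr"
```

Note that nothing here can run code after a node's children have been visited, which is exactly the post-order limitation mentioned above.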
Anyhow, here's the docs for look_down and look_up, pasted in from the
POD... (At some point I think I should make a cookbook of tree
extraction recipes, elaborating on the examples below. Suggestions
and contributions welcome.)
$h->look_down( ...criteria... )
This starts at $h and looks thru its element descendants (in
pre-order), looking for elements matching the criteria you specify. In
list context, returns all elements that match all the given criteria;
in scalar context, returns the first such element (or undef, if
nothing matched).
There are two kinds of criteria you can specify:
(attr_name, attr_value)
This means you're looking for an element with that value for that
attribute. Example: "alt", "pix!". Consider that you can search on
internal attribute values too: "_tag", "p".
a coderef
This means you're looking for elements where coderef->(each_element)
returns true. Example:
    my @wide_pix_images
      = $h->look_down(
          "_tag", "img",
          "alt", "pix!",
          sub { $_[0]->attr('width') > 350 }
        );
Note that (attr_name, attr_value) criteria are faster than coderef
criteria, so should presumably be put before them in your list of
criteria. That is, in the example above, the sub ref is called only
for elements that have already passed the criteria of having a "_tag"
attribute with value "img", and an "alt" attribute with value
"pix!". If the coderef were first, it would be called on every
element, and only then would the elements that pass that criterion
(i.e., those for which the coderef returned true) be checked for their
"_tag" and "alt" attributes.
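The effect of criteria order can be observed by counting how often the coderef runs. The following is only a sketch, assuming the HTML-Tree distribution is installed; the small tree and the names $is_wide and $calls are made up for the illustration. The "|| 0" guards against elements that have no "width" attribute.

```perl
use strict;
use warnings;
use HTML::Element;

# A small hand-built tree: a div holding three images and a paragraph.
my $h = HTML::Element->new('div');
$h->push_content(
    HTML::Element->new('img', alt => 'pix!', width => 400),
    HTML::Element->new('img', alt => 'pix!', width => 100),
    HTML::Element->new('img', alt => 'logo', width => 500),
    HTML::Element->new('p'),
);

# Count every invocation of the coderef criterion.
my $calls   = 0;
my $is_wide = sub { $calls++; ($_[0]->attr('width') || 0) > 350 };

# Cheap (attr_name, attr_value) criteria first, coderef last.
my @cheap_first = $h->look_down('_tag', 'img', 'alt', 'pix!', $is_wide);
my $calls_cheap_first = $calls;

# Same criteria, but with the coderef first.
$calls = 0;
my @sub_first = $h->look_down($is_wide, '_tag', 'img', 'alt', 'pix!');
my $calls_sub_first = $calls;

print "coderef calls, cheap criteria first: $calls_cheap_first\n";
print "coderef calls, coderef first:        $calls_sub_first\n";
```

Both calls should return the same elements; only the number of coderef invocations should differ, with the first ordering being the smaller.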
Note that comparison of string attribute-values against the string
value in (attr_name, attr_value) is case-INsensitive! A criterion of
('align', 'right') will match an element whose "align" value is
"RIGHT", or "right" or "rIGhT", etc.
Note also that look_down considers "" (empty-string) and undef to be
different things, in attribute values. So this:
    $h->look_down("alt", "")
will find elements with an "alt" attribute, but where the value for
the "alt" attribute is "". But this:
    $h->look_down("alt", undef)
is the same as:
    $h->look_down(sub { !defined($_[0]->attr('alt')) } )
That is, it finds elements that do not have an "alt" attribute at all
(or that do have an "alt" attribute, but with a value of undef --
which is not normally possible).
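A short sketch of the "" versus undef distinction, assuming HTML-Tree is installed; the three images and their "src" values are invented for the example. The "_tag", "img" criterion is there because look_down also tests $h itself, and the enclosing "p" has no "alt" attribute either.

```perl
use strict;
use warnings;
use HTML::Element;

my $h = HTML::Element->new('p');
$h->push_content(
    HTML::Element->new('img', src => 'a.png', alt => ''),       # empty alt
    HTML::Element->new('img', src => 'b.png'),                  # no alt at all
    HTML::Element->new('img', src => 'c.png', alt => 'chart'),  # real alt text
);

# Images that have an "alt" attribute whose value is the empty string.
my @empty_alt = $h->look_down('_tag', 'img', 'alt', '');

# Images that have no "alt" attribute at all.
my @no_alt = $h->look_down('_tag', 'img', 'alt', undef);

print "empty alt: ", join(' ', map { $_->attr('src') } @empty_alt), "\n";
print "no alt:    ", join(' ', map { $_->attr('src') } @no_alt), "\n";
```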
Note that when you give several criteria, this is taken to mean you're
looking for elements that match all your criteria, not just any of
them. In other words, there is an implicit "and", not an "or". So if
you wanted to find elements with a "name" attribute with the value
"foo" or with an "id" attribute with the value "baz", you'd have to do
it like:
    @them = $h->look_down(
      sub {
        # the lcs are to fold case; the || '' avoids "uninitialized"
        # warnings on elements that lack the attribute
        lc($_[0]->attr('name') || '') eq 'foo'
          or lc($_[0]->attr('id') || '') eq 'baz'
      }
    );
Coderef criteria are more expressive than (attr_name, attr_value)
criteria, and all (attr_name, attr_value) criteria could be expressed
in terms of coderefs. However, (attr_name, attr_value) criteria are a
convenient shorthand. (In fact, look_down itself is basically
"shorthand" too, since anything you can do with look_down you could do
by traversing the tree, either with the traverse method or with a
routine of your own. However, look_down often makes for very concise
and clear code.)
$h->look_up( ...criteria... )
This is identical to $h->look_down, except that whereas $h->look_down
basically scans over the list:
    ($h, $h->descendants)
$h->look_up instead scans over the list
    ($h, $h->lineage)
So, for example, this returns all ancestors of $h (possibly including
$h itself) that are "td" elements with an "align" attribute with a
value of "right" (or "RIGHT", etc.):
    $h->look_up("_tag", "td", "align", "right");
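As a rough illustration of look_up in both contexts, here is a sketch that assumes HTML-Tree is installed and uses a tiny hand-built table (the structure and variable names are invented for the example). It starts from a "b" element nested inside a right-aligned cell of a right-aligned table.

```perl
use strict;
use warnings;
use HTML::Element;

# Build table > tr > td > b by hand.
my $bold = HTML::Element->new('b');
$bold->push_content('42');

my $cell = HTML::Element->new('td', align => 'right');
$cell->push_content($bold);

my $row = HTML::Element->new('tr');
$row->push_content($cell);

my $table = HTML::Element->new('table', align => 'right');
$table->push_content($row);

# Scalar context: the first match, scanning $bold and then its ancestors
# from the parent upward.
my $td = $bold->look_up('_tag', 'td', 'align', 'right');

# List context: every match among $bold and its ancestors, nearest first.
my @aligned = $bold->look_up('align', 'right');

print "nearest right-aligned td: ", $td->tag, "\n";
print "all right-aligned:        ", join(' ', map { $_->tag } @aligned), "\n";
```

Since the scan runs over ($h, $h->lineage), the list-context result comes back ordered from the starting element outward, so the cell appears before the table.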
--
Sean M. Burke [EMAIL PROTECTED] http://www.spinn.net/~sburke/