> On a broader topic, I've been thinking of extending the
> find_by_attributes method to something more general, such that one
> could write things like:
>
> my @matching = $h->magic_scanner(
> '_tag' => 'p',
> 'class' => 'restInfo',
> sub { scalar( $_->find_by_attribute('class', 'restName') ) },
> qr/mm-mm-good!/,
> # maybe that should mean the same as:
> # sub { $_->as_text() =~ m/mm-mm-good!/; }
> );

This abstracts away the details of traversal, but it can be taken
one step further: implement a query language! That is, what you
really want to write is something like:

  my @matching = $h->find_by_query(
    '<P CLASS="restInfo"> containing <SPAN CLASS="restName">
     containing /mm-mm-good!/'
  );

I actually have a specification for this language on paper, done as a
mental exercise. I intend to implement parts of it on top of
HTML::Element, but it may never be finished. Other HTML query
languages published in the literature could also be used.
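
To make the idea concrete, here is a rough sketch of how the query above
might be evaluated by hand on top of HTML::Element. It uses the look_down
method found in current HTML::Element releases; it is not the proposed
find_by_query, which does not exist. The sample HTML is made up, and
reading "A containing B containing C" as nested (the SPAN holds the text)
is my assumption about the query's meaning.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample document, for illustration only.
my $html = <<'HTML';
<html><body>
<p class="restInfo"><span class="restName">Chez Sample, mm-mm-good! soups</span> open late</p>
<p class="restInfo"><span class="restName">Other Place</span> nothing special</p>
<p class="other"><span class="restName">mm-mm-good! but wrong class</span></p>
</body></html>
HTML

my $h = HTML::TreeBuilder->new_from_content($html);

# Assumed reading of the query:
#   P.restInfo containing ( SPAN.restName containing /mm-mm-good!/ )
# look_down passes the candidate element to a code criterion as $_[0].
my @matching = $h->look_down(
    _tag  => 'p',
    class => 'restInfo',
    sub {
        my ($p) = @_;
        # In scalar context look_down returns the first match or undef,
        # so this is true when a qualifying SPAN exists below the P.
        return scalar $p->look_down(
            _tag  => 'span',
            class => 'restName',
            sub { $_[0]->as_text =~ /mm-mm-good!/ },
        );
    },
);

print scalar(@matching), " matching paragraph(s)\n";
```

A query language would only have to translate each "containing" into one
more nested call of this kind.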
> Or maybe that's the /first/ thing that needs doing -- while traverse()
> is very general, maybe what most people /mean/ by using it most of the
> time could be done more intuitively with something more like the
> above.
>
> (Alternately: "Of course, at some point this just turns into
> find(1)...", with -prune and -o and whatnot.)
What I have in mind is more like XSLT/XPath.
> As to your crypto-code like:
>
> > $p->content =~ m{<B>Cuisine:</B> (.*) <BR>};
> > $rest{cuisine} = $1;
>
> ...this can be expressed in terms of tree structure as: find 'b'
> elements with one text node (consisting of cuisine) as a child, and
> then look at its right sister node, which should be text...
Yes, but it would be nice to use an HTML-like notation
while keeping the benefits of real HTML parsing.
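
For comparison, the tree-structure version described in the quoted text
could look roughly like this with HTML::Element's look_down and right
methods. The input fragment and the %rest hash contents are invented for
the example, and matching the B element by its exact text "Cuisine:" is my
simplification of "one text node as a child".

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Invented fragment standing in for one restaurant entry.
my $h = HTML::TreeBuilder->new_from_content(
    '<p><b>Cuisine:</b> Thai <br><b>Price:</b> cheap <br></p>'
);

my %rest;

# Find the B element whose text is exactly "Cuisine:".
my $label = $h->look_down(
    _tag => 'b',
    sub { $_[0]->as_text eq 'Cuisine:' },
);

if ($label) {
    # In scalar context, right() returns the immediate right sibling.
    # Text nodes are plain strings, element nodes are objects.
    my $next = $label->right;
    if (defined $next and not ref $next) {
        $next =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $rest{cuisine} = $next;
    }
}

print "cuisine: ", (defined $rest{cuisine} ? $rest{cuisine} : '(not found)'), "\n";
```

This works, but it is a lot more to write and read than the one-line
pattern, which is why a notation closer to the HTML itself is attractive.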
> --
> Sean M. Burke [EMAIL PROTECTED] http://www.spinn.net/~sburke/
--
Reinier Post [EMAIL PROTECTED]