This is kind of an RTFM question, but I know that Sean always wants
chances to show off his modules, and I'm weak on package subclassing
and such.
Here's my question: I'd like to go through a site (remotely) and search for
some tags (for now, it's sufficient to simply make a list of URLs that
contain the tags). I know that I could use lwp-rget to download the whole
site and then grep every page, but that's horrible overkill, since I just
want to parse each page instead of saving it.
The pseudocode is something like:

    get:
        grab url
        parse HTML
        foreach internal link, get link (remember depth!)
        if specialtag found, push url onto foundlist

    print foundlist

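
To make the pseudocode concrete, here is a rough sketch of the loop I have in mind. It is in Python rather than Perl only because it runs with nothing but the standard library; it is not meant as the LWP answer. The names PageScanner, fetch_page, crawl, and max_depth are made up for illustration. I'm also assuming that "internal" means "same host as the start URL" and that only <a href> links get followed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageScanner(HTMLParser):
    """Collects <a href> links and notes whether the wanted tag appears."""

    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag.lower()
        self.found = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag and attribute names already lowercased.
        if tag == self.wanted_tag:
            self.found = True
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Grab a URL and return its body as text (assumes UTF-8 is close enough)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, wanted_tag, max_depth=2, fetch=fetch_page):
    """Breadth-first walk of one host; returns URLs whose pages contain wanted_tag."""
    host = urlparse(start_url).netloc  # assumption: internal == same host
    seen = set()
    foundlist = []
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except OSError:
            continue  # page could not be fetched: skip it
        scanner = PageScanner(wanted_tag)
        scanner.feed(html)  # parse in memory, nothing is saved to disk
        if scanner.found:
            foundlist.append(url)
        if depth < max_depth:  # remember depth!
            for href in scanner.links:
                link = urldefrag(urljoin(url, href)).url
                if urlparse(link).netloc == host:
                    queue.append((link, depth + 1))
    return foundlist
```

Calling crawl with a start URL and a tag name (say "blink") would return the foundlist. The fetch argument is there only so a canned set of pages can be swapped in for testing without touching the network.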
Is the right thing to do to copy lwp-rget and splice in some HTML::Filter
code, or is there a better approach?