Wow, I would not have figured that out - at least
not in the same day.

I did #1, but I'll bet that doesn't work.
Surely they already know about this.

I discovered that RT =
https://rt.perl.org/

Thanks for doing all that free work :-)



Mike


On 10/24/2013 10:01 AM, G. Wade Johnson wrote:
That was fun, and it gave me a good excuse to play with Devel::hdb.

There's a bug in the way that WWW::SimpleRobot handles broken links.

If the link is in the original array that you pass, it recognizes the
broken link and calls the callback routine.

But, when it's traversing a page and building a list of links, it
discards any link that fails a "head" request. So, any broken link
found during traversal is dropped before the callback can see it.
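
To make that behavior concrete, here is a minimal sketch of the pattern
described above. It is written in Python rather than Perl and is not the
module's code: the names SITE, head_ok, and crawl_buggy and the
example.org URLs are all hypothetical, and the "head" request is simulated
with an in-memory table instead of real network access.

```python
# Hypothetical model of the behavior described above; this is NOT the
# WWW::SimpleRobot source. SITE maps each reachable page to the links
# found on it. A URL absent from SITE stands in for a broken link.
SITE = {
    "http://example.org/": ["http://example.org/a", "http://example.org/missing"],
    "http://example.org/a": [],
}


def head_ok(url):
    """Simulated HEAD request: succeeds only for pages that exist in SITE."""
    return url in SITE


def crawl_buggy(start_urls, on_broken):
    """Breadth-first crawl that reports broken links only among start_urls."""
    queue = list(start_urls)
    seen = set(queue)
    visited = []
    while queue:
        url = queue.pop(0)
        if not head_ok(url):
            on_broken(url)  # only reachable for URLs that were passed in directly
            continue
        visited.append(url)
        for link in SITE[url]:
            if link in seen:
                continue
            seen.add(link)
            if not head_ok(link):
                continue  # a discovered broken link is dropped without a report
            queue.append(link)
    return visited


broken = []
visited = crawl_buggy(["http://example.org/"], broken.append)
print(visited)  # the two pages that exist
print(broken)   # stays empty even though the start page links to a missing URL
```

In this model, passing the missing URL in the starting list does trigger
the callback, which matches the difference between the two cases above.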

That's probably worth a bug report to the author.

More Detail
-----------
To troubleshoot this, I first ran it the way you did. Then, I looked
at the docs for WWW::SimpleRobot and didn't see anything useful there.

Next, I looked at the source (nicely formatted by metacpan:
https://metacpan.org/source/AWRIGLEY/WWW-SimpleRobot-0.07/SimpleRobot.pm).

On line 35, I noticed that the module supports a VERBOSE mode.
A little further down (lines 119-124), you can see that VERBOSE is
used to print a "get $url" line before the BROKEN_LINK_CALLBACK is
called.

Running that way showed that the code never prints
"get http://www.ncgia.ucsb.edu/%7Ecova/seap.html".

Looking a little further shows lines 140-142, which discard the link
if head() fails.

The hdb debugging interface was really nice for this. (Unfortunately, I
spent a fair amount of time playing with the debugger.<shrug/>)

I can see a couple of ways of fixing this:

1. Easiest: report the bug through RT and hope the author takes care of
it soon.

2. Patch your copy of the WWW::SimpleRobot code so that it either calls
the callback when head() fails or does not discard the link on a failed
head() request.

3. Copy the WWW::SimpleRobot traversal code into your script and fix it
there.

The first approach is probably the best.
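
For anyone curious what the second approach might look like, here is a
rough sketch of the idea. It is a Python stand-in, not a patch to the
Perl module: PAGES, head_ok, and crawl_patched and the example.org URLs
are hypothetical, the "head" request is simulated with a lookup table,
and the callback arguments (url, linked_from) are an assumption made for
this illustration.

```python
# Hypothetical sketch of approach 2; NOT the WWW::SimpleRobot source.
# PAGES maps each reachable page to its links. A URL absent from PAGES
# stands in for a broken link (its simulated HEAD request fails).
PAGES = {
    "http://example.org/": ["http://example.org/ok", "http://example.org/gone"],
    "http://example.org/ok": ["http://example.org/gone"],
}


def head_ok(url):
    """Simulated HEAD request: succeeds only for pages that exist in PAGES."""
    return url in PAGES


def crawl_patched(start_urls, on_broken):
    """Breadth-first crawl that reports every broken link it discovers."""
    queue = [(url, None) for url in start_urls]
    seen = set(start_urls)
    visited = []
    while queue:
        url, linked_from = queue.pop(0)
        if not head_ok(url):
            on_broken(url, linked_from)  # broken URL from the starting list
            continue
        visited.append(url)
        for link in PAGES[url]:
            if link in seen:
                continue  # each URL is checked, and reported, at most once
            seen.add(link)
            if not head_ok(link):
                on_broken(link, url)  # the change: report instead of dropping
                continue
            queue.append((link, url))
    return visited


reports = []
visited = crawl_patched(
    ["http://example.org/"],
    lambda url, linked_from: reports.append((url, linked_from)),
)
print(visited)
print(reports)
```

The other variant of this approach would be to queue the link even when
the head check fails and let the main loop report it when it is fetched.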

G. Wade


_______________________________________________
Houston mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/houston
Website: http://houston.pm.org/
