Wow, I would not have figured that out - at least not in the same day.
I did #1, but I'll bet that doesn't work. Surely they already know about this. I discovered that RT = https://rt.perl.org/

Thanks for doing all that free work :-)

Mike

On 10/24/2013 10:01 AM, G. Wade Johnson wrote:
That was fun, and it gave me a good excuse to play with Devel::hdb.

There's a bug in the way that WWW::SimpleRobot handles broken links. If the link is in the original array that you pass, it recognizes the broken link and calls the callback routine. But, when it's traversing a page and building a list of links, it discards any link that fails a "head" request. So, all broken links would be discarded.

That's probably worth a bug report to the author.

More Detail
-----------

To troubleshoot this, I first ran it the way you did. Then, I looked at the docs for WWW::SimpleRobot and didn't see anything useful there. Next, I looked at the source (nicely formatted by metacpan: https://metacpan.org/source/AWRIGLEY/WWW-SimpleRobot-0.07/SimpleRobot.pm).

On line 35, I noticed there was an ability to do a VERBOSE mode. Looking down the code a little ways (lines 119-124), you can see that verbose is used to print a "get $url" line before the BROKEN_LINK_CALLBACK is called. Running that way showed that the code never prints "get http://www.ncgia.ucsb.edu/%7Ecova/seap.html".

Looking a little further shows lines 140-142, which discards the link if head() fails.

The hdb debugging interface was really nice for this. (Unfortunately, I spent a fair amount of time playing with the debugger.<shrug/>)

I can see a couple of ways of fixing this:

1. Easiest: report the bug through RT and hope the author takes care of it soon.
2. Patch your copy of WWW::SimpleRobot code to call the callback at the head() failure or not to discard on the head() request.
3. Copy the WWW::SimpleRobot traversal code into your script and fix it there.

The first approach is probably the best.

G. Wade
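For anyone following along, here is a rough sketch of the behavior described above. It is a simplified Python model of the traversal logic, not the module's actual Perl code: the PAGES table, head_ok(), crawl(), and the report_discovered flag are all made up for illustration, and the "web" is an in-memory dict rather than real HTTP requests. With report_discovered=False it mimics the reported behavior (only seed URLs that fail the head check reach the callback); with report_discovered=True it mimics fix #2 (call the callback at the point of the head failure).

```python
from collections import deque

# Hypothetical in-memory "web": URL -> links found on that page.
# A URL that is absent from the table stands in for a broken link.
PAGES = {
    "http://example.test/": [
        "http://example.test/a",
        "http://example.test/missing",
    ],
    "http://example.test/a": [],
}


def head_ok(url):
    """Stand-in for a HEAD request: True if the page exists."""
    return url in PAGES


def crawl(seeds, on_broken, report_discovered=False):
    """Breadth-first traversal starting from the seed URLs.

    on_broken(url, linked_from) is called for broken links.
    report_discovered=False models the behavior described in the email;
    report_discovered=True models calling the callback at the head failure.
    """
    queue = deque((url, None) for url in seeds)
    seen = set(seeds)
    visited = []
    while queue:
        url, parent = queue.popleft()
        if not head_ok(url):
            # Only seed URLs can get here broken, because discovered
            # links are filtered before they are queued (see below).
            on_broken(url, parent)
            continue
        visited.append(url)
        for link in PAGES[url]:
            if link in seen:
                continue
            seen.add(link)
            if not head_ok(link):
                if report_discovered:
                    on_broken(link, url)
                # Either way the link is dropped from the traversal.
                continue
            queue.append((link, url))
    return visited


# Collect broken links under both behaviors for comparison.
silent = []
crawl(["http://example.test/"], lambda u, p: silent.append((u, p)))

reported = []
crawl(["http://example.test/"], lambda u, p: reported.append((u, p)),
      report_discovered=True)
```

In this model, "silent" stays empty even though the start page links to a missing URL, while "reported" picks it up along with the page that linked to it.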
_______________________________________________ Houston mailing list [email protected] http://mail.pm.org/mailman/listinfo/houston Website: http://houston.pm.org/
