On Thu, 24 Oct 2013 18:54:53 -0500 Mike Flannigan <[email protected]> wrote:
>
> Wow, I would not have figured that out - at least
> not in the same day.
>
> I did #1, but I'll bet that doesn't work.
> Surely they already know about this.
>
> I discovered that RT =
> https://rt.perl.org/
>
> Thanks for doing all that free work :-)

What I should have done was point you at a few things to try and let
you make some progress. Since I hadn't tried hdb before and it was way
more effective than I thought, you got the advantage of me playing with
the tool.<grin/>

If anyone hasn't seen the post someone made about this a month ago, you
really need to check it out. Decent GUI debugger with the browser as
your interface.

G. Wade

> On 10/24/2013 10:01 AM, G. Wade Johnson wrote:
> > That was fun, and it gave me a good excuse to play with Devel::hdb.
> >
> > There's a bug in the way that WWW::SimpleRobot handles broken links.
> >
> > If the link is in the original array that you pass, it recognizes
> > the broken link and calls the callback routine.
> >
> > But, when it's traversing a page and building a list of links, it
> > discards any link that fails a "head" request. So, all broken links
> > would be discarded.
> >
> > That's probably worth a bug report to the author.
> >
> > More Detail
> > -----------
> > To troubleshoot this, I first ran it the way you did. Then, I looked
> > at the docs for WWW::SimpleRobot and didn't see anything useful
> > there.
> >
> > Next, I looked at the source (nicely formatted by metacpan:
> > https://metacpan.org/source/AWRIGLEY/WWW-SimpleRobot-0.07/SimpleRobot.pm).
> >
> > On line 35, I noticed there was an ability to do a VERBOSE mode.
> > Looking down the code a little ways (lines 119-124), you can see
> > that verbose is used to print a "get $url" line before the
> > BROKEN_LINK_CALLBACK is called.
> >
> > Running that way showed that the code never prints
> > "get http://www.ncgia.ucsb.edu/%7Ecova/seap.html".
> >
> > Looking a little further shows lines 140-142, which discards the
> > link if head() fails.
> >
> > The hdb debugging interface was really nice for this.
> > (Unfortunately, I spent a fair amount of time playing with the
> > debugger.<shrug/>)
> >
> > I can see a couple of ways of fixing this:
> >
> > 1. Easiest: report the bug through RT and hope the author takes
> >    care of it soon.
> >
> > 2. Patch your copy of WWW::SimpleRobot code to call the callback at
> >    the head() failure or not to discard on the head() request.
> >
> > 3. Copy the WWW::SimpleRobot traversal code into your script and
> >    fix it there.
> >
> > The first approach is probably the best.
> >
> > G. Wade
> >
>
> _______________________________________________
> Houston mailing list
> [email protected]
> http://mail.pm.org/mailman/listinfo/houston
> Website: http://houston.pm.org/

-- 
We've all heard that a million monkeys banging on a million typewriters
will eventually reproduce the works of Shakespeare. Now, thanks to the
Internet, we know this is not true.            -- Robert Wilensky, UCB
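
For anyone following the thread without the module source open, the
behaviour described above (links that fail head() are dropped while a
page's link list is built) and one reading of fix option 2 (call the
callback at the head() failure) can be sketched as a small simulation.
This is not the WWW::SimpleRobot code: it is Python rather than Perl,
and PAGES, head_ok, collect_links, on_broken and the example.test URLs
are all hypothetical names made up for illustration. No network access
is involved.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the web: page URL -> links found on that page.
# A URL that is not a key here counts as a broken link.
PAGES: Dict[str, List[str]] = {
    "http://example.test/index.html": [
        "http://example.test/ok.html",
        "http://example.test/missing.html",
    ],
    "http://example.test/ok.html": [],
}


def head_ok(url: str) -> bool:
    """Simulated HEAD request: succeeds only for URLs present in PAGES."""
    return url in PAGES


def collect_links(
    page_url: str,
    on_broken: Optional[Callable[[str, str], None]] = None,
) -> List[str]:
    """Return the links on page_url that pass the simulated HEAD check.

    With on_broken=None, failing links are silently dropped, which is the
    behaviour reported in the thread. With a callback, each failing link
    is reported (link, page it was found on) before being dropped.
    """
    kept: List[str] = []
    for link in PAGES[page_url]:
        if head_ok(link):
            kept.append(link)
        elif on_broken is not None:
            on_broken(link, page_url)
    return kept


broken: List[str] = []

# Reported behaviour: the broken link vanishes without a trace.
silent = collect_links("http://example.test/index.html")

# Assumed fix: same traversal result, but the broken link is recorded.
reported = collect_links(
    "http://example.test/index.html",
    on_broken=lambda url, linked_from: broken.append(url),
)
```

In both calls the returned list contains only the working link; the
difference is that the second call leaves the missing URL in `broken`,
which is the information a broken-link checker needs.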
