Roberto García wrote:
> The problem appears when I'm scraping multiple results pages, and it
> is due to the fact that when I use "piggybank.scrapeURL", URLs are
> queued, thus implementing a breadth-first search. The result is that,
> due to problems in the site, previous multiple results pages are
> masked by the following ones.
Perhaps you could use the breadth-first queue itself for a re-queuing pass? That is, scrape year 1, page 1 solely to discover that it has three pages, and re-queue all three for later, detailed scraping. This will clearly take longer, and it may be a poor fit if you have a few decades of results to work through.

Let us know how it goes.

--
Ryan Lee [EMAIL PROTECTED]
MIT CSAIL Research Staff
http://simile.mit.edu/
http://people.csail.mit.edu/ryanlee/
_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
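The re-queuing suggestion in the reply above can be sketched as a two-pass loop over a single FIFO queue. This is a language-agnostic illustration written in Python, not Piggy Bank code: it does not use `piggybank.scrapeURL`, and the names `two_pass_scrape`, `count_pages`, `scrape_page`, and `FAKE_SITE` are hypothetical stand-ins for "fetch page 1 and read the page count" and "fetch one results page and extract its items". Because every discovery job is queued before any detail job, all page counts are learned first and the detailed scraping happens afterwards, as the reply describes. The sketch only shows the queue ordering; it makes no claim about fixing the site's masking behavior.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# A job is ("discover", year, 1) or ("detail", year, page).
Job = Tuple[str, int, int]


def two_pass_scrape(
    years: List[int],
    count_pages: Callable[[int], int],
    scrape_page: Callable[[int, int], List[str]],
) -> Dict[Tuple[int, int], List[str]]:
    """Breadth-first scrape in two passes (hypothetical helper).

    Pass 1 visits only page 1 of each year to learn how many result pages
    exist, then re-queues one detail job per page. Pass 2 runs those detail
    jobs. Results are stored per (year, page) key.
    """
    queue: Deque[Job] = deque(("discover", y, 1) for y in years)
    results: Dict[Tuple[int, int], List[str]] = {}
    while queue:
        kind, year, page = queue.popleft()
        if kind == "discover":
            # Only learn the page count here; no items are extracted yet.
            n = count_pages(year)
            for p in range(1, n + 1):
                queue.append(("detail", year, p))
        else:
            # Detailed scraping of one results page.
            results[(year, page)] = scrape_page(year, page)
    return results


# Usage with a made-up, in-memory "site": year -> list of result pages.
FAKE_SITE = {2001: [["a1", "a2"], ["a3"], ["a4"]], 2002: [["b1"]]}
res = two_pass_scrape(
    list(FAKE_SITE),
    lambda y: len(FAKE_SITE[y]),
    lambda y, p: FAKE_SITE[y][p - 1],
)
print(sorted(res))  # [(2001, 1), (2001, 2), (2001, 3), (2002, 1)]
```

As the reply notes, the discovery pass costs one extra request per year, so the total work grows with the number of years being scraped.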
