On 04/01/2011 08:30, Francis Davey wrote:
Also, I wonder if you could crowdsource bailii scraping somehow 8-)? Just because you can't spider it doesn't mean individuals can't cut/paste a case into your system with its URL. Bailii do provide an RSS feed, so maybe that, plus a filter, could give URLs for individuals to look at and then load up to your system. [ignoring any IP problems there might be in doing so]
There's no such thing as a website which can't be scraped, given sufficient time and resources. If Bailii is rate-limiting by IP address (which is the most common form of anti-scraping protection) then that can easily be circumvented by doing it slowly enough from enough different locations. But, from a practical perspective, I would be wary of the dangers of infringing the intellectual property of an organisation which counts a number of law firms among its sponsors :-)
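For what it's worth, the "RSS feed plus a filter" idea quoted above doesn't need any scraping at all. Below is a rough sketch only, assuming the feed is plain RSS 2.0 (un-namespaced <item> elements with <title> and <link> children). The function names, the keyword matching on titles and the feed URL argument are all made up for illustration; I haven't checked what the real feed looks like.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_feed(feed_url, timeout=30):
    """Download the raw RSS XML. feed_url is supplied by the caller (hypothetical)."""
    with urlopen(feed_url, timeout=timeout) as response:
        return response.read()


def filter_feed_items(rss_xml, keywords):
    """Return (title, link) pairs for items whose title mentions any keyword.

    Assumes RSS 2.0: un-namespaced <item> elements with <title> and <link>.
    """
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Skip items with no link: there is nothing for a volunteer to open.
        if link and any(k in title.lower() for k in wanted):
            matches.append((title, link))
    return matches
```

Something like filter_feed_items(fetch_feed(url), ["ewca"]) would then give a short list of URLs for volunteers to look at by hand, which leaves the IP question exactly where it was.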
Mark -- http://mark.goodge.co.uk
