Re: [Talk-GB] Request for UK address lists for postcode extraction
David Earl wrote:
> Sent: 01 December 2008 3:10 PM
> To: talk-gb@openstreetmap.org
> Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction
>
> On 01/12/2008 14:11, Brian Quinion wrote:
>> Hi,
>>
>> I'm currently doing some work trying to generate postcode location data for the UK using address lists and address lookup using OSM data to supplement NPE. So far it seems to work quite well with the address lists that I have available to me (and copes quite well with ambiguous road names), but I'm limited in my data sources and most of the address data is fairly consistent in both format and quality.
>>
>> So, before I open the interface to the public, I'd like to test the code with some lists provided by other people. Does anyone have, or know of, any address lists that I would be able to use for this purpose?
>>
>> Obviously it needs to be license-compatible with OSM (so please no lists generated from Royal Mail postcode data!) and ideally I'm after data sets containing at least:
>>
>>   street address (house name / number optional)
>>   town / city
>>   postcode
>>
>> formatted as CSV or TSV. I'm specifically not after data containing the names of individuals.
>>
>> Has anyone got any suggestions, or is willing to offer any data? Even personal address books would be useful for testing...
>
> Why not do it the other way round? You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100 combinations for the second part for each - about 200 million in all. If you feed these potential postcodes in quotes into Google UK over a long period, with appropriate pauses so as not to get locked out, and look at the results for recognizable addresses (that's the tricky bit) as I'm doing in the Namefinder, you'd probably cover 75% of UK postcodes. Yes, it's slow, but it's probably the biggest source there is.
>
> At one a second it would take about 6 years, but by enlisting 100 friends you'd do it in a month - less if it's possible to be more intelligent about it. For example, for the number part, if there's no 14XX or 15XX I doubt there would be any 16s or above either, except for a few special cases.

I'm curious about this. Is data scraped via Google still subject to the terms of the original page it references?

Cheers

Andy

> David
> ___
> Talk-GB mailing list
> Talk-GB@openstreetmap.org
> http://lists.openstreetmap.org/listinfo/talk-gb
>
> No virus found in this incoming message.
> Checked by AVG - http://www.avg.com
> Version: 8.0.176 / Virus Database: 270.9.12/1821 - Release Date: 30/11/2008 5:53 PM
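David's proposal (enumerate every candidate postcode, query each with a pause, and skip the higher numbers once a gap appears) can be sketched roughly as follows. This is only an illustration under stated assumptions: the figures are taken exactly as he gives them (about 2,500 prefixes, a second part modelled as a number 0-99 plus two letters, one query a second), and `lookup` is a hypothetical stand-in for the "search and recognise an address" step, which he notes is the tricky bit and which is not implemented here. The function names are likewise hypothetical.

```python
import itertools
import string
import time
from typing import Callable, Iterator

# Figures as stated in the message: about 2,500 prefixes, and a second part
# modelled as a number 0-99 followed by two letters (e.g. "14XX").
PREFIXES = 2_500
SECOND_PART_COMBINATIONS = 26 * 26 * 100


def estimate(queries_per_second: float = 1.0, workers: int = 1) -> tuple[int, float]:
    """Return (number of candidate postcodes, days needed to query them all)."""
    total = PREFIXES * SECOND_PART_COMBINATIONS
    days = total / (queries_per_second * workers) / 86_400
    return total, days


def candidates(prefix: str, number: int) -> Iterator[str]:
    """Yield every two-letter candidate for one prefix and number, e.g. 'AB1 14AA'."""
    for first, second in itertools.product(string.ascii_uppercase, repeat=2):
        yield f"{prefix} {number}{first}{second}"


def search_prefix(prefix: str, lookup: Callable[[str], bool],
                  pause: float = 1.0, max_empty_run: int = 2) -> list[str]:
    """Query candidates number by number, stopping early once some hits have
    been found and then `max_empty_run` consecutive numbers give none
    (the "no 14XX or 15XX, so no 16s or above" heuristic)."""
    found: list[str] = []
    empty_run = 0
    for number in range(100):
        hits = []
        for postcode in candidates(prefix, number):
            if lookup(postcode):  # hypothetical: search, then recognise an address
                hits.append(postcode)
            time.sleep(pause)     # pause between queries so as not to get locked out
        found.extend(hits)
        empty_run = 0 if hits else empty_run + 1
        if found and empty_run >= max_empty_run:
            break                 # any special cases beyond the gap are missed
    return found
```

Taking the product exactly, `estimate()` gives 169,000,000 candidates, or roughly 5.4 years at one query a second and about 20 days when `workers=100`; David's "about 6 years" and "a month" follow from rounding up to 200 million.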
Re: [Talk-GB] Request for UK address lists for postcode extraction
Brian Quinion wrote:
> Sent: 01 December 2008 4:01 PM
> To: Andy Robinson (blackadder-lists)
> Cc: David Earl; talk-gb@openstreetmap.org
> Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction
>
> Andy Robinson wrote:
>> David Earl wrote:
>>> On 01/12/2008 14:11, Brian Quinion wrote:
>>>> Has anyone got any suggestions, or is willing to offer any data? Even personal address books would be useful for testing...
>>>
>>> You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100 combinations for the second part for each - about 200 million in all. If you feed these potential postcodes in quotes into Google UK over a long period, with appropriate pauses so as not to get locked out, and look at the results for recognizable addresses (that's the tricky bit) as I'm doing in the Namefinder, you'd probably cover 75% of UK postcodes.
>>
>> I'm curious about this. Is data scraped via Google still subject to the terms of the original page it references?
>
> I looked into this and came to the conclusion that you could probably claim 'fair use' as long as you pulled each address from a different website. The trouble is that for most searches you end up on one of a small number of directory sites, so doing any significant number is likely to amount to a database extraction. The results are also mostly limited to business addresses.
>
> It would probably be possible to filter the results so that not too many requests went to any one site, but that still leaves the possibility that those sites used Royal Mail's postcode finder (or similar) to find their original data. Across a large number of sites you could end up doing a database extraction from Royal Mail regardless.
>
> Address books and company mailing lists seemed like a preferable source, and as long as individuals' names are not included, privacy shouldn't be an issue.

I'd noted that too: mostly business directory listings (Yell, Thomson, etc.) or house price finders that use copyright Land Registry data in the background.

One source I am exploring is planning application listings produced by the local authority. Which I think is where you were heading?

Cheers

Andy
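Brian's idea of filtering so that not too many requests go to any one site might look something like the sketch below, which keeps at most a fixed number of addresses per website. It assumes, hypothetically, that recognised results arrive as (page URL, address) pairs; `cap_per_site` and `max_per_site` are illustrative names, not part of any existing tool. As Brian points out, a cap like this does nothing about sites whose data was itself taken from Royal Mail's postcode finder.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def cap_per_site(results: Iterable[tuple[str, str]],
                 max_per_site: int = 1) -> list[tuple[str, str]]:
    """Keep at most `max_per_site` addresses from any one website.

    `results` is a hypothetical stream of (page URL, address) pairs produced
    by whatever step recognises addresses in search results.
    """
    taken: Counter[str] = Counter()
    kept: list[tuple[str, str]] = []
    for url, address in results:
        # hostname is lower-cased with any port removed; treat www.x and x as one site
        site = (urlparse(url).hostname or "").removeprefix("www.")
        if taken[site] < max_per_site:
            taken[site] += 1
            kept.append((url, address))
    return kept
```

Counting by host name is a deliberately crude notion of "one site"; a directory spread over several subdomains would still slip through.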
Re: [Talk-GB] Request for UK address lists for postcode extraction
Brian Quinion wrote:
> Sent: 01 December 2008 4:28 PM
> To: Andy Robinson (blackadder-lists)
> Cc: talk-gb@openstreetmap.org
> Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction
>
> Andy Robinson wrote:
>> I'd noted that too: mostly business directory listings (Yell, Thomson, etc.) or house price finders that use copyright Land Registry data in the background.
>>
>> One source I am exploring is planning application listings produced by the local authority. Which I think is where you were heading?
>
> It's one of the options I hope to look at, but my main idea was to get a working tool that could do the extraction and make it available to anyone who wants to provide data. Anyone with a mailing list or address book is a potential data source - I just hope to make it easy for them to submit it. I just need some test data that I didn't write, to make sure I'm not making silly assumptions! :-)

Is this any use?

http://www.birmingham.gov.uk/GenerateContent?CONTENT_ITEM_ID=40870&CONTENT_ITEM_TYPE=0&MENU_ID=13170

Cheers

Andy

> --
> Brian
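To make submission easy, a tool would need to accept the format Brian asked for (at least street address, town / city and postcode, as CSV or TSV). The sketch below is one hypothetical way to read such a list: it uses Python's `csv.Sniffer` to choose between comma and tab, assumes the first three columns are street, town and postcode in that order, and checks postcodes against a simplified UK pattern (an assumption: special cases such as GIR 0AA are not handled). Rows that don't parse, including any header line, are returned separately rather than raising. `parse_address_list` and `Address` are illustrative names only.

```python
import csv
import io
import re
from typing import NamedTuple

# Simplified UK postcode pattern: outward code, optional space, inward code.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?) ?([0-9][A-Z]{2})$")


class Address(NamedTuple):
    street: str
    town: str
    outward: str  # e.g. "CB1"
    inward: str   # e.g. "2AB"


def parse_address_list(text: str) -> tuple[list[Address], list[list[str]]]:
    """Parse CSV or TSV rows of street, town, postcode.

    Returns (parsed addresses, rejected rows); extra columns are ignored.
    """
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",\t")
    good: list[Address] = []
    bad: list[list[str]] = []
    for row in csv.reader(io.StringIO(text), dialect):
        if not row:
            continue              # skip blank lines
        if len(row) < 3:
            bad.append(row)
            continue
        street, town, postcode = (field.strip() for field in row[:3])
        match = POSTCODE_RE.match(postcode.upper())
        if match:
            good.append(Address(street, town, match.group(1), match.group(2)))
        else:
            bad.append(row)       # header line or unrecognised postcode
    return good, bad
```

Splitting each postcode into outward and inward parts up front normalises entries such as "cb12ab" and "CB1 2AB" to the same value before any location lookup.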
Re: [Talk-GB] Request for UK address lists for postcode extraction
> One source I am exploring is planning application listings produced by the local authority. Which I think is where you were heading?

I'm not sure of the legal situation with planning data, but if things seem fine with that, you might be interested to know that the PlanningAlerts project have developed a number of screen scrapers for various local authorities:

http://code.google.com/p/planningalerts/wiki/ExistingScrapers

Cheers,
Gregory