Re: [Talk-GB] Request for UK address lists for postcode extraction

2008-12-01 Thread Andy Robinson (blackadder-lists)
David Earl wrote:
Sent: 01 December 2008 3:10 PM
To: talk-gb@openstreetmap.org
Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction

On 01/12/2008 14:11, Brian Quinion wrote:
 Hi,

 I'm currently doing some work trying to generate postcode location
 data for the UK using address lists and address lookup using OSM data
 to supplement NPE.  So far it seems to work quite well with the
 address lists that I have available to me (and copes quite well with
 ambiguous road names), but I'm limited in my data sources and most of
 the address data is fairly consistent in both format and quality.
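Brian doesn't say how his tool finds the postcode within each address. The following is a minimal sketch that assumes a simplified format check: one or two letters, a digit, and an optional letter or digit, followed by a digit and two letters. The pattern is an approximation rather than the official specification, and `extract_postcode` is a hypothetical helper name.

```python
import re

# Simplified UK postcode shape (an assumption, not the full official rules):
# outward code = 1-2 letters, a digit, an optional letter or digit;
# inward code  = a digit followed by two letters. The space is optional.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b")

def extract_postcode(address_line):
    """Return the last postcode-shaped token in an address line, normalised
    to 'OUTWARD INWARD' form, or None if nothing matches."""
    matches = POSTCODE_RE.findall(address_line.upper())
    if not matches:
        return None
    # The postcode usually comes at the end of a UK address, so take the last.
    outward, inward = matches[-1]
    return f"{outward} {inward}"

print(extract_postcode("22 Mill Road, Cambridge, cb11aa"))  # CB1 1AA
```

A missing space is handled by backtracking: in `CB11AA` the optional outward character gives way so that `1AA` can form the inward code.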

 So, before I open the interface to the public, I'd like to test the
 code with some lists provided by other people.

 Does anyone have, or know of, any address lists that I would be able
 to use for this purpose?  Obviously it needs to be licence-compatible
 with OSM (so please no lists generated from Royal Mail postcode data!)
 and ideally I'm after data sets containing at least:

 street address (house name / number optional)
 town / city
 postcode

 formatted as CSV or TSV.  I'm specifically not after data containing
 the names of individuals.
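The following minimal sketch reads such a list. It assumes a header row with the hypothetical column names `street`, `town` and `postcode`. Any other column, such as a person's name, is dropped, and rows missing any of the three fields are skipped.

```python
import csv
import io

# Hypothetical column names; a real list may use different headers.
FIELDS = ("street", "town", "postcode")

def read_address_list(text, delimiter=","):
    """Parse CSV text with a header row (pass delimiter='\\t' for TSV) and
    return one dict per usable row holding only street, town and postcode.
    Any other column, such as a person's name, is discarded."""
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # Short rows leave missing fields as None, so fall back to "".
        record = {field: (row.get(field) or "").strip() for field in FIELDS}
        # Keep only rows that have all three of the minimum fields requested.
        if all(record.values()):
            records.append(record)
    return records
```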

 Has anyone got any suggestions, or is willing to offer any data?  Even
 personal address books would be useful for testing...

Why not do it the other way round?

You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100
combinations for the second part for each - about 200 million in all. If
you feed these potential postcodes in quotes into Google UK over a long
period with appropriate pauses so as not to get locked out, and look at
the result for recognizable addresses (that's the tricky bit) as I'm
doing in the Namefinder, you'd probably cover 75% of UK postcodes.
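Using only the figures given in the message (2,500 prefixes, 26 x 26 letter pairs and 100 number values), the product is 169 million. The message rounds this up to about 200 million.

```python
# Size of the search space, using the message's own figures rather than an
# authoritative list of UK postcode prefixes.
PREFIXES = 2_500
LETTER_PAIRS = 26 * 26
NUMBER_VALUES = 100

total = PREFIXES * LETTER_PAIRS * NUMBER_VALUES
print(f"{total:,} candidate postcodes")  # 169,000,000 candidate postcodes
```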

Yes, it's slow, but it's probably the biggest source there is. At one a
second it would take about 6 years, but by enlisting 100 friends you'd
do it in a month - less if it's possible to be more intelligent about it
- for example, for the number part if there's no 14XX or 15XX I doubt
there would be any 16s or above either, except for a few special cases.
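The following rough sketch covers the timing estimate and the pruning idea. Because it uses the unrounded 169 million figure, the single-worker result comes out a little under 6 years (the rounded 200 million gives about 6.3). The pruning helper is hypothetical. `block_has_hits` stands in for whatever check decides that a number block returned recognisable addresses, and stopping after two consecutive empty blocks is an assumption modelled on the 14XX/15XX example.

```python
# Rough timing for the brute-force search, plus the suggested early stop.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600

def search_duration(queries, rate_per_second=1.0, workers=1):
    """Wall-clock seconds to run `queries` lookups split evenly across
    `workers`, each issuing `rate_per_second` requests."""
    return queries / (rate_per_second * workers)

def searched_blocks(block_has_hits, max_block=99, empty_run_limit=2):
    """Visit number blocks in ascending order and return those visited,
    stopping once `empty_run_limit` consecutive blocks yield no hits.
    `block_has_hits` is a hypothetical callback supplied by the caller."""
    visited = []
    empty_run = 0
    for block in range(max_block + 1):
        visited.append(block)
        if block_has_hits(block):
            empty_run = 0
        else:
            empty_run += 1
            if empty_run >= empty_run_limit:
                break
    return visited

total = 2_500 * 26 * 26 * 100
print(round(search_duration(total) / SECONDS_PER_YEAR, 1))              # 5.4 (years)
print(round(search_duration(total, workers=100) / SECONDS_PER_DAY, 1))  # 19.6 (days)

# If blocks 0-13 have addresses but 14 and 15 do not, 16-99 are skipped.
print(searched_blocks(lambda block: block <= 13)[-1])                   # 15
```

With 100 workers the whole run takes under three weeks, which is consistent with the "in a month" estimate.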

I'm curious about this. Isn't data scraped via Google still subject to
the terms of the original page it references?

Cheers

Andy


David


___
Talk-GB mailing list
Talk-GB@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-gb



Re: [Talk-GB] Request for UK address lists for postcode extraction

2008-12-01 Thread Andy Robinson (blackadder-lists)
Brian Quinion wrote:
Sent: 01 December 2008 4:01 PM
To: Andy Robinson (blackadder-lists)
Cc: David Earl; talk-gb@openstreetmap.org
Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction

Andy Robinson wrote:
 David Earl wrote:
On 01/12/2008 14:11, Brian Quinion wrote:
 Has anyone got any suggestions, or is willing to offer any data?  Even
 personal address books would be useful for testing...

You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100
combinations for the second part for each - about 200 million in all. If
you feed these potential postcodes in quotes into Google UK over a long
period with appropriate pauses so as not to get locked out, and look at
the result for recognizable addresses (that's the tricky bit) as I'm
doing in the Namefinder, you'd probably cover 75% of UK postcodes.

 I'm curious about this. Isn't data scraped via Google still subject to
 the terms of the original page it references?

I looked into this and came to the conclusion that you could probably
claim 'fair use' as long as you pulled each address from a different
website.  The trouble is that for most searches you end up on one of a
small number of directory sites so doing any significant number is
likely to end up as a database extraction.  The results are also
mostly limited to business addresses.

Probably it would be possible to filter it so not too many requests
went to any one site, but that still leaves the possibility that they
used Royal Mail's postcode finder (or similar) to find their original
data.  Across a large number of sites you could end up doing a
database extraction from Royal Mail regardless.
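The following minimal sketch shows the per-site filter. It assumes search results arrive as (postcode, url) pairs and caps how many are kept from each host. The pair format, `cap_per_site` and the cap value are all hypothetical. As the message points out, this only limits extraction from any single site. It does nothing about sites whose own data came from Royal Mail.

```python
from collections import Counter
from urllib.parse import urlsplit

def cap_per_site(results, max_per_site=5):
    """Keep at most `max_per_site` (postcode, url) results per host name,
    preserving the original order. Note that 'www.example.com' and
    'example.com' count as different hosts here."""
    counts = Counter()
    kept = []
    for postcode, url in results:
        host = urlsplit(url).hostname or ""
        if counts[host] < max_per_site:
            counts[host] += 1
            kept.append((postcode, url))
    return kept
```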

Address books and company mailing lists seemed like a preferable
source, and as long as individuals' names are not included, privacy
shouldn't be an issue.


I'd noted that too: business directory listings (Yell, Thomson, etc.) or
house price finders that use copyright Land Registry data in the
background.

One source I am exploring is planning application listings produced by the
local authority. I think that's where you were headed, isn't it?

Cheers

Andy




Re: [Talk-GB] Request for UK address lists for postcode extraction

2008-12-01 Thread Andy Robinson (blackadder-lists)
Brian Quinion wrote:
Sent: 01 December 2008 4:28 PM
To: Andy Robinson (blackadder-lists)
Cc: talk-gb@openstreetmap.org
Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction

Andy Robinson wrote:
 I'd noted that too: business directory listings (Yell, Thomson, etc.) or
 house price finders that use copyright Land Registry data in the
 background.

 One source I am exploring is planning application listings produced by
 the local authority. I think that's where you were headed, isn't it?

It's one of the options I hope to look at but my main idea was to get
a tool that could do the extraction working and make it available to
anyone who wants to provide data.  Anyone with a mailing list or
address book is a potential data source - I just hope to make it easy
for them to submit it.

I just need some test data that I didn't write to make sure I'm not
making silly assumptions! :-)


This any use?

http://www.birmingham.gov.uk/GenerateContent?CONTENT_ITEM_ID=40870&CONTENT_ITEM_TYPE=0&MENU_ID=13170

Cheers

Andy

--
 Brian



Re: [Talk-GB] Request for UK address lists for postcode extraction

2008-12-01 Thread Gregory Williams
 One source I am exploring is planning application listings produced by
 the local authority. I think that's where you were headed, isn't it?

I'm not sure of the legal situation with planning data, but if things
seem fine with that then you might be interested to know that the
PlanningAlerts project have developed a number of screen scrapers for
various local authorities:

http://code.google.com/p/planningalerts/wiki/ExistingScrapers

Cheers,

Gregory
