Alex

This looks useful (but complex). Any chance you could tell me which of the 15 spreadsheets (in the Excel file; I don't have Access) contains the local authorities, and which is the primary key?
Cheers
C

Alex Skene wrote:
> We should probably use SNAC codes as the identifier for local authorities
> <<http://www.ons.gov.uk/about-statistics/geography/products/geog-products-area/snac/index.html>>
>
> Cheers
> Alex
>
> 2009/6/19 Francis Irving <[email protected]>:
>
>> (copied to WhatDoTheyKnow team)
>>
>> Anyone here know about identifiers for local authorities?
>>
>> I'm inclined to use Wikipedia article ids, as that will extend to
>> other authorities as well.
>>
>> Francis
>>
>> On Thu, Jun 18, 2009 at 11:44:12AM +0100, CountCulture wrote:
>>
>>> Francis
>>> Thought it might be useful if twfylocal could show status of WDTK
>>> requests (total, recent, no answered, outstanding late etc), with basic
>>> details of requests (though prob makes sense to go to WDTK site for full
>>> details of request).
>>>
>>> Re id system, it's something I've been struggling with as everywhere
>>> uses a different system, so at the moment each twfylocal council record
>>> stores the following ids/refs:
>>>
>>> :id (integer, twfy_local internal primary id. WON'T CHANGE)
>>> :name (string, as scraped from eGR, though with some minor edits)
>>> :wikipedia_url (string, as scraped from eGR, though have already found
>>> one mistake)
>>> :ons_url (string)
>>> :egr_id (integer, this is most useful as it gives links to loads of
>>> other things -- e.g. various gov pages -- doesn't change AFAIK even if
>>> the authority name does)
>>> :wdtk_name (string, from scraping WDTK and trying to match against
>>> shortened version of name -- successful about 80% of the time)
>>>
>>> Had a look at the WDTK code and I seem to remember the internal primary
>>> id is exposed in at least one place, but that it didn't help as you
>>> couldn't do queries by it. What we could really do with is a canonical
>>> id for each authority.
>>>
>>> FWIW you can use the eGR on twfylocal, though it adds an extra step (if
>>> you go to theyworkforyoulocal.com/councils.xml it returns all the
>>> councils together with their ids and the eGR ids). If you could match
>>> WDTK with eGR ids (for example) and make the match available
>>> programmatically, we would have the beginnings of a makeshift common id.
>>>
>>> Thoughts?
>>>
>>>
>>> Francis Irving wrote:
>>>
>>>> There are RSS feeds of latest responses, including quite fancy ones if
>>>> you use advanced search keywords. They only give extracts from the new
>>>> messages though. What exact information are you trying to get?
>>>>
>>>> There is no structured way to get status or similar out of the site.
>>>>
>>>> Finally, we could agree an id system for name matching. I'd quite like
>>>> in a way to mark every authority with, say, its identifier in
>>>> Wikipedia, to aid merging with other databases.
>>>>
>>>> What identifiers are you using in your system?
>>>>
>>>> Francis
>>>>
>>>> On Wed, Jun 17, 2009 at 03:05:26PM +0200, Tom Steinberg wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm afraid I don't know, but I've CCed the team who look after WDTK to
>>>>> ask.
>>>>>
>>>>> Tom
>>>>>
>>>>> 2009/6/17 CountCulture <[email protected]>:
>>>>>
>>>>>> Tom
>>>>>> Follow up question. At the moment I've got a link to the What Do They
>>>>>> Know page for the council. Any probs with including more info from
>>>>>> WDTK such as status, and latest responses, and is there a good way to
>>>>>> get that other than scraping the data (had a look at the code and
>>>>>> there didn't really seem to be)?
>>>>>> Cheers
>>>>>> C
>>>>>>
>>>>>> -------- Original Message --------
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> Digging deeper is actually where I'd intended to go first, but when I
>>>>>> started to explore some of the council websites I found that even
>>>>>> shallow data was problematic and reckoned I needed an API and
>>>>>> structure that at the very least could cope with those variants (and
>>>>>> reuse the scrapers/parsers once written) -- hence the
>>>>>> proof-of-concept nature.
>>>>>>
>>>>>> However, now I've got the basics worked out (though there's still
>>>>>> tweaking and issues to be done there), delving deeper's the next
>>>>>> step. In particular, working out the best way of
>>>>>> finding/storing/parsing council docs (which are often unstructured
>>>>>> PDFs, sometimes even just PDFs which are just scans), and also
>>>>>> working out an elegant way of linking with other data sources.
>>>>>>
>>>>>> Thanks for the kind words, I'll keep the list updated with major
>>>>>> developments, or you can always watch the github repository.
>>>>>>
>>>>>> Cheers
>>>>>> C
>>>>>>
>>>>>> Tom Steinberg wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> Cool - great to see people hacking on councils, it's been something
>>>>>>> I've wanted to see for ages.
>>>>>>>
>>>>>>> I see you've gone straight for getting the councillors of several
>>>>>>> different councils, but I'd actually suggest going deeper rather than
>>>>>>> wider. Why not just dive deep into one council and see if you can get
>>>>>>> transcripts or other documents nicely scraped and parsed? I'd love to
>>>>>>> see at least a handful of councils in TheyWorkForYou proper by the end
>>>>>>> of the year.
>>>>>>>
>>>>>>> Well done anyway!
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> 2009/6/16 CountCulture <[email protected]>:
>>>>>>>
>>>>>>>> Quick note about something I've been working on in my spare time:
>>>>>>>>
>>>>>>>> http://theyworkforyoulocal.com -- a small app to scrape and parse local
>>>>>>>> authority info.
>>>>>>>>
>>>>>>>> At the moment, it's barely more than a proof of concept, with only
>>>>>>>> about 20 or so councils parsed, and even then only current
>>>>>>>> councillors, committees, committee membership and forthcoming
>>>>>>>> meetings are parsed.
>>>>>>>>
>>>>>>>> On the upside, it's fairly quick for me to add new parsers for councils
>>>>>>>> (and reuse ones already written if they use same CMS), there's an API
>>>>>>>> built in (basically just add .json or .xml to get the info as json or
>>>>>>>> XML), and there's lots of potential.
>>>>>>>>
>>>>>>>> Getting this far has also been an education in understanding what a
>>>>>>>> full-blown twfy_local might look like (in general there seems no way to
>>>>>>>> see how councillors voted, for example), the need for such a resource
>>>>>>>> (there's no publicly available central repository for council election
>>>>>>>> results, for example), and the sorry state of local authority websites
>>>>>>>> (just finding a list of councillors is a challenge on some, and don't
>>>>>>>> get me started on the HTML markup).
>>>>>>>>
>>>>>>>> Comments welcome. Code is at
>>>>>>>> http://github.com/CountCulture/twfy_local_parser/ (I'll probably GPL it
>>>>>>>> soon). Bug reports at
>>>>>>>> http://github.com/CountCulture/twfy_local_parser/issues and offers of
>>>>>>>> help to countculture at googlemail dot com.
>>>>>>>>
>>>>>>>> I'd especially be interested in hearing from anyone who's got any
>>>>>>>> knowledge about local authority CMSs (e.g. there seem to be several
>>>>>>>> different versions of Modern.Gov producing different URLs), or sources
>>>>>>>> for more data other than the local authority websites (e.g. eGR,
>>>>>>>> info4local).
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> C
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list [email protected]
>>>>>>>> Archive, settings, or unsubscribe:
>>>>>>>>
>>>>>>>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
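
For anyone following the id discussion in this thread, here is a rough Python sketch of the kind of name matching described there: reducing WDTK names and council names to a shortened form and pairing the hits with eGR ids. The field names name and egr_id come from the record layout quoted in the thread. Everything else (the stopword list, the function names name_key and match_by_name, the sample ids) is made up for illustration, and this is not the matching code that twfylocal or WDTK actually uses.

```python
import re

# Words dropped before comparing names. This list is an assumption for
# illustration only; a real matcher would need tuning against real data.
STOPWORDS = {"council", "borough", "city", "county", "district",
             "metropolitan", "london", "royal", "of", "the"}


def name_key(name):
    """Reduce an authority name to a crude, order-independent comparison key."""
    words = re.findall(r"[a-z]+", name.lower().replace("&", " and "))
    kept = [w for w in words if w not in STOPWORDS]
    # Fall back to all words if stripping stopwords leaves nothing.
    return " ".join(sorted(kept or words))


def match_by_name(councils, wdtk_names):
    """Pair WDTK names with eGR ids where the shortened names agree.

    councils: iterable of dicts with "name" and "egr_id" keys.
    Returns (matches, unmatched): matches maps a WDTK name to an eGR id;
    unmatched lists names with no candidate or more than one candidate.
    """
    by_key = {}
    for council in councils:
        by_key.setdefault(name_key(council["name"]), []).append(council["egr_id"])

    matches, unmatched = {}, []
    for wdtk_name in wdtk_names:
        candidates = by_key.get(name_key(wdtk_name), [])
        if len(candidates) == 1:
            matches[wdtk_name] = candidates[0]
        else:
            unmatched.append(wdtk_name)
    return matches, unmatched


if __name__ == "__main__":
    # Hypothetical sample data; the ids are not real eGR ids.
    sample_councils = [
        {"name": "London Borough of Camden", "egr_id": 1},
        {"name": "Royal Borough of Kensington and Chelsea", "egr_id": 2},
    ]
    sample_wdtk = ["Camden Borough Council", "Transport for London"]
    print(match_by_name(sample_councils, sample_wdtk))
```

Names that find no candidate, or more than one, are returned separately rather than guessed at, so they can be checked by hand. Publishing the resulting WDTK-to-eGR pairs would be one way to get the makeshift common id discussed in the thread.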
