Do you have all local authorities in there, or do you just set up a 
record when there's a request for a new body that you've not come across 
before?

Also, what do you do when there's a change, e.g. when an authority is 
split or merged, or when a govt department is partly renamed and partly 
subsumed -- create a new record (with "related to" links), or rename the 
old one? If it's the former I don't see a reason why we couldn't use 
your primary IDs (though it's problematic if it's the latter).

However, a simpler solution could be either for public bodies in the 
WDTK DB to have a :snac_code field (or similar), and then you could call 
them with a URL of (something like):

http://www.whatdotheyknow.com/body?snac_code=AB23


and do something like:

# Look up by URL name first (including historic names), then fall back to the SNAC code
@public_body = PublicBody.find_by_url_name_with_historic(params[:url_name]) ||
               PublicBody.find_by_snac_code(params[:snac_code])
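
To make the fallback concrete, here is a rough standalone sketch of the same lookup order in plain Ruby. It assumes nothing about the real WDTK schema: the PublicBody struct, the BODIES list, the find_public_body helper and the codes in it are all made up for illustration.

```ruby
# Hypothetical in-memory stand-in for the WDTK PublicBody model.
PublicBody = Struct.new(:url_name, :snac_code)

# Placeholder records; the SNAC codes here are made up for illustration.
BODIES = [
  PublicBody.new("example_council", "AB23"),
  PublicBody.new("another_council", "ZZ99")
].freeze

# Try the URL name first, then fall back to the SNAC code.
# Returns nil when neither parameter matches a record.
def find_public_body(params, bodies = BODIES)
  by_name = bodies.find { |b| b.url_name == params[:url_name] } if params[:url_name]
  return by_name if by_name

  bodies.find { |b| b.snac_code == params[:snac_code] } if params[:snac_code]
end
```

Because the URL name is tried first, existing links would keep working unchanged, and the SNAC code is only consulted when no name matches.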


or alternatively have a :common_uid field with a format of 
la_[snac_code], which would allow you to use other common uids for other 
public bodies as and when you see fit. I'd have no problem prepending 
'la_' to wdtk requests and wdtk urls.
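
A minimal sketch of how such a prefixed uid might be built and parsed, in plain Ruby. Only the 'la_' prefix comes from the suggestion above; the helper names and the exact pattern (lowercase scheme, underscore, alphanumeric code) are assumptions.

```ruby
# Assumed format: lowercase scheme prefix, underscore, alphanumeric code,
# e.g. "la_AB23". Only the "la" scheme is taken from the thread.
COMMON_UID_FORMAT = /\A([a-z]+)_([A-Za-z0-9]+)\z/

# Hypothetical helper: join a scheme and a code into a common uid.
def build_common_uid(scheme, code)
  "#{scheme}_#{code}"
end

# Hypothetical helper: split a common uid into scheme and code.
# Returns nil for anything that does not match the assumed format.
def parse_common_uid(uid)
  match = COMMON_UID_FORMAT.match(uid.to_s)
  return nil unless match

  { scheme: match[1], code: match[2] }
end
```

Keeping the scheme in the uid itself means other kinds of public body could get their own prefix later without a schema change.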

Cheers
C
p.s. By the way, I'm guessing the govt asset register (can't remember 
what it's called off the top of my head) doesn't have a central record of 
current and past public bodies.

Francis Irving wrote:
> For councils there are other ids we could use, so I agree.
>
> But in general for WDTK, there is no website other than Wikipedia with
> anything approaching identifiers for the 3000-odd authorities we have
> in there.
>
> Francis
>
> On Fri, Jun 19, 2009 at 01:20:48PM +0100, CountCulture wrote:
>   
>> Redirects are fine for viewing a webpage, but somewhat problematic as a
>> canonical, immutable id that can be used to get data from a number of
>> sources (which is what we're after, I reckon). If I wanted the WDTK
>> data on Cheshire West and Chester, for example, would I be able to get
>> it via:
>>
>>     * http://www.whatdotheyknow.com/body/Cheshire_West_and_Chester
>>     * http://www.whatdotheyknow.com/body/West_Cheshire_and_Chester
>>     * http://www.whatdotheyknow.com/body/City_of_Chester_and_West_Cheshire
>>
>> Seems a lot of work from the developer point of view (ignoring probs
>> caused by rogue edits).
>>
>> Also, while the redirects do provide a partial history, as I understand
>> it they are only a one-step history, i.e. the wikipedia article on
>> http://en.wikipedia.org/wiki/City_of_Chester_and_West_Cheshire redirects
>> to Cheshire_West_and_Chester and not via West_Cheshire_and_Chester (I'm
>> not saying that's a big prob, just that it's not a full history; it's
>> also not a history of the official name changes, just of the wiki
>> editing process).
>>
>> All we're after here is a common code that doesn't change (while the
>> local authority or other public body doesn't change) that various
>> websites can support (with minimum coding) to provide the data without
>> ambiguity. Wikipedia article URLs, much as we love them, don't really
>> work in that respect IMHO.
>> Cheers
>> C
>>
>>
>>
>> Francis Irving wrote:
>>     
>>> Yes, I mean the article name, probably in the form it appears in the
>>> URI.
>>>
>>> Although Wikipedia titles do change, they always provide a redirect.
>>>
>>> The nice thing about it, is that the redirects become part of the
>>> structured information.
>>>
>>> Francis
>>>
>>> On Fri, Jun 19, 2009 at 12:02:49PM +0100, CountCulture wrote:
>>>   
>>>       
>>>> Francis
>>>> Think we should investigate Alex's suggestions of SNAC codes. Not sure
>>>> about Wikipedia ids -- do you mean uris, or do they have numerical ids
>>>> too? I prefer numerical/poss alphanumerical unique ids rather than
>>>> strings, and Wikipedia page titles change too often to be canonical IMHO.
>>>> Cheers
>>>> C
>>>>
>>>>
>>>> Francis Irving wrote:
>>>>     
>>>>         
>>>>> (copied to WhatDoTheyKnow team)
>>>>>
>>>>> Anyone here know about identifiers for local authorities?
>>>>>
>>>>> I'm inclined to use Wikipedia article ids, as that will extend to
>>>>> other authorities as well.
>>>>>
>>>>> Francis
>>>>>
>>>>> On Thu, Jun 18, 2009 at 11:44:12AM +0100, CountCulture wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> Francis
>>>>>> Thought it might be useful if twfylocal could show status of WDTK
>>>>>> requests (total, recent, not answered, outstanding late etc), with
>>>>>> basic details of requests (though prob makes sense to go to WDTK
>>>>>> site for full details of request).
>>>>>>
>>>>>> Re id system, it's something I've been struggling with as everywhere
>>>>>> uses a different system, so at the moment each twfylocal council
>>>>>> record stores the following ids/refs:
>>>>>>
>>>>>> :id (integer, twfy_local internal primary id. WON'T CHANGE)
>>>>>> :name (string, as scraped from eGR, though with some minor edits)
>>>>>> :wikipedia_url (string, as scraped from eGR, though have already
>>>>>> found one mistake)
>>>>>> :ons_url (string)
>>>>>> :egr_id (integer, this is most useful as it gives links to loads of
>>>>>> other things -- e.g. various gov pages -- doesn't change AFAIK even
>>>>>> if the authority name does)
>>>>>> :wdtk_name (string, from scraping WDTK and trying to match against
>>>>>> shortened version of name -- successful about 80% of the time)
>>>>>>
>>>>>> Had a look at the WDTK code and I seem to remember the internal
>>>>>> primary id is exposed in at least one place, but that it didn't help
>>>>>> as you couldn't do queries by it. What we could really do with is a
>>>>>> canonical id for each authority.
>>>>>>
>>>>>> FWIW you can use the eGR on twfylocal, though it adds an extra step
>>>>>> (if you go to theyworkforyoulocal.com/councils.xml it returns all
>>>>>> the councils together with their ids and the eGR ids). If you could
>>>>>> match WDTK with eGR ids (for example) and make the match available
>>>>>> programmatically we'd have the beginnings of a makeshift common id.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>>
>>>>>> Francis Irving wrote:
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> There are RSS feeds of latest responses, including quite fancy ones if
>>>>>>> you use advanced search keywords. They only give extracts from the new
>>>>>>> messages though. What exact information are you trying to get?
>>>>>>>
>>>>>>> There is no structured way to get status or similar out of the site.
>>>>>>>
>>>>>>> Finally, we could agree an id system for name matching. I'd quite like
>>>>>>> in a way to mark every authority with, say, its identifier in
>>>>>>> Wikipedia, to aid merging with other databases.
>>>>>>>
>>>>>>> What identifiers are you using in your system?
>>>>>>>
>>>>>>> Francis
>>>>>>>
>>>>>>> On Wed, Jun 17, 2009 at 03:05:26PM +0200, Tom Steinberg wrote:
>>>>>>>         
>>>>>>>           
>>>>>>>               
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm afraid I don't know, but I've CCed the team who look after WDTK to 
>>>>>>>> ask.
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> 2009/6/17 CountCulture <[email protected]>:
>>>>>>>>             
>>>>>>>>             
>>>>>>>>                 
>>>>>>>>> Tom
>>>>>>>>> Follow up question. At the moment I've got a link to the What Do They
>>>>>>>>> Know page for the council. Any probs with including more info from WDTK
>>>>>>>>> such as status, and latest responses, and is there a good way to get
>>>>>>>>> that other than scraping the data (had a look at the code and there
>>>>>>>>> didn't really seem to be)?
>>>>>>>>> Cheers
>>>>>>>>> C
>>>>>>>>>
>>>>>>>>> -------- Original Message --------
>>>>>>>>>
>>>>>>>>> Tom
>>>>>>>>>
>>>>>>>>> Digging deeper is actually where I'd intended to go first, but when I
>>>>>>>>> started to explore some of the council websites I found that even
>>>>>>>>> shallow data was problematic and reckoned I needed an API and structure
>>>>>>>>> that at the very least could cope with those variants (and reuse the
>>>>>>>>> scrapers/parsers once written) -- hence the proof-of-concept nature.
>>>>>>>>>
>>>>>>>>> However, now I've got the basics worked out (though there's still
>>>>>>>>> tweaking and issues to be done there), delving deeper's the next step.
>>>>>>>>> In particular, working out the best way of finding/storing/parsing
>>>>>>>>> council docs (which are often unstructured PDFs, sometimes even just
>>>>>>>>> PDFs which are just scans), and also working out an elegant way of
>>>>>>>>> linking with other data sources.
>>>>>>>>>
>>>>>>>>> Thanks for the kind words, I'll keep the list updated with major
>>>>>>>>> developments, or you can always watch the github repository.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> C
>>>>>>>>>
>>>>>>>>> Tom Steinberg wrote:
>>>>>>>>>                 
>>>>>>>>>               
>>>>>>>>>                   
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> Cool - great to see people hacking on councils, it's been something
>>>>>>>>>> I've wanted to see for ages.
>>>>>>>>>>
>>>>>>>>>> I see you've gone straight for getting the councillors of several
>>>>>>>>>> different councils, but I'd actually suggest going deeper rather than
>>>>>>>>>> wider. Why not just dive deep into one council and see if you can get
>>>>>>>>>> transcripts or other documents nicely scraped and parsed? I'd love to
>>>>>>>>>> see at least a handful of councils in TheyWorkForYou proper by the
>>>>>>>>>> end of the year.
>>>>>>>>>>
>>>>>>>>>> Well done anyway!
>>>>>>>>>>
>>>>>>>>>> Tom
>>>>>>>>>>
>>>>>>>>>> 2009/6/16 CountCulture <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>                 
>>>>>>>>>>                     
>>>>>>>>>>> Quick note about something I've been working on in my spare time:
>>>>>>>>>>>
>>>>>>>>>>> http://theyworkforyoulocal.com -- a small app to scrape and parse
>>>>>>>>>>> local authority info.
>>>>>>>>>>>
>>>>>>>>>>> At the moment, it's barely more than a proof of concept, with only
>>>>>>>>>>> about 20 or so councils parsed, and even then only current
>>>>>>>>>>> councillors, committees, committee membership and forthcoming
>>>>>>>>>>> meetings are parsed.
>>>>>>>>>>>
>>>>>>>>>>> On the upside, it's fairly quick for me to add new parsers for
>>>>>>>>>>> councils (and reuse ones already written if they use same CMS),
>>>>>>>>>>> there's an API built in (basically just add .json or .xml to get the
>>>>>>>>>>> info as json or XML), and there's lots of potential.
>>>>>>>>>>>
>>>>>>>>>>> Getting this far has also been an education in understanding what a
>>>>>>>>>>> full-blown twfy_local might look like (in general there seems no way
>>>>>>>>>>> to see how councillors voted, for example), the need for such a
>>>>>>>>>>> resource (there's no publicly available central repository for
>>>>>>>>>>> council election results, for example), and the sorry state of local
>>>>>>>>>>> authority websites (just finding a list of councillors is a challenge
>>>>>>>>>>> on some, and don't get me started on the HTML markup).
>>>>>>>>>>>
>>>>>>>>>>> Comments welcome. Code is at
>>>>>>>>>>> http://github.com/CountCulture/twfy_local_parser/ (I'll probably GPL
>>>>>>>>>>> it soon). Bug reports at
>>>>>>>>>>> http://github.com/CountCulture/twfy_local_parser/issues and offers of
>>>>>>>>>>> help to countculture at googlemail dot com.
>>>>>>>>>>>
>>>>>>>>>>> I'd especially be interested in hearing from anyone who's got any
>>>>>>>>>>> knowledge about local authority CMSs (e.g. there seem to be several
>>>>>>>>>>> different versions of Modern.Gov producing different URLs), or
>>>>>>>>>>> sources for more data other than the local authority websites (e.g.
>>>>>>>>>>> eGR, info4local).
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>>>>>>>>>>> C
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Mailing list [email protected]
>>>>>>>>>>> Archive, settings, or unsubscribe:
>>>>>>>>>>>
>>>>>>>>>>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
>>>>>>>>>>>
>>


