Francis
Think we should investigate Alex's suggestions of SNAC codes. Not sure
about Wikipedia ids -- do you mean URIs, or do they have numerical ids
too? I'd prefer numerical (or possibly alphanumerical) unique ids rather
than strings, and Wikipedia page titles change too often to be canonical IMHO.
Cheers
C
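
A rough sketch of the kind of id crosswalk discussed in this thread, in
Python purely for illustration (no claim that twfylocal or WDTK work this
way). The field names egr_id and wdtk_name follow the council record
described in the quoted message below; snac_code, the Crosswalk class and
every sample value are hypothetical placeholders, not real codes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Council:
    id: int                           # internal primary id, assumed stable
    name: str                         # display name, may change over time
    egr_id: Optional[int] = None      # external id, as in the quoted record
    snac_code: Optional[str] = None   # hypothetical field for a SNAC code
    wdtk_name: Optional[str] = None   # name used on WDTK, as in the quoted record


class Crosswalk:
    """Look up a council by any of its external identifiers."""

    SCHEMES = ("egr_id", "snac_code", "wdtk_name")

    def __init__(self, councils: Iterable[Council]) -> None:
        self._by_key: Dict[Tuple[str, str], Council] = {}
        for council in councils:
            for scheme in self.SCHEMES:
                value = getattr(council, scheme)
                if value is None:
                    continue  # this council has no id in this scheme
                key = (scheme, str(value))
                if key in self._by_key:
                    # Two councils sharing an id means the scheme is not canonical
                    raise ValueError(f"duplicate {scheme}: {value}")
                self._by_key[key] = council

    def find(self, scheme: str, value: object) -> Optional[Council]:
        """Return the council with this id in the given scheme, or None."""
        return self._by_key.get((scheme, str(value)))


# Placeholder data only: these ids and names are made up for the example.
SAMPLE = [
    Council(id=1, name="Example Borough Council",
            egr_id=1001, snac_code="XXAA", wdtk_name="example_borough_council"),
    Council(id=2, name="Sample District Council",
            egr_id=1002, wdtk_name="sample_district_council"),
]

crosswalk = Crosswalk(SAMPLE)
match = crosswalk.find("egr_id", 1002)
print(match.name if match else "no match")
```

Keying on (scheme, value) pairs means a renamed authority only needs its
name field updated, while the ids other sites hold keep resolving to the
same record.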


Francis Irving wrote:
> (copied to WhatDoTheyKnow team)
>
> Anyone here know about identifiers for local authorities?
>
> I'm inclined to use Wikipedia article ids, as that will extend to
> other authorities as well.
>
> Francis
>
> On Thu, Jun 18, 2009 at 11:44:12AM +0100, CountCulture wrote:
>> Francis
>> Thought it might be useful if twfylocal could show status of WDTK  
>> requests (total, recent, no answered, outstanding late etc), with basic  
>> details of requests (though prob makes sense to go to WDTK site for full  
>> details of request).
>>
>> Re id system, it's something I've been struggling with as everywhere  
>> uses a different system, so at the moment each twfylocal council record  
>> stores the following ids/refs:
>>
>> :id (integer, twfy_local internal primary id. WON'T CHANGE)
>> :name (string, as scraped from eGR, though with some minor edits)
>> :wikipedia_url (string, as scraped from eGR, though have already found  
>> one mistake)
>> :ons_url (string)
>> :egr_id (integer, this is most useful as it gives links to loads of  
>> other things -- e.g. various gov pages -- doesn't change AFAIK even if  
>> the authority name does)
>> :wdtk_name (string, from scraping WDTK and trying to match against  
>> shortened version of name -- successful about 80% of the time)
>>
>> Had a look at the WDTK code and I seem to remember the internal primary  
>> id is exposed in at least one place, but that it didn't help as you  
>> couldn't do queries by it. What we could really do with is a canonical  
>> id for each authority.
>>
>> FWIW you can use the eGR id on twfylocal, though it adds an extra step (if
>> you go to theyworkforyoulocal.com/councils.xml it returns all the
>> councils together with their ids and the eGR ids). If you could match
>> WDTK with eGR ids (for example) and make the match available
>> programmatically, we would have the beginnings of a makeshift common id.
>>
>> Thoughts?
>>
>>
>> Francis Irving wrote:
>>> There are RSS feeds of latest responses, including quite fancy ones if
>>> you use advanced search keywords. They only give extracts from the new
>>> messages though. What exact information are you trying to get?
>>>
>>> There is no structured way to get status or similar out of the site.
>>>
>>> Finally, we could agree an id system for name matching. I'd quite like
>>> in a way to mark every authority with, say, its identifier in
>>> Wikipedia, to aid merging with other databases.
>>>
>>> What identifiers are you using in your system?
>>>
>>> Francis
>>>
>>> On Wed, Jun 17, 2009 at 03:05:26PM +0200, Tom Steinberg wrote:
>>>> Hi,
>>>>
>>>> I'm afraid I don't know, but I've CCed the team who look after WDTK to ask.
>>>>
>>>> Tom
>>>>
>>>> 2009/6/17 CountCulture <[email protected]>:
>>>>> Tom
>>>>> Follow up question. At the moment I've got a link to the What Do They Know
>>>>> page for the council. Any probs with including more info from WDTK such as
>>>>> status, and latest responses, and is there a good way to get that other
>>>>> than scraping the data (had a look at the code and there didn't really
>>>>> seem to be)?
>>>>> Cheers
>>>>> C
>>>>>
>>>>> -------- Original Message --------
>>>>>
>>>>> Tom
>>>>>
>>>>> Digging deeper is actually where I'd intended to go first, but when I
>>>>> started to explore some of the council websites I found that even shallow
>>>>> data was problematic and reckoned I needed an API and structure that at the
>>>>> very least could cope with those variants (and reuse the scrapers/parsers
>>>>> once written) -- hence the proof-of-concept nature.
>>>>>
>>>>> However, now I've got the basics worked out (though there's still tweaking
>>>>> and issues to be done there), delving deeper's the next step. In
>>>>> particular, working out the best way of finding/storing/parsing council
>>>>> docs (which are often unstructured PDFs, sometimes even just PDFs which
>>>>> are just scans), and also working out an elegant way of linking with
>>>>> other data sources.
>>>>>
>>>>> Thanks for the kind words, I'll keep the list updated with major
>>>>> developments, or you can always watch the github repository.
>>>>>
>>>>> Cheers
>>>>> C
>>>>>
>>>>> Tom Steinberg wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> Cool - great to see people hacking on councils, it's been something
>>>>>> I've wanted to see for ages.
>>>>>>
>>>>>> I see you've gone straight for getting the councillors of several
>>>>>> different councils, but I'd actually suggest going deeper rather than
>>>>>> wider. Why not just dive deep into one council and see if you can get
>>>>>> transcripts or other documents nicely scraped and parsed? I'd love to
>>>>>> see at least a handful of councils in TheyWorkForYou proper by the end
>>>>>> of the year.
>>>>>>
>>>>>> Well done anyway!
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> 2009/6/16 CountCulture <[email protected]>:
>>>>>>
>>>>>>> Quick note about something I've been working on in my spare time:
>>>>>>>
>>>>>>> http://theyworkforyoulocal.com -- a small app to scrape and parse local
>>>>>>> authority info.
>>>>>>>
>>>>>>> At the moment, it's barely more than a proof of concept, with only about
>>>>>>> 20 or so councils parsed, and even then only current councillors,
>>>>>>> committees, committee membership and forthcoming meetings are parsed.
>>>>>>>
>>>>>>> On the upside, it's fairly quick for me to add new parsers for councils
>>>>>>> (and reuse ones already written if they use same CMS), there's an API
>>>>>>> built in (basically just add .json or .xml to get the info as json or
>>>>>>> XML), and there's lots of potential.
>>>>>>>
>>>>>>> Getting this far has also been an education in understanding what a
>>>>>>> full-blown twfy_local might look like (in general there seems no way to
>>>>>>> see how councillors voted, for example), the need for such a resource
>>>>>>> (there's no publicly available central repository for council election
>>>>>>> results, for example), and the sorry state of local authority websites
>>>>>>> (just finding a list of councillors is a challenge on some, and don't
>>>>>>> get me started on the HTML markup).
>>>>>>>
>>>>>>> Comments welcome. Code is at
>>>>>>> http://github.com/CountCulture/twfy_local_parser/ (I'll probably GPL it
>>>>>>> soon). Bug reports at
>>>>>>> http://github.com/CountCulture/twfy_local_parser/issues and offers of
>>>>>>> help to countculture at googlemail dot com.
>>>>>>>
>>>>>>> I'd especially be interested in hearing from anyone who's got any
>>>>>>> knowledge about local authority CMSs (e.g. there seem to be several
>>>>>>> different versions of Modern.Gov producing different URLs), or sources
>>>>>>> for more data other than the local authority websites (e.g. eGR,
>>>>>>> info4local).
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> C
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list [email protected]
>>>>>>> Archive, settings, or unsubscribe:
>>>>>>>
>>>>>>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
