For councils there are other ids we could use, so I agree.
But in general for WDTK, there is no website other than Wikipedia with
anything approaching identifiers for the 3000-odd authorities we have
in there.
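
A minimal Python sketch of the crosswalk being discussed. It assumes each authority has an internal id that never changes (like the twfylocal :id), with every external scheme stored next to it: eGR id, SNAC code, Wikipedia titles and WDTK URL names. The class names, the scheme labels ("egr", "snac", "wikipedia", "wdtk") and the eGR value in the example are hypothetical placeholders. They are not real codes or an existing API. Old names are stored as extra keys rather than followed as redirects, so a lookup by any past spelling still reaches the same record.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Authority:
    """One public body (hypothetical record layout).

    `id` is internal and never changes; everything else is an external
    identifier that may be missing or may change over time.
    """
    id: int
    name: str
    egr_id: Optional[int] = None
    snac: Optional[str] = None
    wikipedia_titles: list[str] = field(default_factory=list)  # current title plus old ones
    wdtk_names: list[str] = field(default_factory=list)        # current URL name plus old ones


class Crosswalk:
    """Hypothetical helper mapping (scheme, external id) pairs onto the stable internal id."""

    def __init__(self) -> None:
        self._by_id: dict[int, Authority] = {}
        self._index: dict[tuple[str, str], int] = {}

    @staticmethod
    def _keys(a: Authority) -> list[tuple[str, str]]:
        # Scheme labels are made up for this sketch.
        keys = []
        if a.egr_id is not None:
            keys.append(("egr", str(a.egr_id)))
        if a.snac is not None:
            keys.append(("snac", a.snac))
        keys += [("wikipedia", t) for t in a.wikipedia_titles]
        keys += [("wdtk", n) for n in a.wdtk_names]
        return keys

    def add(self, a: Authority) -> None:
        keys = self._keys(a)
        # Check every key before writing any, so a clash leaves the index untouched.
        for key in keys:
            owner = self._index.get(key)
            if owner is not None and owner != a.id:
                raise ValueError(f"{key[0]}:{key[1]} already belongs to authority {owner}")
        self._by_id[a.id] = a
        for key in keys:
            self._index[key] = a.id

    def lookup(self, scheme: str, key) -> Optional[Authority]:
        # Keys are stored as strings, so an integer eGR id also works here.
        internal_id = self._index.get((scheme, str(key)))
        return None if internal_id is None else self._by_id[internal_id]


if __name__ == "__main__":
    crosswalk = Crosswalk()
    crosswalk.add(Authority(
        id=1,
        name="Cheshire West and Chester",
        egr_id=12345,  # placeholder, not the real eGR id
        wikipedia_titles=["Cheshire_West_and_Chester",
                          "City_of_Chester_and_West_Cheshire"],
        wdtk_names=["Cheshire_West_and_Chester",
                    "West_Cheshire_and_Chester",
                    "City_of_Chester_and_West_Cheshire"],
    ))
    # All three WDTK spellings from the thread resolve to the same record.
    for wdtk_name in ["Cheshire_West_and_Chester",
                      "West_Cheshire_and_Chester",
                      "City_of_Chester_and_West_Cheshire"]:
        print(wdtk_name, "->", crosswalk.lookup("wdtk", wdtk_name).id)
```

Storing superseded names as aliases keeps every name ever used. A Wikipedia redirect only gives a one-step history. The cost is that someone has to add each new alias by hand.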

Francis

On Fri, Jun 19, 2009 at 01:20:48PM +0100, CountCulture wrote:
> Redirects are fine for viewing a webpage, but somewhat problematic as a
> canonical, immutable id that can be used to get data from a number of
> sources (which is what we're after, I reckon). If I wanted the WDTK
> data on Cheshire West and Chester, for example, would I be able to get
> it via any of:
> 
>     * http://www.whatdotheyknow.com/body/Cheshire_West_and_Chester
>     * http://www.whatdotheyknow.com/body/West_Cheshire_and_Chester
>     * http://www.whatdotheyknow.com/body/City_of_Chester_and_West_Cheshire
> 
> That seems a lot of work from the developer's point of view (ignoring
> problems caused by rogue edits).
> 
> Also, while the redirects do provide a partial history, as I understand
> it they are only a one-step history, i.e. the Wikipedia article at
> http://en.wikipedia.org/wiki/City_of_Chester_and_West_Cheshire redirects
> straight to Cheshire_West_and_Chester, not via West_Cheshire_and_Chester.
> (I'm not saying that's a big problem, just that it's not a full history;
> it's also not a history of the official name changes, just of the wiki
> editing process.)
> 
> All we're after here is a common code that doesn't change (while the
> local authority or other public body doesn't change) that various
> websites can support (with minimum coding) to provide the data without
> ambiguity. Wikipedia article URLs, much as we love them, don't really
> work in that respect IMHO.
> Cheers
> C
> 
> 
> 
> Francis Irving wrote:
> > Yes, I mean the article name, probably in the form it appears in the
> > URI.
> >
> > Although Wikipedia titles do change, they always provide a redirect.
> >
> > The nice thing about it is that the redirects become part of the
> > structured information.
> >
> > Francis
> >
> > On Fri, Jun 19, 2009 at 12:02:49PM +0100, CountCulture wrote:
> >   
> >> Francis
> >> I think we should investigate Alex's suggestion of SNAC codes. Not sure
> >> about Wikipedia ids -- do you mean URIs, or do they have numerical ids
> >> too? I'd prefer numerical (or possibly alphanumeric) unique ids rather
> >> than strings, and Wikipedia page titles change too often to be canonical
> >> IMHO.
> >> Cheers
> >> C
> >>
> >>
> >> Francis Irving wrote:
> >>     
> >>> (copied to WhatDoTheyKnow team)
> >>>
> >>> Anyone here know about identifiers for local authorities?
> >>>
> >>> I'm inclined to use Wikipedia article ids, as that will extend to
> >>> other authorities as well.
> >>>
> >>> Francis
> >>>
> >>> On Thu, Jun 18, 2009 at 11:44:12AM +0100, CountCulture wrote:
> >>>   
> >>>       
> >>>> Francis
> >>>> Thought it might be useful if twfylocal could show the status of WDTK
> >>>> requests (total, recent, number answered, outstanding, late etc.), with
> >>>> basic details of requests (though it probably makes sense to go to the
> >>>> WDTK site for full details of a request).
> >>>>
> >>>> Re id system, it's something I've been struggling with, as everywhere
> >>>> uses a different system, so at the moment each twfylocal council
> >>>> record stores the following ids/refs:
> >>>>
> >>>> :id (integer, twfy_local internal primary id. WON'T CHANGE)
> >>>> :name (string, as scraped from eGR, though with some minor edits)
> >>>> :wikipedia_url (string, as scraped from eGR, though I have already
> >>>> found one mistake)
> >>>> :ons_url (string)
> >>>> :egr_id (integer, this is most useful as it gives links to loads of
> >>>> other things -- e.g. various gov pages -- and doesn't change AFAIK even
> >>>> if the authority name does)
> >>>> :wdtk_name (string, from scraping WDTK and trying to match against a
> >>>> shortened version of the name -- successful about 80% of the time)
> >>>>
> >>>> I had a look at the WDTK code and I seem to remember the internal
> >>>> primary id is exposed in at least one place, but it didn't help as you
> >>>> couldn't do queries by it. What we could really do with is a canonical
> >>>> id for each authority.
> >>>>
> >>>> FWIW you can use the eGR ids via twfylocal, though it adds an extra
> >>>> step: if you go to theyworkforyoulocal.com/councils.xml it returns all
> >>>> the councils together with their ids and their eGR ids. If you could
> >>>> match WDTK bodies with eGR ids (for example) and make the match
> >>>> available programmatically, we would have the beginnings of a makeshift
> >>>> common id.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>
> >>>> Francis Irving wrote:
> >>>>     
> >>>>         
> >>>>> There are RSS feeds of latest responses, including quite fancy ones if
> >>>>> you use advanced search keywords. They only give extracts from the new
> >>>>> messages though. What exact information are you trying to get?
> >>>>>
> >>>>> There is no structured way to get status or similar out of the site.
> >>>>>
> >>>>> Finally, we could agree an id system for name matching. In a way I'd
> >>>>> quite like to mark every authority with, say, its identifier in
> >>>>> Wikipedia, to aid merging with other databases.
> >>>>>
> >>>>> What identifiers are you using in your system?
> >>>>>
> >>>>> Francis
> >>>>>
> >>>>> On Wed, Jun 17, 2009 at 03:05:26PM +0200, Tom Steinberg wrote:
> >>>>>         
> >>>>>           
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm afraid I don't know, but I've CCed the team who look after WDTK to 
> >>>>>> ask.
> >>>>>>
> >>>>>> Tom
> >>>>>>
> >>>>>> 2009/6/17 CountCulture <[email protected]>:
> >>>>>>             
> >>>>>>             
> >>>>>>> Tom
> >>>>>>> Follow-up question. At the moment I've got a link to the What Do
> >>>>>>> They Know page for the council. Any problems with including more
> >>>>>>> info from WDTK, such as status and latest responses? And is there a
> >>>>>>> good way to get that other than scraping the data (I had a look at
> >>>>>>> the code and there didn't really seem to be one)?
> >>>>>>> Cheers
> >>>>>>> C
> >>>>>>>
> >>>>>>> -------- Original Message --------
> >>>>>>>
> >>>>>>> Tom
> >>>>>>>
> >>>>>>> Digging deeper is actually where I'd intended to go first, but when I
> >>>>>>> started to explore some of the council websites I found that even
> >>>>>>> shallow data was problematic, and reckoned I needed an API and a
> >>>>>>> structure that could at the very least cope with those variants (and
> >>>>>>> reuse the scrapers/parsers once written) -- hence the proof-of-concept
> >>>>>>> nature.
> >>>>>>>
> >>>>>>> However, now I've got the basics worked out (though there's still
> >>>>>>> tweaking to do and issues to fix there), delving deeper is the next
> >>>>>>> step. In particular, working out the best way of finding, storing and
> >>>>>>> parsing council docs (which are often unstructured PDFs, sometimes
> >>>>>>> even PDFs that are just scans), and also working out an elegant way of
> >>>>>>> linking with other data sources.
> >>>>>>>
> >>>>>>> Thanks for the kind words, I'll keep the list updated with major
> >>>>>>> developments, or you can always watch the github repository.
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>> C
> >>>>>>>
> >>>>>>> Tom Steinberg wrote:
> >>>>>>>                 
> >>>>>>>               
> >>>>>>>> Hi there,
> >>>>>>>>
> >>>>>>>> Cool - great to see people hacking on councils, it's been something
> >>>>>>>> I've wanted to see for ages.
> >>>>>>>>
> >>>>>>>> I see you've gone straight for getting the councillors of several
> >>>>>>>> different councils, but I'd actually suggest going deeper rather than
> >>>>>>>> wider. Why not just dive deep into one council and see if you can get
> >>>>>>>> transcripts or other documents nicely scraped and parsed? I'd love to
> >>>>>>>> see at least a handful of councils in TheyWorkForYou proper by the
> >>>>>>>> end of the year.
> >>>>>>>>
> >>>>>>>> Well done anyway!
> >>>>>>>>
> >>>>>>>> Tom
> >>>>>>>>
> >>>>>>>> 2009/6/16 CountCulture <[email protected]>:
> >>>>>>>>
> >>>>>>>>                     
> >>>>>>>>                 
> >>>>>>>>> Quick note about something I've been working on in my spare time:
> >>>>>>>>>
> >>>>>>>>> http://theyworkforyoulocal.com -- a small app to scrape and parse
> >>>>>>>>> local authority info.
> >>>>>>>>>
> >>>>>>>>> At the moment, it's barely more than a proof of concept, with only
> >>>>>>>>> about 20 or so councils parsed, and even then only current
> >>>>>>>>> councillors, committees, committee membership and forthcoming
> >>>>>>>>> meetings are parsed.
> >>>>>>>>>
> >>>>>>>>> On the upside, it's fairly quick for me to add new parsers for
> >>>>>>>>> councils (and reuse ones already written if they use the same CMS),
> >>>>>>>>> there's an API built in (basically just add .json or .xml to get the
> >>>>>>>>> info as JSON or XML), and there's lots of potential.
> >>>>>>>>>
> >>>>>>>>> Getting this far has also been an education in understanding what a
> >>>>>>>>> full-blown twfy_local might look like (in general there seems to be
> >>>>>>>>> no way to see how councillors voted, for example), the need for such
> >>>>>>>>> a resource (there's no publicly available central repository for
> >>>>>>>>> council election results, for example), and the sorry state of local
> >>>>>>>>> authority websites (just finding a list of councillors is a challenge
> >>>>>>>>> on some, and don't get me started on the HTML markup).
> >>>>>>>>>
> >>>>>>>>> Comments welcome. Code is at
> >>>>>>>>> http://github.com/CountCulture/twfy_local_parser/ (I'll probably GPL
> >>>>>>>>> it soon). Bug reports at
> >>>>>>>>> http://github.com/CountCulture/twfy_local_parser/issues and offers of
> >>>>>>>>> help to countculture at googlemail dot com.
> >>>>>>>>>
> >>>>>>>>> I'd especially be interested in hearing from anyone who's got any
> >>>>>>>>> knowledge about local authority CMSs (e.g. there seem to be several
> >>>>>>>>> different versions of Modern.Gov producing different URLs), or
> >>>>>>>>> sources for more data other than the local authority websites (e.g.
> >>>>>>>>> eGR, info4local).
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> C
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Mailing list [email protected]
> >>>>>>>>> Archive, settings, or unsubscribe:
> >>>>>>>>>
> >>>>>>>>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
> >>>>>>>>>