Tom,

Follow-up question. At the moment I've got a link to the What Do They Know page for the council. Any problems with including more info from WDTK, such as status and latest responses? And is there a good way to get that other than scraping the data? (I had a look at the code and there didn't really seem to be.)

Cheers
C
-------- Original Message --------

Tom,

Digging deeper is actually where I'd intended to go first, but when I started to explore some of the council websites I found that even shallow data was problematic and reckoned I needed an API and structure that at the very least could cope with those variants (and reuse the scrapers/parsers once written) -- hence the proof-of-concept nature.

However, now I've got the basics worked out (though there's still tweaking and issues to be done there), delving deeper's the next step. In particular, working out the best way of finding/storing/parsing council docs (which are often unstructured PDFs, sometimes even just PDFs which are just scans), and also working out an elegant way of linking with other data sources.

Thanks for the kind words, I'll keep the list updated with major developments, or you can always watch the github repository.

Cheers
C

Tom Steinberg wrote:
> Hi there,
>
> Cool - great to see people hacking on councils, it's been something
> I've wanted to see for ages.
>
> I see you've gone straight for getting the councillors of several
> different councils, but I'd actually suggest going deeper rather than
> wider. Why not just dive deep into one council and see if you can get
> transcripts or other documents nicely scraped and parsed? I'd love to
> see at least a handful of councils in TheyWorkForYou proper by the end
> of the year.
>
> Well done anyway!
>
> Tom
>
> 2009/6/16 CountCulture <[email protected]>:
>
>> Quick note about something I've been working on in my spare time:
>>
>> http://theyworkforyoulocal.com -- a small app to scrape and parse local
>> authority info.
>>
>> At the moment, it's barely more than a proof of concept, with only about
>> 20 or so councils parsed, and even then only current councillors,
>> committees, committee membership and forthcoming meetings are parsed.
>>
>> On the upside, it's fairly quick for me to add new parsers for councils
>> (and reuse ones already written if they use same CMS), there's an API
>> built in (basically just add .json or .xml to get the info as json or
>> XML), and there's lots of potential.
>>
>> Getting this far has also been an education in understanding what a
>> full-blown twfy_local might look like (in general there seems no way to
>> see how councillors voted, for example), the need for such a resource
>> (there's no publicly available central repository for council election
>> results, for example), and the sorry state of local authority websites
>> (just finding a list of councillors is a challenge on some, and don't
>> get me started on the HTML markup).
>>
>> Comments welcome. Code is at
>> http://github.com/CountCulture/twfy_local_parser/ (I'll probably GPL it
>> soon). Bug reports at
>> http://github.com/CountCulture/twfy_local_parser/issues and offers of
>> help to countculture at googlemail dot com.
>>
>> I'd especially be interested in hearing from anyone who's got any
>> knowledge about local authority CMSs (e.g. there seem to be several
>> different versions of Modern.Gov producing different URLs), or sources
>> for more data other than the local authority websites (e.g. eGR,
>> info4local).
>>
>> Cheers
>>
>> C
>>
>> _______________________________________________
>> Mailing list [email protected]
>> Archive, settings, or unsubscribe:
>> https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
