Tom
Did you get any feedback from the WDTK team? What I'm really hoping we
can move towards is a situation where the WDTK data can be accessed in a
machine-usable form. Obviously there are lots of ways this could happen (RDFa
etc), but the simplest (i.e. it would just require a couple of lines of
code, given WDTK is written in Rails) would be an XML representation of
the data *without pagination* (it should be as easy as a
respond_to block and @requests.to_xml).
As you know, having to paginate adds an extra layer of complexity when
scraping, and the RSS data, as well as covering only the most recent records,
doesn't contain much of the key data - e.g. the status of a request.
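
To illustrate what a consumer could do with an unpaginated XML dump, here is a minimal Python sketch. The element names (requests, request, id, title, status) and the sample values are hypothetical, made up for illustration; they are not WDTK's actual schema, and the real output of to_xml would depend on the model's attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an unpaginated XML dump; element names and values
# are assumptions for illustration, not WDTK's real output.
SAMPLE_XML = """<requests type="array">
  <request>
    <id type="integer">1</id>
    <title>Example request one</title>
    <status>successful</status>
  </request>
  <request>
    <id type="integer">2</id>
    <title>Example request two</title>
    <status>waiting_response</status>
  </request>
</requests>
"""


def parse_requests(xml_text):
    """Return one dict per <request> element, including its status."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(node.findtext("id")),
            "title": node.findtext("title"),
            "status": node.findtext("status"),
        }
        for node in root.findall("request")
    ]


if __name__ == "__main__":
    for row in parse_requests(SAMPLE_XML):
        print(row["id"], row["status"], row["title"])
```

With everything in one document there is no page loop to maintain, and fields such as the request status come through directly rather than having to be scraped from HTML.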
Thanks again
C
Tom Steinberg wrote:
Hi,
I'm afraid I don't know, but I've CCed the team who look after WDTK to ask.
Tom
2009/6/17 CountCulture <[email protected]>:
Tom
Follow up question. At the moment I've got a link to the What Do They Know
page for the council. Any probs with including more info from WDTK such as
status, and latest responses, and is there a good way to get that other than
scraping the data (I had a look at the code and there didn't really seem to
be)?
Cheers
C
-------- Original Message --------
Tom
Digging deeper is actually where I'd intended to go first, but when I
started to explore some of the council websites I found that even shallow
data was problematic and reckoned I needed an API and structure that at the
very least could cope with those variants (and reuse the scrapers/parsers
once written) -- hence the proof-of-concept nature.
However, now I've got the basics worked out (though there's still tweaking
to do and issues to fix there), delving deeper's the next step. In particular,
working out the best way of finding/storing/parsing council docs (which are
often unstructured PDFs, sometimes just scanned images), and
also working out an elegant way of linking with other data sources.
Thanks for the kind words, I'll keep the list updated with major
developments, or you can always watch the github repository.
Cheers
C
Tom Steinberg wrote:
Hi there,
Cool - great to see people hacking on councils, it's been something
I've wanted to see for ages.
I see you've gone straight for getting the councillors of several
different councils, but I'd actually suggest going deeper rather than
wider. Why not just dive deep into one council and see if you can get
transcripts or other documents nicely scraped and parsed? I'd love to
see at least a handful of councils in TheyWorkForYou proper by the end
of the year.
Well done anyway!
Tom
2009/6/16 CountCulture <[email protected]>:
Quick note about something I've been working on in my spare time:
http://theyworkforyoulocal.com -- a small app to scrape and parse local
authority info.
At the moment, it's barely more than a proof of concept, with only about
20 or so councils parsed, and even then only current councillors,
committees, committee membership and forthcoming meetings are parsed.
On the upside, it's fairly quick for me to add new parsers for councils
(and reuse ones already written if they use the same CMS), there's an API
built in (basically just add .json or .xml to get the info as JSON or
XML), and there's lots of potential.
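
As a rough illustration of the "add .json or .xml" convention, here is a small Python sketch that turns a page URL into its API equivalent. The helper name api_url and the /councils/1 path are hypothetical, chosen for the example; only the suffix convention comes from the description above.

```python
from urllib.parse import urlsplit, urlunsplit


def api_url(page_url, fmt):
    """Append .json or .xml to the path of a page URL (hypothetical helper)."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    parts = urlsplit(page_url)
    # Drop any trailing slash so the suffix attaches to the last path segment.
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}.{fmt}", parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # The /councils/1 path is an assumed example, not a documented route.
    print(api_url("http://theyworkforyoulocal.com/councils/1", "json"))
    print(api_url("http://theyworkforyoulocal.com/councils/1", "xml"))
```

The resulting URL can then be fetched with any HTTP client and handed to a JSON or XML parser.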
Getting this far has also been an education in understanding what a
full-blown twfy_local might look like (in general there seems to be no way to
see how councillors voted, for example), the need for such a resource
(there's no publicly available central repository for council election
results, for example), and the sorry state of local authority websites
(just finding a list of councillors is a challenge on some, and don't
get me started on the HTML markup).
Comments welcome. Code is at
http://github.com/CountCulture/twfy_local_parser/ (I'll probably GPL it
soon). Bug reports at
http://github.com/CountCulture/twfy_local_parser/issues and offers of
help to countculture at googlemail dot com.
I'd especially be interested in hearing from anyone who's got any
knowledge about local authority CMSs (e.g. there seem to be several
different versions of Modern.Gov producing different URLs), or sources
for more data other than the local authority websites (e.g. eGR,
info4local).
Cheers
C
_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public