Thanks Simon, that’s really useful.
I’ve already made an (admittedly cack-handed) effort to RDFa-ise our
council members’ pages, so this will be a big help.
Cheers
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Simon Gibbs
Sent: 04 July 2009 11:59
To: [email protected]; mySociety public, general purpose discussion list
Subject: Re: [mySociety:public] Putting Government Data online
CountCulture wrote:
I'm the dev behind TheyWorkForYouLocal.com
I've used microformats before, so am fairly comfortable with that, but
have been thinking about using RDFa for this project -- partly as a
learning experience, and partly because (based on what little I know
about RDFa) I'm thinking it might make more sense as only a fraction of
the data falls into microformat-type,
I'm thinking that public data like this may already have some sort of
RDFa schema (if that's the right expression).
If you can point me in the right direction for RDFa stuff, that'd be
great.
Hi again
I've put some demo pages online; try out the following URLs. I haven't
tackled a minutes page, and the copies I took pre-date happiness stats,
which is a bit of a shame.
http://cantorva.com/2009NS/twfyl-lod-demo/councils/45.htm
http://cantorva.com/2009NS/twfyl-lod-demo/members/1443.htm
http://cantorva.com/2009NS/twfyl-lod-demo/committees/771.htm
http://cantorva.com/2009NS/twfyl-lod-demo/meetings/4688.htm
http://cantorva.com/2009NS/twfyl-lod-demo/meetings-qm-council-id-eq-45.htm
I blogged some guidance about how to actually see the data:
http://cantorva.com/blog/2009/07/01/hints-on-browsing-embedded-rdfa-data-as-data/
The vocabulary is mostly FOAF, with Dublin Core, iCal and a bit of core
RDF stuff. There are a few places where I couldn't locate a term, so I
invented some and documented them (also in RDFa) at:
http://cantorva.com/2009NS/twfyl-lod-demo/vocab
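For anyone reading the markup by hand, here is a rough Python sketch of
how prefixed names from those vocabularies expand to full URIs. The
prefix labels and the choice of the Dublin Core Terms namespace are
assumptions for illustration (the demo pages may declare them
differently), and expand_curie is a hypothetical helper, not part of
any library.

```python
# Namespace URIs for the vocabularies mentioned above. The prefix labels,
# and the use of DC Terms rather than DC Elements, are assumptions.
NAMESPACES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_curie(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:name' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return NAMESPACES[prefix] + local


if __name__ == "__main__":
    print(expand_curie("foaf:Organization"))
    print(expand_curie("rdfs:label"))
```

Terms invented for the demo would need their own entry in the table,
pointing at the vocab page above.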
As well as keeping focused on the goal of creating an API for Linked
Data hackers rather than fodder for SEO purposes (so no Google RDFa), I
made a few assumptions and judgement calls as I went along:
Provenance is important to you - the clue was you put last modified
dates in the XML as well as the page, and gave your pseudonym and
homepage on all the pages. The new XML related to happiness etc. goes
into more depth on provenance, so this is borne out. This led to...
The pages and the entities are different things - e.g. CountCulture did
not make Brighton and Hove City Council, he made a page about it. This
meant adding #disambiguator to the end of each URL when talking about
councils, committees etc. and leaving it plain when talking about
pages. Logic nerds will love you for this since Document != Person and
some tools barf when data implies otherwise. This turned into a bit of
a pain, and that is optional pain, but frankly Document != Person
appeals intuitively.
It's OK to talk about more than one entity per page - if it's useful
for a user to see something then some application may also want it. The
meeting example is a good one for talking about just about everything
else. Even if there is better data on other pages I marked up what was
there. This should mean fewer HTTP GETs for some apps at the expense of
a little bloat, and makes for additional machine readable links between
entities.
You don't want to change your markup - it's possible to put more data
into some pages, notably the name of councils on meeting pages, but I
left the markup mostly as it was. There are lots of extra spans and one
extra div, then the actual RDFa attributes. There may have been some
mangling done by Firefox as I saved out the pages, so if something has
changed that makes no sense, ask and I can confirm why it changed.
You are not going to want two RDF vocabularies - so I made a few
concessions to re-use by being deliberately vague in places, e.g. the
vocab for "committee" does not link a council to a committee, it
actually links Organization and Group (from FOAF). To be honest, I
don't think many people will use (or notice) the machine readable
vocabulary anyway, so I tried not to overthink it or research every
option or entity; notably, not all classes of local authority are
documented.
Tabulator usability is also important - I put a bit of effort in to
make sure Tabulator displayed the data as nicely as possible. This
means the #disambiguator bits are actually structured, and it means
extra rdfs:label properties on entities. I figure if Tabulator uses
that stuff other apps will too, plus you don't want to look at horrid
presentations.
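To make the page-versus-entity split and the label fallback concrete,
here is a minimal Python sketch. Everything in it is illustrative: the
example.org URL, the "council" fragment, the helper names and the
triples are assumptions, not copied from the demo pages.

```python
from urllib.parse import urldefrag

# Hypothetical page URL; the real demo pages live elsewhere.
PAGE = "http://example.org/councils/1.htm"


def entity_uri(page: str, fragment: str) -> str:
    """URI for the thing a page describes: the page URI plus a fragment."""
    base, _ = urldefrag(page)
    return f"{base}#{fragment}"


def document_uri(uri: str) -> str:
    """URI of the document itself: the same URI with any fragment removed."""
    return urldefrag(uri).url


council = entity_uri(PAGE, "council")  # "council" is a made-up fragment

# Triples as plain (subject, predicate, object) tuples.
triples = [
    (PAGE, "dcterms:creator", "CountCulture"),    # about the page
    (council, "rdf:type", "foaf:Organization"),   # about the council
    (council, "rdfs:label", "Brighton and Hove City Council"),
]


def label_for(uri: str, triples: list) -> str:
    """Prefer an rdfs:label; otherwise fall back to the fragment or URI."""
    for subject, predicate, obj in triples:
        if subject == uri and predicate == "rdfs:label":
            return obj
    return urldefrag(uri).fragment or uri


if __name__ == "__main__":
    print(council)
    print(document_uri(council))
    print(label_for(council, triples))
```

The point of the split is that the creator triple hangs off the plain
page URL while the type and label hang off the fragment URI, so nothing
in the data implies that a document is an organisation.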
OK, that's it; that pretty much tells you what is there and why. Let me
know your thoughts.
Simon