These have to be named graphs, or at least collections of triples which can be processed through workflows as a single unit.

In terms of LD, their version needs to be defined in terms of:

(a) synchronisation with the non-bibliographic real world (i.e. Dataset Z version X was released at time Y)

(b) correction/augmentation of other datasets (i.e. Dataset F version G contains triples augmenting Dataset H versions A, B, C and D)

(c) mapping between datasets (i.e. Dataset I contains triples mapping between Dataset J version K and Dataset L version M (and vice versa))

Note that a 'Dataset' here could be a bibliographic dataset (records of works, etc), a classification dataset (a version of the Dewey Decimal Scheme, a version of the Māori Subject Headings, a version of Dublin Core Scheme, etc), a dataset of real-world entities to do authority control against (a dbpedia dump, an organisational structure in an institution, etc), or some arbitrary mapping between some arbitrary combination of these.

Most of these are going to be managed and generated using current systems with processes that involve periodic dumps (or drops) of data (the dbpedia drops of wikipedia data are a good model here). git makes little sense for this kind of data.

github is most likely to be useful for smaller niche collaborative collections (probably no more than a million triples) mapping between the larger collections, and scripts for integrating the collections into a sane whole.

cheers
stuart

On 28/08/12 08:36, Karen Coyle wrote:
Ed, Corey -

I also assumed that Ed wasn't suggesting that we literally use github as
our platform, but I do want to remind folks how far we are from having
"people friendly" versioning software -- at least, none that I have seen
has felt "intuitive." The features of git are great, and people have
built interfaces to it, but as Galen's question brings forth, the very
*idea* of versioning doesn't exist in library data processing, even
though having central-system based versions of MARC records (with a
single time line) is at least conceptually simple.

Therefore it seems to me that first we have to define what a version
would be, both in terms of data but also in terms of the mind set and
work flow of the cataloging process. How will people *understand*
versions in the context of their work? What do they need in order to
evaluate different versions? And that leads to my second question: what
is a version in LD space? Triples are just triples - you can add them or
delete them but I don't know of a way that you can version them, since
each has an independent T-space existence. So, are we talking about
named graphs?

I think this should be a high priority activity around the "new
bibliographic framework" planning because, as we have seen with MARC,
the idea of versioning needs to be part of the very design or it won't
happen.

kc

On 8/27/12 11:20 AM, Ed Summers wrote:
On Mon, Aug 27, 2012 at 1:33 PM, Corey A Harper <corey.har...@nyu.edu>
wrote:
I think there's a useful distinction here. Ed can correct me if I'm
wrong, but I suspect he was not actually suggesting that Git itself be
the user-interface to a github-for-data type service, but rather that
such a service can be built *on top* of an infrastructure component
like GitHub.
Yes, I wasn't saying that we could just plonk our data into Github,
and pat ourselves on the back for a good day's work :-) I guess I was
stating the obvious: technologies like Git have made once hard
problems like decentralized version control much, much easier...and
there might be some giants shoulders to stand on.

//Ed



--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/
