These have to be named graphs, or at least collections of triples which can be processed through workflows as a single unit.

In terms of LD, their version needs to be defined in terms of:

(a) synchronisation with the non-bibliographic real world (i.e. Dataset Z version X was released at time Y)

(b) correction/augmentation of other datasets (i.e. Dataset F version G contains triples augmenting Dataset H versions A, B, C and D)

(c) mapping between datasets (i.e. Dataset I contains triples mapping between Dataset J version K and Dataset L version M (and vice versa))

Note that a 'Dataset' here could be a bibliographic dataset (records of works, etc), a classification dataset (a version of the Dewey Decimal Scheme, a version of the Māori Subject Headings, a version of Dublin Core Scheme, etc), a dataset of real-world entities to do authority control against (a dbpedia dump, an organisational structure in an institution, etc), or some arbitrary mapping between some arbitrary combination of these.

Most of these are going to be managed and generated using current systems with processes that involve periodic dumps (or drops) of data (the dbpedia drops of wikipedia data are a good model here). git makes little sense for this kind of data.

github is most likely to be useful for smaller niche collaborative collections (probably no more than a million triples) mapping between the larger collections, and scripts for integrating the collections into a sane whole.

cheers
stuart

On 28/08/12 08:36, Karen Coyle wrote:
Ed, Corey -

I also assumed that Ed wasn't suggesting that we literally use github as
our platform, but I do want to remind folks how far we are from having
"people friendly" versioning software -- at least, none that I have seen
has felt "intuitive." The features of git are great, and people have
built interfaces to it, but as Galen's question brings forth, the very
*idea* of versioning doesn't exist in library data processing, even
though having central-system based versions of MARC records (with a
single time line) is at least conceptually simple.

Therefore it seems to me that first we have to define what a version
would be, both in terms of data but also in terms of the mind set and
work flow of the cataloging process. How will people *understand*
versions in the context of their work? What do they need in order to
evaluate different versions? And that leads to my second question: what
is a version in LD space? Triples are just triples - you can add them or
delete them but I don't know of a way that you can version them, since
each has an independent T-space existence. So, are we talking about
named graphs?

I think this should be a high priority activity around the "new
bibliographic framework" planning because, as we have seen with MARC,
the idea of versioning needs to be part of the very design or it won't
happen.

kc

On 8/27/12 11:20 AM, Ed Summers wrote:
On Mon, Aug 27, 2012 at 1:33 PM, Corey A Harper <corey.har...@nyu.edu>
wrote:
I think there's a useful distinction here. Ed can correct me if I'm
wrong, but I suspect he was not actually suggesting that Git itself be
the user-interface to a github-for-data type service, but rather that
such a service can be built *on top* of an infrastructure component
like GitHub.
Yes, I wasn't saying that we could just plonk our data into Github,
and pat ourselves on the back for a good day's work :-) I guess I was
stating the obvious: technologies like Git have made once hard
problems like decentralized version control much, much easier...and
there might be some giants shoulders to stand on.

//Ed



--
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/
