Thanks for sharing this bit of detective work. I noticed something similar fairly recently myself [1], but didn't come up with as plausible a scenario for what had happened as you did. I imagine others have noticed this network effect before as well.
On Tue, Aug 21, 2012 at 11:42 AM, Lars Aronsson <l...@aronsson.se> wrote:

> And sure enough, there it is,
> http://clio.cul.columbia.edu:7018/vwebv/holdingsInfo?bibId=1439352
> But will my error report to Worldcat find its way back
> to CLIO? Or if I report the error to Columbia University,
> will the correction propagate to Google, Hathi and Worldcat?
> (Columbia asks me for a student ID when I want to give
> feedback, so that removes this option for me.)

I realize this will probably sound flippant (or overly grandiose), but innovating solutions to this problem, where there isn't necessarily one metadata master that everyone is slaved to, seems to be one of the more important and interesting problems our sector faces. When Columbia University can become the source of a bibliographic record for Google Books, HathiTrust, OpenLibrary, etc., how does this change the hub-and-spoke workflows (with OCLC as the hub) that we are more familiar with?

I think this topic is at the heart of the discussions about a "github-for-data" [2,3], since decentralized version control systems [4] allow for the evolution of more organic, push/pull, multi-master workflows...and platforms like GitHub make them socially feasible, easy and fun. I also think Linked Library Data, where bibliographic descriptions are REST-enabled Web resources identified with URLs, and patterns such as webhooks [5], which make it easy to trigger update events, could be part of an answer. Feed technologies like Atom and RSS, and the work being done on ResourceSync [6], also seem like important technologies for allowing people to poll for changes. And being able to say where you obtained data from, possibly using something like the W3C Provenance vocabulary [7], also seems like an important part of the puzzle. I'm sure there are other (and perhaps better) creative analogies or tools that could help solve this problem.
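To make the webhook idea a bit more concrete, here's a rough sketch of what a push-style update event might look like. To be clear, everything here is made up for illustration: the payload shape, field names and the "record-updated" event type are my own assumptions, not any existing API. The idea is just that when one institution corrects a record, downstream consumers get a small event pointing back at the canonical Web resource for that record, and re-fetch it.

```python
import json
from datetime import datetime, timezone

def make_update_event(record_url, changed_fields, source):
    """Build a (hypothetical) JSON payload announcing that a record changed.

    record_url     -- the REST-enabled URL identifying the record
    changed_fields -- which fields were corrected
    source         -- who made the change (a crude provenance hook)
    """
    return json.dumps({
        "event": "record-updated",
        "record": record_url,
        "changed": changed_fields,
        "source": source,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def handle_update_event(payload, fetch):
    """A subscriber's handler: parse the event and re-fetch the record.

    `fetch` is passed in so a real subscriber could plug in urllib or
    whatever HTTP client it uses; here it also lets the flow be exercised
    without any network access. Unrecognized events are ignored.
    """
    event = json.loads(payload)
    if event.get("event") != "record-updated":
        return None
    return fetch(event["record"])
```

A subscriber would register a handler URL with the publisher, GitHub-post-receive-hook style; feeds like Atom/RSS or ResourceSync then cover consumers that can only poll rather than receive pushes.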
I think you're probably right that we are starting to see the errors more now that more library data is becoming part of the visible Web, via projects like Google Books, HathiTrust, OpenLibrary and other enterprising libraries that design their catalogs to be crawlable and indexable by search engines. But I think it's more fun to think about (and hack on) the grassroots things we could be doing to help these new bibliographic data workflows grow and flourish than to get buried under the errors and a sense of futility... Or it might make for a good article or dissertation topic :-)

//Ed

[1] http://inkdroid.org/journal/2011/12/25/genealogy-of-a-typo/
[2] http://www.informationdiet.com/blog/read/we-need-a-github-for-data
[3] http://sunlightlabs.com/blog/2010/we-dont-need-a-github-for-data/
[4] http://en.wikipedia.org/wiki/Distributed_revision_control
[5] https://help.github.com/articles/post-receive-hooks
[6] http://www.niso.org/workrooms/resourcesync/
[7] http://www.w3.org/TR/prov-primer/