Thanks for sharing this bit of detective work. I noticed something similar fairly recently myself [1], but didn't come up with as plausible a scenario for what had happened as you did. I imagine others have noticed this network effect before as well.
On Tue, Aug 21, 2012 at 11:42 AM, Lars Aronsson <l...@aronsson.se> wrote:

> And sure enough, there it is,
> http://clio.cul.columbia.edu:7018/vwebv/holdingsInfo?bibId=1439352
> But will my error report to Worldcat find its way back
> to CLIO? Or if I report the error to Columbia University,
> will the correction propagate to Google, Hathi and Worldcat?
> (Columbia asks me for a student ID when I want to give
> feedback, so that removes this option for me.)

I realize this will probably sound flippant (or overly grandiose), but innovating solutions to this problem, where there isn't necessarily one metadata master that everyone is slaved to, seems to be one of the more important and interesting problems our sector faces. When Columbia University can become the source of a bibliographic record for Google Books, HathiTrust, OpenLibrary, etc., how does this change the hub-and-spoke workflows (with OCLC as the hub) that we are more familiar with?

I think this topic is at the heart of the discussions about a "github-for-data" [2,3], since decentralized version control systems [4] allow for the evolution of more organic, push/pull, multi-master workflows...and platforms like GitHub make them socially feasible, easy and fun. I also think Linked Library Data, where bibliographic descriptions are REST-enabled Web resources identified with URLs, and patterns such as webhooks [5], which make it easy to trigger update events, could be part of an answer. Feed technologies like Atom and RSS, and the work being done on ResourceSync [6], also seem like important technologies for allowing people to poll for changes. And being able to say where you obtained data from, possibly using something like the W3C Provenance vocabulary [7], also seems like an important part of the puzzle. I'm sure there are other (and perhaps better) creative analogies or tools that could help solve this problem.
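To make the webhook idea a bit more concrete, here's a rough sketch of what a push-style update event might look like. To be clear, everything here is made up for illustration: the payload shape, field names and the "record-updated" event type are my own assumptions, not any existing API. The idea is just that when one institution corrects a record, downstream consumers get a small event pointing back at the canonical Web resource for that record, and re-fetch it.

```python
import json
from datetime import datetime, timezone

def make_update_event(record_url, changed_fields, source):
    """Build a (hypothetical) JSON payload announcing that a record changed.

    record_url     -- the REST-enabled URL identifying the record
    changed_fields -- which fields were corrected
    source         -- who made the change (a crude provenance hook)
    """
    return json.dumps({
        "event": "record-updated",
        "record": record_url,
        "changed": changed_fields,
        "source": source,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def handle_update_event(payload, fetch):
    """A subscriber's handler: parse the event and re-fetch the record.

    `fetch` is passed in so a real subscriber could plug in urllib or
    whatever HTTP client it uses; here it also lets the flow be exercised
    without any network access. Unrecognized events are ignored.
    """
    event = json.loads(payload)
    if event.get("event") != "record-updated":
        return None
    return fetch(event["record"])
```

A subscriber would register a handler URL with the publisher, GitHub-post-receive-hook style; feeds like Atom/RSS or ResourceSync then cover consumers that can only poll rather than receive pushes.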
I think you're probably right that we are starting to see the errors more now that more library data is becoming part of the visible Web, via projects like Google Books, HathiTrust, OpenLibrary and other enterprising libraries that design their catalogs to be crawlable and indexable by search engines. But I think it's more fun to think about (and hack on) the grassroots things we could be doing to help these new bibliographic data workflows grow and flourish than to get buried under the errors and a sense of futility... Or it might make for a good article or dissertation topic :-)

//Ed

[1] http://inkdroid.org/journal/2011/12/25/genealogy-of-a-typo/
[2] http://www.informationdiet.com/blog/read/we-need-a-github-for-data
[3] http://sunlightlabs.com/blog/2010/we-dont-need-a-github-for-data/
[4] http://en.wikipedia.org/wiki/Distributed_revision_control
[5] https://help.github.com/articles/post-receive-hooks
[6] http://www.niso.org/workrooms/resourcesync/
[7] http://www.w3.org/TR/prov-primer/