Hello,

This discussion is very interesting. I would like to make a summary, so that we can go further.
1. A database of all books ever published is one of the things still missing.
2. This needs massive collaboration by thousands of volunteers, so a wiki might be appropriate, however...
3. The data needs a structured web site, not a plain wiki like MediaWiki.
4. A big part of this data is already available, but scattered across various databases, in various languages, with various protocols, etc. So a big part of the work needs as much database management knowledge as librarian knowledge.
5. What is most missing in these existing databases (IMO) is information about translations: nowhere is there a general database of translated works, at least not in English or French. It is very difficult to find out whether a translation exists for a given work. Wikisource has some of this information, with interwiki links between work and author pages, but only for a (very) small number of works and authors.
6. It would be best not to duplicate work in several places.

Personally I don't find OL very practical. Maybe I am too used to MediaWiki. ;oD

We still need to create something attractive to contributors and readers alike.

Yann

Samuel Klein wrote:
>> This thread started out with a discussion of why it is so hard to
>> start new projects within the Wikimedia Foundation. My stance is
>> that projects like OpenStreetMap.org and OpenLibrary.org are doing
>> fine as they are, and there is no need to duplicate their effort
>> within the WMF. The example you gave was this:
>
> I agree that there's no point in duplicating existing functionality.
> The best solution is probably for OL to include this explicitly in
> their scope and add the necessary functionality. I suggested this on
> the OL mailing list in March.
> http://mail.archive.org/pipermail/ol-discuss/2009-March/000391.html
>
>>>>>>> *A wiki for book metadata, with an entry for every published
>>>>>>> work, statistics about its use and siblings, and discussion
>>>>>>> about its usefulness as a citation (a collaboration with
>>>>>>> OpenLibrary, merging WikiCite ideas)
>> To me, that sounds exactly as what OpenLibrary already does (or
>> could be doing in the near time), so why even set up a new project
>> that would collaborate with it? Later you added:
>
> However, this is not what OL or its wiki do now. And OL is not run by
> its community, the community helps support the work of a centrally
> directed group. So there is only so much I feel I can contribute to
> the project by making suggestions. The wiki built into the fiber of
> OL is intentionally not used for general discussion.
>
>> I was talking about the metadata for all books ever published,
>> including the Swedish translations of Mark Twain's works, which
>> are part of Mark Twain's bibliography, of the translator's
>> bibliography, of American literature, and of Swedish language
>> literature. In OpenLibrary all of these are contained in one
>> project. In Wikisource, they are split in one section for English
>> and another section for Swedish. That division makes sense for
>> the contents of the book, but not for the book metadata.
>
> This is a problem that Wikisource needs to address, regardless of
> where the OpenLibrary metadata goes. It is similar to the Wiktionary
> problem of wanting some content - the array of translations of a
> single definition - to exist in one place and be transcluded in each
> language.
>
>> Now you write:
>>
>>> However, the project I have in mind for OCR cleaning and
>>> translation needs to
>> That is a change of subject. That sounds just like what Wikisource
>> (or PGDP.net) is about.
>> OCR cleaning is one thing, but it is an
>> entirely different thing to set up "a wiki for book metadata, with
>> an entry for every published work". So which of these two project
>> ideas are we talking about?
>
> They are closely related.
>
> There needs to be a global authority file for works -- a [set of]
> universal identifier[s] for a given work in order for wikisource (as
> it currently stands) to link the German translation of the English
> transcription of OCR of the 1998 photos of the 1572 Rotterdam Codex...
> to its metadata entry [or entries].
>
> I would prefer for this authority file to be wiki-like, as the
> Wikipedia authority file is, so that it supports renames, merges, and
> splits with version history and minimal overhead; hence I wish to see
> a wiki for this sort of metadata.
>
> Currently OL does not quite provide this authority file, but it could.
> I do not know how easily.
>
>> Every book ever published means more than 10 million records.
>> (It probably means more than 100 million records.) OCR cleaning
>> attracts hundreds or a few thousand volunteers, which is
>> sufficient to take on thousands of books, but not millions.
>
> Focusing efforts on notable works with verifiable OCR, and using the
> sorts of helper tools that Greg's paper describes, I do not doubt that
> we could effectively clean and publish OCR for all primary sources
> that are actively used and referenced in scholarship today (and more
> besides). Though 'we' here is the world - certainly more than a few
> thousand volunteers have at least one book they would like to polish.
> Most of them are not currently Wikimedia contributors, that much is
> certain -- we don't provide any tools to make this work convenient or
> rewarding.
>
>> Google scanned millions of books already, but I haven't heard of
>> any plans for cleaning all that OCR text.
>
> Well, Google does not believe in distributed human effort. (This came
> up in a recent Knol thread as well.)
> I'm not sure that is the best
> comparison.
>
> SJ

--
http://www.non-violence.org/ | Collaborative site on non-violence
http://www.forget-me.net/ | Alternatives on the Net
http://fr.wikisource.org/ | Free library
http://wikilivres.info | Free documents
