Re: [ol-discuss] Is there an up-to-date MARC database of lendable ebooks?

Tom Morris Tue, 17 Feb 2015 20:04:09 -0800

On Tue, Feb 17, 2015 at 8:48 PM, Jon Leech <oddh...@sonic.net> wrote:


>     I'm trying to obtain a list of lendable ebooks from Open Library (in
> order to dispose of corresponding parts of my personal library). My
> first try landed me at the "Lending library MARC records" section of
>     https://openlibrary.org/data#bulk_download
>
> Following the link there to
>     https://archive.org/details/marc_lendable_books
> shows that the MARC records download was last updated in June 2011, and
> cursory inspection of what's in there suggests that it may not include
> even some lendable ebooks that were added to OL prior to June 2011.
>
>     Is there a current source of MARC data corresponding to lendable
> ebooks in OL, or a straightforward way to derive one from other
> resources? I'd prefer not to have to learn how to write a web app simply
> in order to get at the catalog. Using a local database with the Perl
> MARC::Record package is my preference, if it's possible. If not, I'd
> appreciate suggestions on how to access this data with the least amount
> of pain. I'm a complete neophyte at the OL ecosystem, alas.


As far as I know, the MARC dump was a one-time thing.  For a task like
this, no web app programming should be required, since the best source of
info is probably the full OpenLibrary dump.  The nice thing about working
off the dump is that it lets you both be independent of the vagaries of a
MARC dump update schedule and tailor the meaning of "lendable ebook" to
something which suits your constituency (English only? Languages commonly
spoken in North America only? OCR gibberish excluded? etc.)

I don't do much Perl, but the dump is basically just a big hunk of JSON
that you can filter in whatever manner you wish and then convert to MARC.
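
To make that concrete, here is a rough Python sketch of the filtering step.
It assumes the usual dump layout of one record per line with five
tab-separated columns (type, key, revision, last modified, JSON), and it
treats "has an Internet Archive identifier (ocaid) and is in English" as a
stand-in for "lendable".  That stand-in is only an assumption for
illustration; the function names and sample rows are made up, and the MARC
conversion itself is left out.

```python
import json


def iter_editions(lines):
    """Yield the parsed JSON record from each edition row of a dump."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        # Skip malformed rows and anything that is not an edition.
        if len(parts) != 5 or parts[0] != "/type/edition":
            continue
        yield json.loads(parts[4])


def is_candidate(edition, languages=("/languages/eng",)):
    """Hypothetical filter: has a scan identifier and a wanted language."""
    if not edition.get("ocaid"):
        return False
    keys = {lang.get("key") for lang in edition.get("languages", [])}
    return bool(keys & set(languages))


def make_row(rec_type, key, record):
    """Build one fake dump row for the demonstration below."""
    return "\t".join([rec_type, key, "1", "2015-01-01T00:00:00",
                      json.dumps(record)])


# Invented sample rows; a real run would read the dump file line by line.
sample_dump = [
    make_row("/type/edition", "/books/OL1M",
             {"key": "/books/OL1M", "ocaid": "exampleid00eng",
              "languages": [{"key": "/languages/eng"}]}),
    make_row("/type/edition", "/books/OL2M",
             {"key": "/books/OL2M", "ocaid": "exampleid00fre",
              "languages": [{"key": "/languages/fre"}]}),
    make_row("/type/edition", "/books/OL3M",
             {"key": "/books/OL3M",
              "languages": [{"key": "/languages/eng"}]}),
    make_row("/type/author", "/authors/OL1A", {"key": "/authors/OL1A"}),
]

candidates = [e["ocaid"] for e in iter_editions(sample_dump)
              if is_candidate(e)]
print(candidates)
```

For the real dump you would pass an open file object to iter_editions
instead of the sample list, and swap is_candidate for whatever definition
of "lendable" suits you.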

I did a project in Python a while ago which was focused on classics in the
English language available as eBooks from Internet Archive.  The client,
Santa Clara County Library, made all the code available as open source and
you can find it here:  https://github.com/tfmorris/openlibrary-utils  One
of the things we attempted to do as part of this project was to get a
handle on OCR quality (because much of it on IA, frankly, sucks) as well as
find the best quality edition/scan of the many editions/scans on IA.  A
single work may exist in multiple editions and even a single edition often,
perhaps surprisingly, has multiple scans from different sources with widely
varying scan qualities, subsequent OCR qualities, etc.
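
As a rough illustration of the "pick the best scan per work" idea (not the
code from the repository above), the sketch below groups edition records by
their work key and keeps the scan with the highest score.  The scoring
function is a placeholder you would have to supply yourself, for example
from some OCR quality estimate; the names and sample numbers here are
invented.

```python
def best_scan_per_work(editions, score):
    """Map each work key to the ocaid of its highest-scoring scan.

    `score` is a caller-supplied function (hypothetical here) that takes an
    edition record and returns a number; higher means better quality.
    """
    best = {}
    for edition in editions:
        ocaid = edition.get("ocaid")
        if not ocaid:
            continue  # no scan attached to this edition
        # Fall back to the edition's own key if it is not linked to a work.
        works = edition.get("works") or [{"key": edition.get("key")}]
        work_key = works[0]["key"]
        value = score(edition)
        if work_key not in best or value > best[work_key][0]:
            best[work_key] = (value, ocaid)
    return {work: ocaid for work, (value, ocaid) in best.items()}


# Invented records and scores, purely for demonstration.
sample_editions = [
    {"key": "/books/OL1M", "ocaid": "scan_a",
     "works": [{"key": "/works/OL1W"}]},
    {"key": "/books/OL2M", "ocaid": "scan_b",
     "works": [{"key": "/works/OL1W"}]},
    {"key": "/books/OL3M", "ocaid": "scan_c",
     "works": [{"key": "/works/OL2W"}]},
    {"key": "/books/OL4M", "works": [{"key": "/works/OL2W"}]},
]
fake_scores = {"scan_a": 0.62, "scan_b": 0.91, "scan_c": 0.40}

picked = best_scan_per_work(sample_editions,
                            lambda e: fake_scores[e["ocaid"]])
print(picked)
```

This only handles multiple editions of one work; choosing among several
scans of a single edition would need scan-level data that is assumed away
here.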

Ideally an OL eBook MARC extractor would be something that could be set up
as a tool chain that anyone could run with their favorite parameters for
languages, subjects, quality thresholds, etc.  I suspect there's enough
interest there, but because Internet Archive actively ignores OpenLibrary,
there's really no place for a community to form to explore common
interests, etc.

Good luck with your project!  Let me know if I can help.

Tom

_______________________________________________
Ol-discuss mailing list - Ol-discuss@archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
Archives: http://www.mail-archive.com/ol-discuss@archive.org/
To unsubscribe from this mailing list, send email to 
ol-discuss-unsubscr...@archive.org
