Interesting question.

First, on the Library of Congress data: the Internet Archive has a
snapshot of the LoC information from 2007.  It was collected by the
Scriblio project
http://www.archive.org/details/marc_records_scriblio_net.  There are
also some other record collections at the Internet Archive that
contain MARC records, and there are some good MARC libraries in Perl
(MARC::Record on CPAN, for example).

As you point out though, looking at library catalogs is going to
produce a lot of holes.  You might have better luck looking at some of
the larger publishers.  They might have ONIX files they can share with
you, but harvesting data from publishers typically isn't easy to
automate.  Publishers generally don't seem to make that data publicly
available, which is a pity.  But I suspect contacting them and
asking for ONIX dumps of their catalogs might be one of the quicker
routes, particularly for historical information.

One nice advantage with Perl is that most of the books will have an
ISBN, which will help with combining data from multiple sources.
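
As a rough sketch of what combining on ISBN could look like (written
in Python purely for illustration; the record layout with an "isbn"
key and the names to_isbn13 and merge_by_isbn are made up here, not
taken from any of these sources), the idea is to normalize ISBN-10 and
ISBN-13 forms to one 13-digit key and merge records on that key:

```python
import re


def to_isbn13(raw):
    """Return the ISBN as 13 bare digits, or None if it is not a valid ISBN."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"\d{9}[\dX]", s):
        # ISBN-10: weights 10..1, 'X' counts as 10, sum must be divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(s))
        if total % 11 != 0:
            return None
        core = "978" + s[:9]
    elif re.fullmatch(r"\d{13}", s):
        core = s[:12]
    else:
        return None
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    isbn13 = core + str((10 - total % 10) % 10)
    if len(s) == 13 and isbn13 != s:
        return None  # the supplied ISBN-13 had a bad check digit
    return isbn13


def merge_by_isbn(*sources):
    """Merge lists of dict records keyed on normalized ISBN.

    Assumption: each record is a dict with an "isbn" key. Records without
    a valid ISBN are skipped, and earlier sources win on conflicting fields.
    """
    merged = {}
    for source in sources:
        for record in source:
            key = to_isbn13(record.get("isbn", ""))
            if key is None:
                continue
            combined = merged.setdefault(key, {"isbn": key})
            for field, value in record.items():
                combined.setdefault(field, value)
    return merged
```

Records with no usable ISBN are simply dropped in this sketch; in
practice you would probably want to set those aside for matching by
hand on title and publisher.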

Another old-school, non-automated technique would be to follow
citation trails, using something like Web of Science.  Of
course, the issue there is that many of the citation sources will be
academic, and there will be holes for publishers like Sams that are
more focused on developers.  ACM Digital Library also does this to a
degree, if I remember correctly, and they have non-ACM materials with
record info.  For example, the first hit is Perl Cookbook when I
search there.

Depending on the scope of the project or how urgent it is, this might
be a useful thing to crowd-source.  Start gathering the data, make
it available, and ask people to send information about anything that
is missing.

One final question: do you want all books, published anywhere, in
every edition?  Do you want to know about, say, the Chinese
translations of Effective Perl Programming and some small book only
published in Sanskrit?

Jon Gorman


On Sun, Nov 6, 2011 at 1:18 PM, brian d foy <brian.d....@gmail.com> wrote:
> I'm looking for a way to discover all the books ever published about
> Perl. Where should I look?
>
> * Is there a Perl interface for the WorldCat APIs? If not, I'll make
> one. Are people merely shoving their results into something like
> XML::Feed? I have a big dump of data
>
> * WorldCat has many of the books, but there are holes. I realize that
> this is a union catalog instead of a historical database.
>
> * I know about the Amazon interfaces too, but I think that's the same
> problem as WorldCat (and there are already Perl interfaces for that).
>
> * I have the data dump from Google Books already.
>
> * I figure that the Library of Congress knows about a lot of them, but
> I don't have $20,000 to buy their 2012 database (or subsequent ones).
> Is there some other way to get re
>
> --
> brian d foy <brian.d....@gmail.com>
>
