http://staff.oclc.org/~levan/PearsTraining/scifi.usmarc has 10,000 MARC
records in it.  They are part of the old SiteSearch system that OCLC
released as open source.  They date back to 2002 and will not contain
any Unicode, if you were hoping to include that as part of your testing.
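
If you want to sanity-check the file after downloading it, something
like the following might help. It is only a minimal sketch: it assumes
the file is in the standard MARC transmission format, where the first
five bytes of each record give the record length and leader position 09
is "a" for Unicode records and blank for MARC-8. The function names are
made up for illustration and are not part of any library.

```python
from collections import Counter
from typing import BinaryIO, Iterator, Tuple


def iter_raw_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw record, using the 5-digit length at the start of its leader."""
    while True:
        head = stream.read(5)
        if not head:
            return  # clean end of file
        if len(head) < 5:
            raise ValueError("trailing bytes too short to be a record")
        length = int(head.decode("ascii"))
        record = head + stream.read(length - 5)
        if len(record) != length:
            raise ValueError("file ends in the middle of a record")
        yield record


def summarize(stream: BinaryIO) -> Tuple[int, Counter]:
    """Return the record count and a tally of leader position 09 values."""
    count = 0
    coding = Counter()
    for record in iter_raw_records(stream):
        count += 1
        # Leader/09: b"a" means Unicode, b" " means MARC-8 (assumption: MARC 21).
        coding[record[9:10]] += 1
    return count, coding
```

To try it, open the downloaded file in binary mode, for example
open("scifi.usmarc", "rb"), and pass it to summarize(). The local
file name here is just the last part of the URL above.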

Ralph

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Alexander Johannesen
Sent: Wednesday, January 11, 2012 5:36 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Open datasets

Hiya,

I'm in the middle of creating a meta data management system (including
merging and persistent identifier management) for a somewhat different
domain (intranets and business integration), but it's based on Topic
Maps and so is well suited to other means of meta data handling /
mangling. It's also going to be open-source, and it might be
well-suited to library tasks as well.

So in order to test the integrity and performance of my system so far,
I'm wondering if there's a suitable open dataset of bibliographic
records that aren't too obscure (meaning, I can find the titles at
Amazon or Open Library) that you could recommend? More than 1000
records, but fewer than a million, maybe?

Regards,

Alex
