http://staff.oclc.org/~levan/PearsTraining/scifi.usmarc contains 10,000 MARC records. They are part of the old SiteSearch system that OCLC released as open source. They date back to 2002 and will not contain any Unicode, in case you were hoping to include that in your testing.

Ralph

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Alexander Johannesen
Sent: Wednesday, January 11, 2012 5:36 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Open datasets

Hiya,

I'm in the middle of creating a meta data management system (including merging and persistent identifier management) for a somewhat different domain (intranets and business integration), but it's based on Topic Maps and so is well suited to other means of meta data handling / mangling. It's also going to be open-source, and it might be well-suited to library tasks as well.

So in order to test the integrity and performance of my system so far I'm wondering if there's a suitable open dataset of bibliographic records that aren't too obscure (meaning, I can find the titles at amazon or Open Library) that you could recommend? More than 1000 records, but less than a million, maybe?

Regards,

Alex