[OPEN-ILS-DEV] Holdings Import Program
Following a conversation with Jason yesterday (yeah, I'm trying to use him
as an excuse for barging in on the list... sorry Jason), I thought that I
would post the attached program here. It's a small C program that I whipped
up to import holdings information to Evergreen from a MARC XML file.

I am sure that it could use some improvement (the optimization was for
programmer time), but it has worked well and is thoroughly documented with
comments. Obviously, it needs to be compiled against libxml2 and the
Postgres library.

Anyway, I hope that it's useful for someone else. If it isn't, it isn't...
and I'm always open to suggestions.

Thanks for your time, and sorry for the interruption!

Travis Schafer
Technology Director
Carson City - Crystal Area Schools

//
// create_holdings.c
//
// Carson City - Crystal Area Schools
// Technology Department
// 115 E Main
// Carson City, MI 48811
// (989) 584-3138
//
// Given an XML MARC file, this program creates holdings information
// for records whose bibliographic data is already present in the Evergreen
// database (based on TCN).
//
// This program DOES NOT pull the holding library from the MARC XML
// file. Instead, this option is set on the command line by specifying the
// actor.org_unit.id value for the owning and circulating libraries. This
// is because the program was originally developed for K-12 migrations from
// systems that were islands, and in many cases didn't have a holdings
// tag that indicated the owning or circulating library. Several other
// items of information that would be pulled from MARC during conversion
// from a sane system are similarly defined as command line options. Again,
// the incumbent K-12 systems aren't sane.
//
// However, it would be trivial to modify this program to extract additional
// information from a holdings tag. Note also that several functions are
// declared and defined, but not used... they are present for debugging.
//
// This program is poorly written. It probably leaks memory... it actually
// causes permanent damage to RAM. It's been known to cause server farms to
// burst into flames. It stole money from my sock drawer. You've been
// warned...
//
// There are two tables involved in this enterprise. The relevant fields
// (the ones we will be filling out) are as follows (FK indicates a foreign
// key):
//
// asset.call_number
//   creator       : FK - User who created this entry (actor.usr.id)
//   editor        : FK - User who last edited this record (actor.usr.id)
//   record        : FK - Biblio Data for copy (biblio.record_entry.id)
//   owning_lib    : FK - Owning library (actor.org_unit.id)
//   label         : The call number!
//
// asset.copy
//   circ_lib      : FK - Circulating Library (actor.org_unit.id)
//   creator       : FK - User who created this entry (actor.usr.id)
//   call_number   : FK - Item Call Number (asset.call_number.id)
//   editor        : FK - User who last edited this record (actor.usr.id)
//   status        : FK - Item Status (config.copy_status.id)
//   location      : FK - Location (i.e., Stacks) of copy (asset.copy_location.id)
//   loan_duration : Required, but not an FK... '2' is popular
//   fine_level    : Required, but not an FK... '2' is popular
//   price         : Item Price
//   barcode       : Not surprisingly, the item barcode
//
// So, basically, we extract the Barcode, Price, Call Number, and TCN of
// each record in the MARC File. Then, we use the TCN to find the value for
// asset.call_number.record, create a call number record, then create a copy
// record.
//
// We get editor, owning_lib, circ_lib, creator, status, location, etc from
// the command line...
//
// Or at least that's the plan
//
//
// Copyright 2007 Travis Schafer
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, version 3
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see http://www.gnu.org/licenses/.
//
// Initial Program:
// 2007-11-07 T.Schafer
//
// Changes:
//
// 2007-11-13 T.Schafer - 1) Initialized variable in wrong spot. This meant
//                           that if any copy was found to have an existing
//                           barcode, no subsequent entries would be inserted
//                           to the database (basically, we didn't reset an
//                           exists flag at the top of a loop)
//
// 2007-11-14 T.Schafer - 2) Inserted GPL/Copyright Notice

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <libpq-fe.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
#define _GNU_SOURCE
#include
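The 2007-11-13 changelog entry describes a flag that was not reset at the top of a loop, so one existing barcode stopped every later insert. The rest of the source is not shown here, so what follows is only a minimal sketch of that pattern. The names `barcode_exists` and `count_insertable`, and the in-memory array standing in for the database lookup, are hypothetical and not taken from create_holdings.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the database lookup: is this barcode already
 * present among the known barcodes? */
static bool barcode_exists(const char *barcode,
                           const char *const *known, size_t n_known)
{
    for (size_t i = 0; i < n_known; i++) {
        if (strcmp(barcode, known[i]) == 0)
            return true;
    }
    return false;
}

/* Count the incoming copies that would be inserted, skipping any whose
 * barcode already exists. */
size_t count_insertable(const char *const *incoming, size_t n_incoming,
                        const char *const *known, size_t n_known)
{
    size_t inserted = 0;

    assert(incoming != NULL || n_incoming == 0);

    for (size_t i = 0; i < n_incoming; i++) {
        /* Reset the flag at the top of every iteration. If it were
         * initialized once, before the loop, the first duplicate would
         * leave it set and every later copy would be skipped. */
        bool exists = false;

        if (barcode_exists(incoming[i], known, n_known))
            exists = true;

        if (!exists)
            inserted++;   /* the real program would INSERT the copy here */
    }
    return inserted;
}
```

With known barcodes "1001" and "1002" and incoming barcodes "1001", "2001", "2002", the function returns 2: the duplicate is skipped and the two copies after it still go through.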
Re: [OPEN-ILS-DEV] Holdings Import Program
On 14/11/2007, Travis Schafer [EMAIL PROTECTED] wrote:
> Following a conversation with Jason yesterday (yeah, I'm trying to use him
> as an excuse for barging in on the list... sorry Jason), I thought that I
> would post the attached program here. It's a small C program that I whipped
> up to import holdings information to Evergreen from a MARC XML file.
>
> I am sure that it could use some improvement (the optimization was for
> programmer time), but it has worked well and is thoroughly documented with
> comments. Obviously, it needs to be compiled against libxml2 and the
> Postgres library.
>
> Anyway, I hope that it's useful for someone else. If it isn't, it isn't...
> and I'm always open to suggestions.
>
> Thanks for your time, and sorry for the interruption!
>
> Travis Schafer
> Technology Director
> Carson City - Crystal Area Schools

Travis:

You (and any others lurking about) are more than welcome on the list! Feel
free to barge in any time, and please stick around...

This looks really nice, actually - it's always good to have examples of
well-documented code that make the import process more explicit!

As xmlReadFile() reads the whole XML document into memory, I suppose it
would make sense for large libraries interested in using this approach to
chunk the blocks of records up into reasonable sizes (50K records per file
or so). import_holdings.pl, which I reworked a bit in trunk, suffers from
the same affliction, but what are you gonna do?

The one challenge I've noticed with the two-step approach of importing the
biblio records, then importing the holdings for those records, is that
synchronizing the TCN between the two steps can be a bit of a pain. Our
system, for example, quite happily allows duplicate 001 fields (arggh!).
I've been considering moving the basic logic from import_holdings.pl into
marc2bre.pl so that we can ensure that the TCNs are perfectly synchronized
for the bib records and the corresponding holdings. It will mean an
additional set of command-line flags, but hey - it's not like you have to
migrate legacy records every day.

As far as your code goes, if you want to contribute it to the code
repository there's one more step to take - as this is a substantial
contribution, you need to attach a copy of the Developer's Certificate of
Origin (DCO) 1.1 as mentioned in
http://open-ils.org/documentation/contributing.html

If you have any other utilities that you whip up that you think might be
useful to the project, keep 'em coming!

--
Dan Scott
Laurentian University
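Duplicate 001 values break the two-step import because one TCN no longer identifies a single bib record. One way to catch that before loading holdings is to check the extracted TCNs for repeats. This is a rough sketch, assuming the TCNs have already been pulled out of the MARC XML into an array of strings; `count_duplicate_tcns` is a hypothetical helper and not part of import_holdings.pl, marc2bre.pl, or create_holdings.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort the TCN array in place, then return how many distinct TCN values
 * occur more than once. A non-zero result means the TCN alone cannot be
 * used to match holdings to a single bib record. */
size_t count_duplicate_tcns(const char **tcns, size_t n)
{
    size_t dup_values = 0;

    assert(tcns != NULL || n == 0);
    if (n < 2)
        return 0;

    qsort(tcns, n, sizeof *tcns, cmp_str);

    for (size_t i = 1; i < n; i++) {
        int same_as_prev = strcmp(tcns[i], tcns[i - 1]) == 0;
        /* Count each repeated value once, at its first repetition. */
        int first_repeat = (i == 1) || strcmp(tcns[i - 1], tcns[i - 2]) != 0;

        if (same_as_prev && first_repeat)
            dup_values++;
    }
    return dup_values;
}
```

Sorting first keeps the check at O(n log n), which should be adequate for a one-off migration. Note that the caller's array is reordered as a side effect.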
Re: [OPEN-ILS-DEV] Holdings Import Program
Dan,

If you think it's worth submitting, I'll be more than happy to attach the
DCO... waddya think?

--TS

Dan Scott [EMAIL PROTECTED] 11/14/2007 10:10 AM
> As far as your code goes, if you want to contribute it to the code
> repository there's one more step to take - as this is a substantial
> contribution, you need to attach a copy of the Developer's Certificate of
> Origin (DCO) 1.1 as mentioned in
> http://open-ils.org/documentation/contributing.html
Re: [OPEN-ILS-DEV] Holdings Import Program
Dan Scott wrote:
> On 14/11/2007, Travis Schafer [EMAIL PROTECTED] wrote:

Thanks for the code, Travis!

[snip]

> This looks really nice, actually - it's always good to have examples of
> well-documented code that make the import process more explicit!
>
> As xmlReadFile() reads the whole XML document into memory, I suppose it
> would make sense for large libraries interested in using this approach to
> chunk the blocks of records up into reasonable sizes (50K records per file
> or so). import_holdings.pl, which I reworked a bit in trunk, suffers from
> the same affliction, but what are you gonna do?

A SAX version might be in order... Libxml2 and Expat both provide fast and
relatively easy-to-use SAX APIs. An Expat example can be found at
http://svn.open-ils.org/trac/ILS/browser/trunk/Open-ILS/src/apachemods/mod_xmlent.c
(search for XMLCALL and parser).

Just a thought...

-bill

--
Bill Erickson
| VP, Software Development Integration
| Equinox Software, Inc. / The Evergreen Experts
| phone: 877-OPEN-ILS (673-6457)
| email: [EMAIL PROTECTED]
| web: http://esilibrary.com
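To illustrate the SAX idea: rather than building a whole document tree, the parser fires a callback for each start and end tag, and the application handles one record at a time. The sketch below is a toy event-driven scanner in that style, written with only the C standard library. It is not the libxml2 or Expat API, and every name in it (`toy_handler`, `toy_scan`, `count_records`) is hypothetical. It assumes unprefixed `record` elements, no CDATA sections, and no `>` inside attribute values or comments. It also takes an in-memory buffer for brevity, whereas a real SAX parser accepts input incrementally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callbacks in the style of a SAX handler (hypothetical names). The
 * element name is passed as a pointer plus length, not NUL-terminated. */
typedef struct {
    void (*start_element)(void *user, const char *name, size_t len);
    void (*end_element)(void *user, const char *name, size_t len);
} toy_handler;

/* Walk the buffer once and fire a callback for every tag found. */
void toy_scan(const char *xml, const toy_handler *h, void *user)
{
    assert(xml != NULL && h != NULL);

    const char *p = xml;
    while ((p = strchr(p, '<')) != NULL) {
        const char *close = strchr(p, '>');
        if (close == NULL)
            break;                          /* truncated input */
        if (p[1] == '?' || p[1] == '!') {   /* declaration, comment, doctype */
            p = close + 1;
            continue;
        }

        int is_end = (p[1] == '/');
        const char *name = p + 1 + is_end;
        size_t len = strcspn(name, " \t\r\n/>");
        int self_closing = (!is_end && close[-1] == '/');

        if (!is_end && h->start_element)
            h->start_element(user, name, len);
        if ((is_end || self_closing) && h->end_element)
            h->end_element(user, name, len);

        p = close + 1;
    }
}

/* Per-parse state handed to the callbacks through the user pointer. */
typedef struct {
    size_t records;
} record_counter;

/* End-of-element callback: a full record has been seen. */
static void on_end(void *user, const char *name, size_t len)
{
    record_counter *rc = user;
    if (len == 6 && strncmp(name, "record", 6) == 0)
        rc->records++;   /* a real importer would write the holdings here */
}

/* Count the record elements in a MARC XML buffer. */
size_t count_records(const char *xml)
{
    record_counter rc = { 0 };
    toy_handler h = { NULL, on_end };

    toy_scan(xml, &h, &rc);
    return rc.records;
}
```

The point of the callback shape is that state lives in a small user structure rather than in a document tree, so memory use does not grow with the number of records. For real imports, libxml2 or Expat should do the parsing; this only shows the control flow.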