[OPEN-ILS-DEV] Holdings Import Program

2007-11-14 Thread Travis Schafer
Following a conversation with Jason yesterday (yeah, I'm trying to use him as 
an excuse for barging in on the list... sorry Jason), I thought that I would 
post the attached program here.
 
It's a small C program that I whipped up to import holdings information to 
Evergreen from a MARC XML file. I am sure that it could use some improvement 
(the optimization was for programmer time), but it has worked well and is 
thoroughly documented with comments. Obviously, it needs to be compiled against 
libxml2 and the Postgres library.
 
Anyway, I hope that it's useful for someone else. If it isn't, it isn't... and 
I'm always open to suggestions.  
 
Thanks for your time, and sorry for the interruption!
 
Travis Schafer
Technology Director
Carson City - Crystal Area Schools
//
// create_holdings.c
//
// Carson City - Crystal Area Schools
// Technology Department
// 115 E Main
// Carson City, MI 48811
// (989) 584-3138
//
// Given an XML MARC file, this program creates holdings information
// for records whose bibliographic data is already present in the Evergreen
// database (based on TCN).
//
// This program DOES NOT pull the holding library from the MARC XML
// file. Instead, this option is set on the command line by specifying the
// actor.org_unit.id value for the owning and circulating libraries. This
// is because the program was originally developed for K-12 migrations from
// systems that were islands, and in many cases didn't have a holdings
// tag that indicated the owning or circulating library. Several other
// items of information that would be pulled from MARC during conversion
// from a sane system are similarly defined as command line options. Again,
// the incumbent K-12 systems aren't sane.
//
// However, it would be trivial to modify this program to extract additional
// information from a holdings tag. Note also that several functions are
// declared and defined, but not used... they are present for debugging.
//
// This program is poorly written. It probably leaks memory... it actually
// causes permanent damage to RAM. It's been known to cause server farms to
// burst into flames. It stole money from my sock drawer. You've been warned...
//
// There are two tables involved in this enterprise. The relevant fields
// (the ones we will be filling out) are as follows (FK indicates a foreign
// key):
//
// asset.call_number
//   creator       : FK - User who created this entry (actor.usr.id)
//   editor        : FK - User who last edited this record (actor.usr.id)
//   record        : FK - Biblio Data for copy (biblio.record_entry.id)
//   owning_lib    : FK - Owning library (actor.org_unit.id)
//   label         : The call number!
//
// asset.copy
//   circ_lib      : FK - Circulating Library (actor.org_unit.id)
//   creator       : FK - User who created this entry (actor.usr.id)
//   call_number   : FK - Item Call Number (asset.call_number.id)
//   editor        : FK - User who last edited this record (actor.usr.id)
//   status        : FK - Item Status (config.copy_status.id)
//   location      : FK - Location (ie, Stacks) of copy (asset.copy_location.id)
//   loan_duration : Required, but not an FK... '2' is popular
//   fine_level    : Required, but not an FK... '2' is popular
//   price         : Item Price
//   barcode       : Not surprisingly, the item barcode
//
// So, basically, we extract the Barcode, Price, Call Number, and TCN of
// each record in the MARC file. Then, we use the TCN to find the value for
// asset.call_number.record, create a call number record, then create a copy
// record.
//
// We get editor, owning_lib, circ_lib, creator, status, location, etc from
// the command line...
//
// Or at least that's the plan
//
//
// Copyright 2007 Travis Schafer
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, version 3
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program.  If not, see http://www.gnu.org/licenses/.
//
// Initial Program:
// 2007-11-07 T.Schafer
//
// Changes:
//
// 2007-11-13 T.Schafer - 1) Initialized variable in wrong spot. This meant
//   that if any copy was found to have an existing
//   barcode, no subsequent entries would be inserted
//   into the database (basically, we didn't reset an
//   exists flag at the top of a loop)
//
// 2007-11-14 T.Schafer - 2) Inserted GPL/Copyright Notice


#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#include <libpq-fe.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

#define _GNU_SOURCE
#include 
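
(The attachment is cut off here in the archive.) The 2007-11-13 changelog entry above describes a common loop bug: a per-item flag that is initialized once, outside the loop, instead of at the top of every iteration. The sketch below shows the corrected pattern only. It is not taken from create_holdings.c; `barcode_exists`, `count_insertable`, and the in-memory array standing in for the database lookup are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the database lookup: returns 1 if `barcode`
 * is already present in `known`, 0 otherwise. */
int barcode_exists(const char *barcode,
                   const char *const *known, size_t n_known)
{
    assert(barcode != NULL);
    for (size_t i = 0; i < n_known; i++) {
        if (strcmp(barcode, known[i]) == 0)
            return 1;
    }
    return 0;
}

/* Counts how many incoming barcodes would be inserted. */
size_t count_insertable(const char *const *incoming, size_t n_incoming,
                        const char *const *known, size_t n_known)
{
    size_t inserted = 0;

    for (size_t i = 0; i < n_incoming; i++) {
        /* Reset the flag on every iteration. If it were set once above
         * the loop, the first existing barcode would leave it at 1 and
         * every later item would be skipped. */
        int exists = 0;

        if (barcode_exists(incoming[i], known, n_known))
            exists = 1;
        if (!exists)
            inserted++;
    }
    return inserted;
}
```

With one known barcode among three incoming ones, the loop still counts the two new items that follow it.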

Re: [OPEN-ILS-DEV] Holdings Import Program

2007-11-14 Thread Dan Scott
Travis:

You (and any others lurking about) are more than welcome on the list!
Feel free to barge in any time, and please stick around...

This looks really nice, actually - it's always good to have examples
of well-documented code that make the import process more explicit! As
xmlReadFile() reads the whole XML document into memory, I suppose it
would make sense for large libraries interested in using this approach
to chunk the blocks of records up into reasonable sizes (50K records
per file or so). import_holdings.pl, which I reworked a bit in trunk,
suffers from the same affliction, but what are you gonna do?
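
As a rough illustration of planning such chunks, the sketch below counts records with a constant-memory scan and works out how many files a given chunk size would need. It is only a sketch under assumptions: it looks for a literal `</record>` closing tag (so namespace-prefixed MARC XML would need a different tag string), and `record_counter`, `record_counter_feed`, `count_records`, and `chunks_needed` are hypothetical names, not part of either import tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Running state for a streaming scan; memory use is constant. */
typedef struct {
    size_t matched;  /* characters of "</record>" matched so far */
    size_t records;  /* complete closing tags seen */
} record_counter;

/* Feeds one buffer into the scan. A tag may span buffer boundaries.
 * Assumption: record elements carry no namespace prefix. */
void record_counter_feed(record_counter *rc, const char *buf, size_t len)
{
    static const char tag[] = "</record>";
    const size_t tag_len = sizeof tag - 1;

    assert(rc != NULL && (buf != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == tag[rc->matched]) {
            if (++rc->matched == tag_len) {
                rc->records++;
                rc->matched = 0;
            }
        } else {
            /* '<' appears only at the start of the tag, so after a
             * mismatch the match restarts at 1 on '<' and at 0 otherwise. */
            rc->matched = (buf[i] == '<') ? 1 : 0;
        }
    }
}

/* Counts records in an open stream using a small fixed buffer. */
size_t count_records(FILE *in)
{
    record_counter rc = {0, 0};
    char buf[4096];
    size_t n;

    assert(in != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        record_counter_feed(&rc, buf, n);
    return rc.records;
}

/* Number of chunk files needed at `per_chunk` records per file. */
size_t chunks_needed(size_t records, size_t per_chunk)
{
    assert(per_chunk > 0);
    return (records + per_chunk - 1) / per_chunk;
}
```

For example, 120,000 records at 50K records per file would need three files.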

The one challenge I've noticed with the two-step approach of importing
the biblio records, then importing the holdings for those records, is
that synchronizing the TCN between the two steps can be a bit of a
pain. Our system, for example, quite happily allows duplicate 001
fields (arggh!). I've been considering moving the basic logic from
import_holdings.pl into marc2bre.pl so that we can ensure that the
TCNs are perfectly synchronized for the bib records and the
corresponding holdings. It will mean an additional set of command-line
flags, but hey - it's not like you have to migrate legacy records
every day.
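
One way to spot the duplicate 001 problem before either import step is to collect the control numbers and check them for repeats. The following is a minimal sketch, assuming the 001 values have already been extracted into an array of strings; `count_duplicate_tcns` and the sample values in the usage note are hypothetical and not part of import_holdings.pl or marc2bre.pl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts `tcns` in place and returns how many entries repeat an earlier
 * value; 0 means every control number is unique. */
size_t count_duplicate_tcns(const char **tcns, size_t n)
{
    size_t dups = 0;

    assert(tcns != NULL || n == 0);
    if (n > 1)
        qsort(tcns, n, sizeof *tcns, cmp_str);
    /* After sorting, equal values sit next to each other. */
    for (size_t i = 1; i < n; i++) {
        if (strcmp(tcns[i - 1], tcns[i]) == 0)
            dups++;
    }
    return dups;
}
```

Given the five values "ocm001", "ocm002", "ocm001", "ocm003", "ocm001", the function reports two repeats. A nonzero result would be a signal to resolve the TCNs before loading holdings against them.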

As far as your code goes, if you want to contribute it to the code
repository there's one more step to take - as this is a substantial
contribution, you need to attach a copy of the Developer's Certificate
of Origin (DCO) 1.1 as mentioned in
http://open-ils.org/documentation/contributing.html

If you have any other utilities that you whip up that you think might
be useful to the project, keep 'em coming!

-- 
Dan Scott
Laurentian University


Re: [OPEN-ILS-DEV] Holdings Import Program

2007-11-14 Thread Travis Schafer
Dan,
 
If you think it's worth submitting, I'll be more than happy to attach the 
DCO... waddya think?
 
--TS


Re: [OPEN-ILS-DEV] Holdings Import Program

2007-11-14 Thread Bill Erickson

Dan Scott wrote:

On 14/11/2007, Travis Schafer [EMAIL PROTECTED] wrote:
  


Thanks for the code, Travis!

[snip]


This looks really nice, actually - it's always good to have examples
of well-documented code that make the import process more explicit! As
xmlReadFile() reads the whole XML document into memory, I suppose it
would make sense for large libraries interested in using this approach
to chunk the blocks of records up into reasonable sizes (50K records
per file or so). import_holdings.pl, which I reworked a bit in trunk,
suffers from the same affliction, but what are you gonna do?
  
A SAX version might be in order... libxml2 and Expat both provide fast 
and relatively easy-to-use SAX APIs.  An Expat example can be found at 
http://svn.open-ils.org/trac/ILS/browser/trunk/Open-ILS/src/apachemods/mod_xmlent.c 
(search for XMLCALL and parser)


Just a thought..



-bill

--
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / The Evergreen Experts
| phone: 877-OPEN-ILS (673-6457)
| email: [EMAIL PROTECTED]
| web: http://esilibrary.com