When I've tried to do this, it's been much harder than your story, I'm afraid.
My library data is very inconsistent in the way it expresses its
holdings. Even _without_ "missing" items, the holdings are expressed in
human-readable narrative form, which is very difficult to parse reliably.
Theoretically, the holdings follow some standard (I forget the name of
the NISO Z39 standard) for expressing human-readable holdings with
particular punctuation and such. Even if they really WERE all exactly
according to this standard, it is not very easy to parse consistently
and reliably. But in fact, nothing validates these tags against the
standard when they are entered, and at different times in history the
cataloging staff in various libraries had various ideas about how
strictly they should follow this local "policy". So our holdings do not
even reliably conform to that standard.
But if you think it's easy, please, give it a try and get back to us. :)
Maybe your library's data is cleaner than mine.
I think it's kind of a crime that our ILS (and many other ILSs) doesn't
provide a way for holdings to be efficiently entered (or guessed from
prediction patterns etc.) AND converted to an internal structured format
that actually contains the semantic info we want. Offering catalogers
the option to manually enter an MFHD is not a solution.
Kyle Banerjee wrote:
The trick here is that traditional library metadata practices make it
_hard_ to tell if a _specific volume/issue_ is held by a given library.
Those are the most common use cases for OpenURL.
Yep. That's true even for individual libraries with link resolvers. OCLC is
not going to be able to solve that particular issue until the local
This might not be as bad as people think. The normal argument is that
holdings are in free text and there's no way staff will ever have enough
time to record volume level holdings. However, significant chunks of the
problem can be addressed using relatively simple methods.
For example, if you can identify complete runs, you know that a library
holds every volume in them and can start automating things.
With this in mind, the first step is to identify incomplete holdings. The
mere presence of lingo like "missing," "lost," "incomplete," "scattered,"
"wanting," etc. is a dead giveaway. So are bracketed fields that contain
enumeration or temporal data (though you'll get false hits using this method
when catalogers supply enumeration). Commas in any field that contains
enumeration or temporal data also indicate incomplete holdings.
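Here's a minimal sketch of those first-pass tests in Python, assuming each
holdings statement (e.g. an 866 $a) has already been pulled out as a plain
string. The function name and the regular expressions are hypothetical and
deliberately loose; they would need tuning against real data.

```python
import re

# Words that give away incomplete holdings.
GAP_WORDS = re.compile(
    r"\b(missing|lost|incomplete|scattered|wanting)\b", re.IGNORECASE
)
# Enumeration (v.3, vol.3, no.12, pt.2, bd.4) or a 4-digit year.
# Hypothetical, loose pattern: tune it against your own data.
ENUM_OR_DATE = re.compile(
    r"\b(v|vol|no|pt|bd)\.?\s*\d+|\b1[5-9]\d\d\b|\b20\d\d\b", re.IGNORECASE
)
# A bracketed segment such as "[1923]" or "[v.4]".
BRACKETED = re.compile(r"\[([^\]]*)\]")


def looks_incomplete(holdings: str) -> bool:
    """Return True if a free-text holdings statement shows signs of gaps."""
    if GAP_WORDS.search(holdings):
        return True
    # Brackets around enumeration/dates. Cataloger-supplied enumeration is
    # also bracketed, so expect some false hits from this test.
    for inner in BRACKETED.findall(holdings):
        if ENUM_OR_DATE.search(inner):
            return True
    # A comma in a statement carrying enumeration/dates usually separates
    # runs, e.g. "v.1-10, v.12-20".
    if "," in holdings and ENUM_OR_DATE.search(holdings):
        return True
    return False
```

For instance, "v.1-10, v.12-20" is caught by the comma test and
"v.1-5 [1923-1927]" by the bracket test, while "v.1-25 (1950-1974)"
passes all three.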
I suspect that the mere presence of a note is a great indicator that
holdings are incomplete, since what kind of yutz writes a note saying "all
the holdings are here just like you'd expect"? Having said that, I need to
crawl through a lot more data before I'm comfortable with that statement.
Regexp matches can be used to search for closed date ranges in open serials,
or for closing dates within the 866 that don't correspond to closing dates
within the fixed fields.
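A rough Python sketch of that comparison, assuming the 866 text and the
4-character Date 2 from the 008 fixed field (positions 11-14) are already
extracted as strings. It relies on the MARC 21 convention that Date 2 is
"9999" for a serial that is still being published; the function name and
patterns are hypothetical.

```python
import re

YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")
# An open-ended statement: ends in a hyphen, optionally followed by a
# closing parenthesis and/or period, e.g. "v.40- (1990-)".
OPEN_END = re.compile(r"-\s*\)?[\s.]*$")


def date_mismatch(holdings_866: str, date2_008: str) -> bool:
    """Flag an 866 whose closing date disagrees with 008 Date 2."""
    years = YEAR.findall(holdings_866)
    if not years:
        return False  # no chronology to compare; leave for a later pass
    holdings_open = bool(OPEN_END.search(holdings_866))
    last_held = int(years[-1])
    if date2_008 == "9999":
        # Open serial, but the holdings stop at a specific year.
        return not holdings_open
    if date2_008.isdigit():
        # Ceased serial: holdings should close on the ceased date.
        return holdings_open or last_held != int(date2_008)
    return False  # Date 2 blank or containing "u"s; can't tell
```

So "v.1-25 (1950-1974)" on an open serial is flagged, while the same
statement on a title whose Date 2 is "1974" is not.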
That's the first pass. The second pass would be to search for the most
common patterns that occur within incomplete holdings. Wash, rinse, repeat.
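One way (an assumption, not the only one) to find "the most common
patterns" is to mask every run of digits so statements collapse to their
shape, then tally the shapes among the statements the first pass flagged.
The function names here are hypothetical.

```python
import re
from collections import Counter


def pattern_of(holdings: str) -> str:
    """Reduce a holdings string to its 'shape' by masking digit runs."""
    return re.sub(r"\d+", "#", holdings.strip().lower())


def common_patterns(statements, top_n=10):
    """Return the top_n most frequent shapes with their counts."""
    return Counter(pattern_of(s) for s in statements).most_common(top_n)
```

Each frequent shape, such as "v.#-#, v.#-#", is a candidate for its own
parsing rule; once it's handled, drop those statements and tally again.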
After a while, you'll get to all the cornball schemes that don't lend
themselves to automation, but hopefully by then that group of materials
has shrunk to a manageable size where throwing labor at the metadata makes
some sense. Guessing whether a volume is available based on its timeframe
may be a good way to go.
In the worst case, if the program can't handle it, you deflect the
request to the next institution, and that already happens all the time for a
variety of reasons.
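A sketch of the timeframe guess with deflection as the fallback, assuming
the request carries a publication year and the holdings string carries
4-digit years. Note that it ignores gap notes entirely, so the guess is
only as good as the chronology in the string. All names and the year
pattern are hypothetical.

```python
import re

# "1950-1974", open-ended "1990-", or a single year such as "1923".
YEAR_RANGE = re.compile(r"\b(\d{4})(?:\s*-\s*(\d{4})?)?")


def held_years(holdings: str, current_year: int):
    """Turn a holdings string into a list of (start, end) year ranges."""
    ranges = []
    for m in YEAR_RANGE.finditer(holdings):
        start = int(m.group(1))
        if m.group(2):
            end = int(m.group(2))
        elif "-" in m.group(0):
            end = current_year  # open-ended run, e.g. "1990-"
        else:
            end = start  # a single year
        ranges.append((start, end))
    return ranges


def resolve(holdings: str, year: int, current_year: int) -> str:
    """Guess whether the requested year is held; deflect when unsure."""
    ranges = held_years(holdings, current_year)
    if not ranges:
        return "deflect"  # no usable chronology: send to next institution
    if any(start <= year <= end for start, end in ranges):
        return "probably held"
    return "not held"
```

Returning "deflect" rather than guessing when there's no chronology keeps
the failure mode the same as what already happens today.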
While my comments are mostly concerned with journal holdings, similar logic
can be used with monographic series as well.