On Dec 13, 2007, at 9:48 AM, Ed Summers wrote:
use Net::OAI::Harvester;
use MODSHandler;
my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
my $harvester = Net::OAI::Harvester->new(baseURL => $url);
my $records = $harvester->listRecords(
    metadataPrefix  => 'mods',
    metadataHandler => 'MODSHandler'
);
while (my $record = $records->next()) {
    print $record->metadata()->title(), "\n";
}
...
Interestingly, back in 2000 or whenever this was written, it felt
pretty state of the art to use filters in this way. But today it
seems like overkill to have to write a state machine just to get at
some XML. The ruby oai library [2] I worked on more recently bucks
that trend: it does not try to create fancy objects for records,
hand-waves memory concerns (which never seemed to surface), and just
returns what amounts to a DOM, letting the user figure out what
they want.
What type(s) of data are the methods called on the object returned
by the metadata method (above) expected to return? Only scalars? How
about objects? How about other Perl data structures, like a hash (of
hashes)? Is there a pre-defined set of methods that can be called on
the object the metadata method returns?
I suppose the aforementioned MODSHandler can be designed to support
any number of methods returning different types of data. Correct?
For example, the code above is designed to return a title. Additional
methods might return authors, subjects, publishers, etc.
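To make the question concrete, here is a minimal sketch of what I
imagine such a handler might look like. It is not Ed's MODSHandler;
the package name MODSHandlerSketch, the simplified event hashes, and
the unprefixed element names are my own assumptions. A handler used
with Net::OAI::Harvester would normally inherit from XML::SAX::Base,
which I have left out so the sketch runs with core Perl alone.

```perl
package MODSHandlerSketch;    # hypothetical name, not Ed's MODSHandler
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { title => '', authors => [], path => [], text => '' },
        $class;
}

# SAX-style callbacks; each receives a hash reference describing the event.
sub start_element {
    my ( $self, $el ) = @_;
    push @{ $self->{path} }, $el->{Name};
    $self->{text} = '';
}

sub characters {
    my ( $self, $chars ) = @_;
    $self->{text} .= $chars->{Data};    # text may arrive in several chunks
}

sub end_element {
    my ( $self, $el ) = @_;
    my $path = join '/', @{ $self->{path} };
    if ( $path =~ m{titleInfo/title$} ) {
        $self->{title} = $self->{text};
    }
    elsif ( $path =~ m{name/namePart$} ) {
        push @{ $self->{authors} }, $self->{text};
    }
    pop @{ $self->{path} };
    $self->{text} = '';
}

# Accessors: one returns a scalar, the other returns a list.
sub title   { return $_[0]{title} }
sub authors { return @{ $_[0]{authors} } }

package main;

# Drive the handler by hand with the kind of events a SAX parser emits.
my $handler = MODSHandlerSketch->new;
for my $event (
    [ start_element => { Name => 'mods' } ],
    [ start_element => { Name => 'titleInfo' } ],
    [ start_element => { Name => 'title' } ],
    [ characters    => { Data => 'An Example Title' } ],
    [ end_element   => { Name => 'title' } ],
    [ end_element   => { Name => 'titleInfo' } ],
    [ start_element => { Name => 'name' } ],
    [ start_element => { Name => 'namePart' } ],
    [ characters    => { Data => 'Author, First' } ],
    [ end_element   => { Name => 'namePart' } ],
    [ end_element   => { Name => 'name' } ],
    [ start_element => { Name => 'name' } ],
    [ start_element => { Name => 'namePart' } ],
    [ characters    => { Data => 'Author, Second' } ],
    [ end_element   => { Name => 'namePart' } ],
    [ end_element   => { Name => 'name' } ],
    [ end_element   => { Name => 'mods' } ],
  )
{
    my ( $method, $data ) = @$event;
    $handler->$method($data);
}

print $handler->title, "\n";
print join( '; ', $handler->authors ), "\n";
```

If this is roughly right, then nothing stops a handler from returning
scalars from one method and lists (or objects) from another.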
Spurred on by the availability of MBooks from the University of
Michigan [1], I have written the beginnings of a SAX filter for
MARCXML data. Currently it iterates over MARCXML, parses the data,
and prints to STDOUT something looking like a MARC tagged display.
Ironically, this was rather easy because MARCXML only has a limited
number of elements: leader, controlfield, datafield, and subfield.
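The general shape of such a filter can be sketched as follows. This
is an illustration rather than the code I actually wrote: the package
name MARCXMLDisplaySketch is made up, attributes are passed as a plain
hash (a real XML::SAX parser structures them differently), and the
handler collects lines instead of printing them directly so the result
is easy to inspect.

```perl
package MARCXMLDisplaySketch;    # hypothetical name for illustration
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { lines => [], current => '', text => '' }, $class;
}

sub start_element {
    my ( $self, $el ) = @_;
    my $name = $el->{Name};
    my $attr = $el->{Attributes} || {};    # simplified: plain name => value
    $self->{text} = '';
    if ( $name eq 'leader' ) {
        $self->{current} = 'LDR';
    }
    elsif ( $name eq 'controlfield' ) {
        $self->{current} = $attr->{tag};
    }
    elsif ( $name eq 'datafield' ) {
        $self->{current} = "$attr->{tag} $attr->{ind1}$attr->{ind2}";
    }
    elsif ( $name eq 'subfield' ) {
        $self->{current} .= ' $' . $attr->{code};
    }
}

sub characters {
    my ( $self, $chars ) = @_;
    $self->{text} .= $chars->{Data};
}

sub end_element {
    my ( $self, $el ) = @_;
    my $name = $el->{Name};
    if ( $name eq 'leader' or $name eq 'controlfield' ) {
        push @{ $self->{lines} }, "$self->{current} $self->{text}";
    }
    elsif ( $name eq 'subfield' ) {
        $self->{current} .= " $self->{text}";
    }
    elsif ( $name eq 'datafield' ) {
        push @{ $self->{lines} }, $self->{current};
    }
    $self->{text} = '';
}

sub lines { return @{ $_[0]{lines} } }

package main;

# Feed the handler the events a parser would emit for a tiny record.
my $display = MARCXMLDisplaySketch->new;
for my $event (
    [ start_element => { Name => 'record' } ],
    [ start_element => { Name => 'leader' } ],
    [ characters    => { Data => '00000nam a2200000 a 4500' } ],
    [ end_element   => { Name => 'leader' } ],
    [ start_element => { Name => 'controlfield', Attributes => { tag => '001' } } ],
    [ characters    => { Data => '12345' } ],
    [ end_element   => { Name => 'controlfield' } ],
    [ start_element => { Name => 'datafield',
                         Attributes => { tag => '245', ind1 => '1', ind2 => '0' } } ],
    [ start_element => { Name => 'subfield', Attributes => { code => 'a' } } ],
    [ characters    => { Data => 'An example title /' } ],
    [ end_element   => { Name => 'subfield' } ],
    [ start_element => { Name => 'subfield', Attributes => { code => 'c' } } ],
    [ characters    => { Data => 'An Author.' } ],
    [ end_element   => { Name => 'subfield' } ],
    [ end_element   => { Name => 'datafield' } ],
    [ end_element   => { Name => 'record' } ],
  )
{
    my ( $method, $data ) = @$event;
    $display->$method($data);
}

print "$_\n" for $display->lines;
```

Only four element names need any handling, which is why the state
machine stays this small.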
Using Ed's code as a model, I think I could create a method called
MARC that returns a MARC::Record object, like this:
use Net::OAI::Harvester;
use MARCXML;
my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
my $harvester = Net::OAI::Harvester->new( baseURL => $url );
my $records = $harvester->listRecords(
    metadataPrefix  => 'marc21',
    metadataHandler => 'MARCXML'
);
while ( my $record = $records->next ) {
    # call the MARC method returning a MARC::Record object
    my $marc = $record->metadata()->MARC;
    # apply cool MARC::Record methods against the object
    print $marc->title, "\n";
}
Alternatively, I suppose I could create methods like this:
$leader = $record->metadata()->leader;
$control = $record->metadata()->control;
$title = $record->metadata()->datafield( '245', 'a' );
$author = $record->metadata()->datafield( '100', 'a' );
$url = $record->metadata()->datafield( '856', 'u' );
Is this approach a good idea? On the other hand, maybe I should
return the whole record in all of its MARC glory. Which approach is
better? Maybe I should do both? Maybe I should return a DOM as Ed
alludes to. Ah, the choices!
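For what it is worth, the two approaches do not seem mutually
exclusive. Below is a rough sketch, under my own assumptions, of a
metadata object that offers both a datafield( tag, code ) accessor and
a method handing back the whole record as a hash of lists of hashes.
The package name MARCMetadataSketch and the internal layout are
invented for illustration; in practice a SAX handler would populate
the structure while parsing.

```perl
package MARCMetadataSketch;    # hypothetical name and layout
use strict;
use warnings;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        leader => $args{leader},
        fields => $args{fields} || {},    # tag => [ { code => value }, ... ]
    }, $class;
}

sub leader { return $_[0]{leader} }

# Return the first value found for the given tag and subfield code,
# or undef (in scalar context) when nothing matches.
sub datafield {
    my ( $self, $tag, $code ) = @_;
    for my $field ( @{ $self->{fields}{$tag} || [] } ) {
        return $field->{$code} if exists $field->{$code};
    }
    return;
}

# Hand back the whole structure for callers who want everything.
sub fields { return $_[0]{fields} }

package main;

my $metadata = MARCMetadataSketch->new(
    leader => '00000nam a2200000 a 4500',
    fields => {
        '100' => [ { a => 'Author, An' } ],
        '245' => [ { a => 'An example title /', c => 'An Author.' } ],
    },
);

my $title  = $metadata->datafield( '245', 'a' );
my $author = $metadata->datafield( '100', 'a' );
my $link   = $metadata->datafield( '856', 'u' );    # no 856 in this record

print "$title\n$author\n";
print "no URL\n" unless defined $link;
print join( ', ', sort keys %{ $metadata->fields } ), "\n";
```

The accessor covers the common lookups, while the raw structure stays
available for anything the accessor did not anticipate.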
[1] http://lists.webjunction.org/wjlists/xml4lib/2007-December/005978.html
--
Eric Lease Morgan
University Libraries of Notre Dame