Thank you for the replies, and after a bit of investigation I learned that I 
don’t need to do authentication because the vendor does IP authentication. 
Nice! On the other hand, I was still not able to resolve my original problem. 

I needed/wanted to download tens of thousands, if not hundreds of thousands, 
of citations for text mining analysis. The Web interface to the database/index 
limits output to 4,000 items, and selecting the set of these items is beyond 
tedious; it is cruel and unusual punishment. I then got the idea of using 
EndNote’s z39.50 client, and after a bit of back & forth I got it working, but 
the downloading process was too slow. I then got the bright idea of writing my 
own z39.50 client (below). Unfortunately, I learned that the 4,000-record limit 
is not just a limit of the Web interface: a person can only download the first 
4,000 records in a found set. Requests for records 4001, 4002, etc. fail. This 
is true in my locally written client as well as in EndNote.

Alas, it looks as if I am unable to download the data I need/require, unless 
somebody at the vendor gives me a data dump. On the other hand, since my locally 
written client is so short and simple, I think I can create a Web-based 
interface to query many different z39.50 targets and provide on-the-fly text 
mining analysis against the results.

In short, I learned a great many things.

—
Eric Lease Morgan
University of Notre Dame


#!/usr/bin/perl

# nytimes-search.pl - rudimentary z39.50 client to query the NY Times

# Eric Lease Morgan <[email protected]>
# November 13, 2013 - first cut; "Happy Birthday, Steve!"

# usage: ./nytimes-search.pl > nytimes.marc


# configure
use constant DB     => 'hnpnewyorktimes';
use constant HOST   => 'fedsearch.proquest.com';
use constant PORT   => 210;
use constant QUERY  => '@attr 1=1016 "trade or tariff"';
use constant SYNTAX => 'usmarc';

# require
use strict;
use ZOOM;

# do the work
eval {

        # connect; configure; search
        my $conn = ZOOM::Connection->new( HOST, PORT, databaseName => DB );
        $conn->option( preferredRecordSyntax => SYNTAX );
        my $rs = $conn->search_pqf( QUERY );

        # requests > 4000 return errors
        # print $rs->record( 4001 )->raw;
                        
        # retrieve; will break at record 4,000 because of vendor limitations
        # (records are zero-indexed, so the last valid index is size - 1)
        for my $i ( 0 .. $rs->size() - 1 ) {
        
                print STDERR "\tRetrieving record #$i\r";
                print $rs->record( $i )->raw;
                
        }
                
};

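# report errors; start on a fresh line because the progress message ends in \r
if ( $@ ) {
        if ( ref $@ && $@->isa( 'ZOOM::Exception' ) ) { print STDERR "\nError ", $@->code, ": ", $@->message, "\n" }
        else { print STDERR "\nError: $@\n" }
}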

# done
exit;
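
Since requests beyond the first 4,000 records fail, the retrieval loop could be bounded by that cap so the script stops cleanly instead of ending on an error. What follows is only a minimal sketch: it assumes the cap is exactly 4,000 records per found set and that record indexes start at zero. The LIMIT constant and the fetchable() helper are hypothetical names that are not part of the script above, and the found-set sizes are made up for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# assumed vendor cap on retrievable records per found set
use constant LIMIT => 4000;

# return how many records can actually be fetched from a found set
sub fetchable {
        my ( $size, $limit ) = @_;
        return $size < $limit ? $size : $limit;
}

# with a ZOOM result set in $rs, the retrieval loop might become:
#   for my $i ( 0 .. fetchable( $rs->size, LIMIT ) - 1 ) { print $rs->record( $i )->raw }

# illustrative found-set sizes only; no network connection is made
for my $size ( 0, 250, 4000, 125000 ) {
        printf "found set of %d records: fetch %d\n", $size, fetchable( $size, LIMIT );
}
```

Because the range 0 .. -1 is empty in Perl, an empty found set simply skips the loop. The sketch does not get around the cap; it only avoids asking for records the vendor will refuse.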
