thanks so much for your help Eric.
i was missing the attset statement in my zebra.cfg
> attset: bib1.att
which is also missing from the idzebra-2.0 installed example,
/usr/share/idzebra-2.0-examples/oai-pmh/conf/zebra.cfg, but
perhaps is not necessary for the DOM XML record model.
i agree with Ross that your outline would be very
useful to have on the code4lib twiki.
also, my apologies, the zebralist _is_ active, and Adam
Dickmeiss provided an equally helpful reply just a bit
later.
/[EMAIL PROTECTED]
Eric Lease Morgan wrote:
On Jun 17, 2008, at 12:06 PM, siznax wrote:
the Zebralist appears to be relatively inactive.
anyone here have experience indexing MARC binaries
with zebra?
I will try to outline here how to index (and search) MARC records using
Zebra, but tweaking the indexing process is a bit trickier than I know
how to do.
1. Install yaz, zebra, and all of their friends. I have found that the
"standard" make process works pretty well, but let yaz and zebra
decide where they put their various configuration files. Specifying
custom locations is not worth the effort.
2. Save your MARC records someplace on your file system. By "binary"
MARC records, I suppose you mean "real" MARC records -- MARC records in
communications format -- MARC records as the types of records fed to
traditional integrated library systems. This is opposed to some flavor
of XML or "tagged format" often used for display.
3. Create a zebra.cfg file, and have it look something like this:
# global paths
profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
modulePath: /usr/local/lib/idzebra-2.0/modules
# turn ranking on
rank: rank-1
# define a database of marc records called opac
opac.database: opac
opac.recordtype: grs.marcxml.marc21
attset: bib1.att
attset: explain.att
4. Index your MARC records with the following command. You should see
lots of great stuff sent to STDOUT.
zebraidx -g opac update <path to MARC records>
You have now created your index. Once you get this far with indexing,
you will want to tweak various .abs files (I think) to enhance the
indexing process. This particular thing is not my forte. It seems like
black magic to most of us. This is not a Zebra-specific problem; this is
a problem with Z39.50.
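If zebraidx reports fewer records than you expected, it can help to confirm that the input file really is MARC in communications format (step 2). What follows is only a minimal sketch in Python, not part of Zebra or yaz. It assumes plain ISO 2709 framing, where the first five bytes of each record give its total length in ASCII digits and the record ends with the byte 0x1D. The function name count_marc_records is made up for illustration.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def count_marc_records(stream):
    """Count records in a binary MARC stream using each leader's length field.

    Hypothetical helper for illustration; `stream` is any binary file object.
    """
    count = 0
    while True:
        length_field = stream.read(5)  # first 5 leader bytes = record length
        if not length_field:
            break  # clean end of input
        if len(length_field) < 5 or not length_field.isdigit():
            raise ValueError(
                f"record {count + 1}: bad length field {length_field!r}"
            )
        length = int(length_field)
        if length <= 5:
            raise ValueError(f"record {count + 1}: implausible length {length}")
        body = stream.read(length - 5)  # the rest of this record
        if len(body) != length - 5 or not body.endswith(RECORD_TERMINATOR):
            raise ValueError(f"record {count + 1}: truncated or unterminated")
        count += 1
    return count
```

To try it, open your MARC file in binary mode (mode "rb") and pass the file object to count_marc_records. A ValueError early on usually means the file is some flavor of XML or tagged text rather than communications format.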
Next, you need to implement the client/server end of things:
5. Start your server. This will be a Z39.50 server -- a "kewl"
library-centric protocol that existed before the Internet got hot:
zebrasrv localhost:9999 &
6. Use yaz-client to search your index:
$ yaz-client
Z> open localhost:9999/opac
Z> find origami
Z> show 1
Z> quit
Using the yaz-client almost requires a knowledge of Z39.50. Attached
should be a Perl script that allows you to search your server in a bit
more user-friendly way. To use it you will need to install a few Perl
modules and then edit the constant called DATABASE.
Even though Z39.50 is/was "kewl" it is still pretty icky. SRU is better
-- definitely a step in the right direction, and Zebra supports SRU out
of the box. [1]
7. Create an SRU configuration file (called sru.cfg in step 10) looking something like this:
<yazgfs>
<server>
<config>zebra.cfg</config>
<cql2rpn>pqf.properties</cql2rpn>
</server>
</yazgfs>
8. Acquire a "better" pqf.properties file. PQF is about querying
Z39.50 databases. It is ugly. It was designed in a non-Internet world.
Instead of knowing that 1=4 means search the title field, you want to
simply search the title. Attached is a "better" pqf.properties file, and
it is "better" because it maps things like 1=4 to Dublin Core
equivalents. Save it in a directory called etc in the same directory as
your zebra.cfg file. (Notice how the zebra.cfg file, above, denotes etc
as being in zebra's path.)
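To make the idea of the mapping concrete, here is a toy sketch in Python of what such a file does conceptually: it turns a friendly index name into a Bib-1 use attribute and wraps the search term in a PQF query. This is not how yaz implements the translation, and the three entries are only assumptions based on the standard Bib-1 use attributes (4 for title, 1003 for author, 21 for subject heading); the attached pqf.properties file may map more or different names. The names USE_ATTRIBUTES and to_pqf are invented for the example.

```python
# Illustrative subset only: Dublin Core style index names mapped to
# Bib-1 use attributes. The real mapping lives in pqf.properties.
USE_ATTRIBUTES = {
    "dc.title": 4,       # Bib-1 use attribute 4 = title
    "dc.creator": 1003,  # 1003 = author
    "dc.subject": 21,    # 21 = subject heading
}


def to_pqf(index, term):
    """Build a PQF query string for one index/term pair.

    Assumes the term contains no double quotes; unknown index
    names raise KeyError.
    """
    use = USE_ATTRIBUTES[index]
    return f'@attr 1={use} "{term}"'


print(to_pqf("dc.title", "origami"))
```

The printed query is the kind of thing you would otherwise have to type after "find" in yaz-client to limit a search to titles.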
9. Kill your presently running Z39.50 server.
10. Start up an SRU server:
zebrasrv -f sru.cfg localhost:9999 &
11. Use your HTTP client to search the SRU server. Queries will look
like this (with carriage returns added for readability):
http://localhost:9999/opac?
operation=searchRetrieve&
version=1.1&
query=origami&
maximumRecords=5
The result should be a stream of XML ready for XSLT processing.
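If you would rather script the query than type it into a browser, the URL above can be assembled with a few lines of Python. This is just a sketch using the standard library. The function names sru_url and hit_count are made up for illustration, and hit_count assumes the server answers with a standard SRU 1.1 searchRetrieveResponse in the http://www.loc.gov/zing/srw/ namespace.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by SRU 1.1 responses (assumed here, see above).
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def sru_url(base, query, maximum_records=5):
    """Build a searchRetrieve URL like the one shown in step 11."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
    }
    # urlencode escapes the query, so spaces and quotes are safe.
    return f"{base}?{urlencode(params)}"


def hit_count(response_xml):
    """Return numberOfRecords from an SRU response, or 0 if it is absent."""
    root = ET.fromstring(response_xml)
    node = root.find("srw:numberOfRecords", SRW_NS)
    return int(node.text) if node is not None else 0


print(sru_url("http://localhost:9999/opac", "origami"))
```

With the server from step 10 running, the XML can be fetched with urllib.request.urlopen(sru_url(...)).read() and passed to hit_count, or handed to an XSLT processor instead.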
All of the above is almost exactly what I did to create an index of MARC
records harvested from the Library of Congress and the University of
Michigan's OAI data repository (MBooks). [2] Take a look at the HTML
source. Notice how the client in this regard is only one HTML file
containing a form, one CSS file for style, and one XSL file for XML to
HTML transformation.
HTH.
[1] SRU - http://www.loc.gov/standards/sru/
[2] Example SRU interface - http://infomotions.com/ii/
------------------------------------------------------------------------