Re: [CODE4LIB] zebraidx errors

siznax Wed, 18 Jun 2008 10:29:24 -0700

thanks so much for your help Eric.

i was missing the attset statement in my zebra.cfg


> attset: bib1.att

which is also missing from the idzebra-2.0 installed example,
/usr/share/idzebra-2.0-examples/oai-pmh/conf/zebra.cfg, but
perhaps is not necessary for the DOM XML record model.

i agree with Ross that your outline would be very
useful to have on the code4lib twiki.

also, my apologies, the zebralist _is_ active, and Adam
Dickmeiss provided an equivalently helpful reply just a bit
later.





Eric Lease Morgan wrote:

On Jun 17, 2008, at 12:06 PM, siznax wrote:
the Zebralist appears to be relatively inactive.
anyone here have experience indexing MARC binaries
with zebra?
I will try to outline here how to index (and search) MARC records using Zebra, but tweaking the indexing process is a bit trickier than I know how to do.
1. Install yaz, zebra, and all of their friends. I have found that the "standard" make process works pretty well, but allow yaz and zebra to specify where they put various configuration files. The extra specification is not worth the effort.
2. Save your MARC records someplace on your file system. By "binary" MARC records, I suppose you mean "real" MARC records -- MARC records in communications format -- MARC records as the types of records fed to traditional integrated library systems. This is opposed to some flavor of XML or "tagged format" often used for display.
  3. Create a zebra.cfg file, and have it look something like this:

      # global paths
      profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
      modulePath: /usr/local/lib/idzebra-2.0/modules

      # turn ranking on
      rank: rank-1

      # define a database of marc records called opac
      opac.database: opac
      opac.recordtype: grs.marcxml.marc21
      attset: bib1.att
      attset: explain.att
4. Index your MARC records with the following command. You should see lots of great stuff sent to STDOUT.
      zebraidx -g opac update <path to MARC records>
You have now created your index. Once you get this far with indexing, you will want to tweak various .abs files (I think) to enhance the indexing process. This particular thing is not my forte. It seems like black magic to most of us. This is not a Zebra-specific problem; this is a problem with Z39.50.
Next, you need to implement the client/server end of things:
5. Start your server. This will be a Z39.50 server -- a "kewl" library-centric protocol that existed before the Internet got hot:
      zebrasrv localhost:9999 &

  6. Use yaz-client to search your index:

      $ yaz-client
      Z> open localhost:9999/opac
      Z> find origami
      Z> show 1
      Z> quit
Using the yaz-client almost requires a knowledge of Z39.50. Attached should be a Perl script that allows you to search your server in a bit more user-friendly way. To use it you will need to install a few Perl modules and then edit the constant called DATABASE.
Even though Z39.50 is/was "kewl" it is still pretty icky. SRU is better -- definitely a step in the right direction, and Zebra supports SRU out of the box. [1]
  7. Create an SRU configuration file (step 10 expects it to be named sru.cfg) looking something like this:

     <yazgfs>
       <server>
         <config>zebra.cfg</config>
         <cql2rpn>pqf.properties</cql2rpn>
       </server>
     </yazgfs>
8. Acquire a "better" pqf.properties file. PQF is about querying Z39.50 databases. It is ugly. It was designed in a non-Internet world. Instead of knowing that 1=4 means search the title field, you want to simply search the title. Attached is a "better" pqf.properties file, and it is "better" because it maps things like 1=4 to Dublin Core equivalents. Save it in a directory called etc in the same directory as your zebra.cfg file. (Notice how the zebra.cfg file, above, denotes etc as being in zebra's path.)
  9. Kill your presently running Z39.50 server.

 10. Start up an SRU server:

      zebrasrv -f sru.cfg localhost:9999 &
11. Use your HTTP client to search the SRU server. Queries will look like this (with carriage returns added for readability):
      http://localhost:9999/opac?
       operation=searchRetrieve&
       version=1.1&
       query=origami&
       maximumRecords=5

The result should be a stream of XML ready for XSLT processing.
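
For what it's worth, the query URL in step 11 could also be assembled programmatically. Here is a minimal Python sketch, assuming the host, port, and database name used above (localhost:9999 and opac). The function name build_sru_url is hypothetical, and only the standard library is used.

```python
from urllib.parse import urlencode


def build_sru_url(base, database, query, maximum_records=5, version="1.1"):
    """Return an SRU searchRetrieve URL (hypothetical helper, not part of Zebra)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": query,  # a CQL query, e.g. "origami"
        "maximumRecords": maximum_records,
    }
    # urlencode percent-escapes the values, so multi-word CQL queries are safe
    return f"{base.rstrip('/')}/{database}?{urlencode(params)}"


if __name__ == "__main__":
    # Same request as the hand-written URL in step 11
    print(build_sru_url("http://localhost:9999", "opac", "origami"))
```

Fetching the resulting URL with any HTTP client (a browser, curl, urllib.request) should return the XML described above, provided the zebrasrv from step 10 is still running.
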
All of the above is almost exactly what I did to create an index of MARC records harvested from the Library of Congress and the University of Michigan's OAI data repository (MBooks). [2] Take a look at the HTML source. Notice how the client in this regard is only one HTML file containing a form, one CSS file for style, and one XSL file for XML to HTML transformation.
HTH.

[1] SRU - http://www.loc.gov/standards/sru/
[2] Example SRU interface - http://infomotions.com/ii/

