stas        02/03/21 22:40:23

  Modified:    src/search README
  Log:
  podify the README page and add the info about the sub-section search

  Revision  Changes    Path
  1.4       +86 -38    modperl-docs/src/search/README

Index: README
===================================================================
RCS file: /home/cvs/modperl-docs/src/search/README,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- README	22 Mar 2002 02:02:15 -0000	1.3
+++ README	22 Mar 2002 06:40:23 -0000	1.4
@@ -1,18 +1,34 @@
+=head1 NAME
+
+perl.apache.org Site Indexing and Search Setup
+
+=head1 Description
+
 This document explains how to setup swish-e, index and search the
 perl.apache.org site.
 
-Setting up swish-e:
--------------------
+=head1 Setting up swish-e
+
+=over
+
+=item 1
+
+Install the dev version of swish-e. Currently we use SWISH-E 2.1-dev-25.
+
+=item 2
 
-- Install the dev version of swish-e. Currently we use SWISH-E 2.1-dev-25.
+Make sure that swish-e is in the PATH, so the apps will be able to
+find it
 
-- make sure that swish-e is in the PATH, so the apps will be able to
-  find it
+=back
 
-Indexing:
----------
+=head1 Indexing
 
-1. Set an environment variable to the path of the site:
+=over
+
+=item 1
+
+Set an environment variable to the path of the site:
 
   export MODPERL_SITE='http://perl.apache.org'
 
@@ -27,14 +43,17 @@
 This is used as the base for spidering, plus is used to determine the
 sections of the site (for limiting the site to those sections.
 
+=item 2
 
-2. normally build the site:
+Normally build the site:
 
   % bin/build -f   (-d to build pdfs)
 
 which among other things creates the dir: dst_html/search
 
-3. Index the site
+=item 3
+
+Index the site
 
   % cd dst_html/search
   % swish-e -S prog -c swish.conf
@@ -67,48 +86,76 @@
 Elapsed time: 00:00:20 CPU time: 00:00:02
 Indexing done!
 
+=back
+
 Now you can search...
 
-Searching:
-----------
+=head1 Searching
+
+=over
+
+=item 1
 
-1. Go to the search page: ..../search/search.html
+Go to the search page: ..../search/search.html
 
-2. Search
+=item 2
 
-If something doesn't work check the error_log file on the server the
-swish.cgi is running on. The most common error is that the swish-e
-binary cannot be found by the swish.cgi script. Remember that CGI may
-be running under a different username and therefore may not have the
-same PATH env variable.
+Search
 
+If something doesn't work check the I<error_log> file on the server
+the I<swish.cgi> is running on. The most common error is that the
+swish-e binary cannot be found by the I<swish.cgi> script. Remember
+that CGI may be running under a different username and therefore may
+not have the same PATH env variable.
 
-Swish-e related adjustments to the template:
---------------------------------------------
+=back
+
+=head1 Swish-e related adjustments to the templates
+
+=item *
+
+Since we want to index only the real content, we use:
 
-- since we want to index only the real content, we use:
   <!-- Swishcommand index -->, only content here will indexed
   <!-- Swishcommand noindex -->,
 
+=item *
+
+Since we want to be able to search any sub-section of the site, the
+search form includes the hidden variable C<sbm> (mnemonics: 'search by
+meta'). For example:
+
+  <input type="checkbox" name="sbm" value="docs/1.0/guide" />
+
+will search all the documents under the I<docs/1.0/guide> directory.
+The correct values for the C<sbm> variable are set in the template when
+the site is created.
 
-How does indexing work?
------------------------
+The main search page, I</search/swish.cgi>, has multiple checkboxes
+for the C<sbm> variable for searching only certain parts of the site.
 
-Swish is run with a config file, and is run in a mode that says
-to use an external program to fetch documents. That external program
-is called spider.pl (part of the swish-e distribution).
-
-spider.pl uses a config file (by default) of SwishSpiderConfig.pl. This file
-builds an array of hashes (in this case a sinlge hash in the array). This hash
-is the config.
-
-Part of the config are call-back functions that spider.pl will call while spidering.
-One says to skip image files. Another one is a bit more tricky. It splits a document into
-sections, creates new "sub-pages" that are complete HTML pages, and calls the function in spider.pl
-that sends those off to swish for indexing. (That function then returns false to tell swish not to
-index that document since the sections have already been indexed.)
+=back
+
+
+=head1 How does indexing work
+
+Swish is run with a config file, and is run in a mode that says to use
+an external program to fetch documents. That external program is
+called I<spider.pl> (part of the swish-e distribution).
+
+I<spider.pl> uses a config file (by default) of
+I<SwishSpiderConfig.pl>. This file builds an array of hashes (in this
+case a single hash in the array). This hash is the config.
+
+Part of the config are call-back functions that spider.pl will call
+while spidering. One says to skip image files. Another one is a bit
+more tricky. It splits a document into sections, creates new
+"sub-pages" that are complete HTML pages, and calls the function in
+spider.pl that sends those off to swish for indexing. (That function
+then returns false to tell swish not to index that document since the
+sections have already been indexed.)
 
 That's about it.
@@ -116,5 +163,6 @@
 
    ./spider.pl > bigfile.out
 
-Another trick, you can send SIGHUP to spider.pl while indexing and
+Another trick, you can send SIGHUP to I<spider.pl> while indexing and
 it will stop spidering, but let swish index what's been read so far.
+
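
The SwishSpiderConfig.pl file itself is not part of this diff, so below
is a minimal, hypothetical sketch of what such an array-of-hashes spider
configuration with callbacks can look like. It assumes the swish-e
2.1-dev spider.pl conventions (@servers, test_url, filter_content) and
the MODPERL_SITE variable set during indexing; split_into_sections()
and send_sub_page() are made-up stand-ins for the section-splitting
code and for the spider.pl routine that hands sub-pages to swish, whose
real names and signatures may differ.

  # SwishSpiderConfig.pl -- illustrative sketch only, not the file used
  # by the site; assumes the swish-e 2.1-dev spider.pl callback interface.
  use strict;

  my $site = $ENV{MODPERL_SITE} || 'http://perl.apache.org';

  # spider.pl expects @servers: an array of hashes, one hash per spidered
  # site.  Here there is a single hash in the array -- that hash is the
  # whole config.
  our @servers = (
      {
          base_url => $site,
          email    => 'nobody@example.com',   # hypothetical contact address

          # callback: skip image files so they never reach swish
          test_url => sub {
              my $uri = shift;
              return $uri->path !~ /\.(gif|jpe?g|png|ico)$/i;
          },

          # callback: split a page into per-section "sub-pages", hand each
          # one off for indexing, then return false so the page itself is
          # not indexed a second time.
          filter_content => sub {
              my ( $uri, $server, $response, $content_ref ) = @_;

              for my $sub_page ( split_into_sections( $$content_ref ) ) {
                  # send_sub_page() stands in for the spider.pl routine
                  # that sends a document off to swish; the real name and
                  # signature may differ.
                  send_sub_page( $server, \$sub_page, $uri, $response );
              }
              return 0;   # sections already indexed -- skip the whole page
          },
      },
  );

  # hypothetical helper: the real config would parse the HTML and wrap
  # each section in a complete HTML page; this stub returns the page as is.
  sub split_into_sections { return ( $_[0] ) }

  1;

With a config along these lines, the C<% swish-e -S prog -c swish.conf>
run shown in the Indexing section would (presumably via swish.conf)
invoke spider.pl, which loads the config and feeds swish one complete
HTML sub-page per section of each spidered document.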