stas        02/03/21 22:40:23

  Modified:    src/search README
  Log:
  podify the README page and add the info about the sub-section search

  Revision  Changes    Path
  1.4       +86 -38    modperl-docs/src/search/README

Index: README
===================================================================
RCS file: /home/cvs/modperl-docs/src/search/README,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- README	22 Mar 2002 02:02:15 -0000	1.3
+++ README	22 Mar 2002 06:40:23 -0000	1.4
@@ -1,18 +1,34 @@
+=head1 NAME
+
+perl.apache.org Site Indexing and Search Setup
+
+=head1 Description
+
 This document explains how to setup swish-e, index and search the
 perl.apache.org site.
 
-Setting up swish-e:
--------------------
+=head1 Setting up swish-e
+
+=over
+
+=item 1
+
+Install the dev version of swish-e. Currently we use SWISH-E 2.1-dev-25.
+
+=item 2
 
-- Install the dev version of swish-e. Currently we use SWISH-E 2.1-dev-25.
+Make sure that swish-e is in the PATH, so the apps will be able to
+find it
 
-- make sure that swish-e is in the PATH, so the apps will be able to
-  find it
+=back
 
-Indexing:
----------
+=head1 Indexing
 
-1. Set an environment variable to the path of the site:
+=over
+
+=item 1
+
+Set an environment variable to the path of the site:
 
   export MODPERL_SITE='http://perl.apache.org'
 
@@ -27,14 +43,17 @@
 This is used as the base for spidering, plus is used to determine the
 sections of the site (for limiting the site to those sections.
 
+=item 2
 
-2. normally build the site:
+Normally build the site:
 
   % bin/build -f   (-d to build pdfs)
 
 which among other things creates the dir: dst_html/search
 
-3. Index the site
+=item 3
+
+Index the site
 
   % cd dst_html/search
   % swish-e -S prog -c swish.conf
@@ -67,48 +86,76 @@
 Elapsed time: 00:00:20 CPU time: 00:00:02
 Indexing done!
 
+=back
+
 Now you can search...
 
-Searching:
-----------
+=head1 Searching
+
+=over
+
+=item 1
 
-1. Go to the search page: ..../search/search.html
+Go to the search page: ..../search/search.html
 
-2. Search
+=item 2
 
-If something doesn't work check the error_log file on the server the
-swish.cgi is running on. The most common error is that the swish-e
-binary cannot be found by the swish.cgi script. Remember that CGI may
-be running under a different username and therefore may not have the
-same PATH env variable.
+Search
 
+If something doesn't work check the I<error_log> file on the server
+the I<swish.cgi> is running on. The most common error is that the
+swish-e binary cannot be found by the I<swish.cgi> script. Remember
+that CGI may be running under a different username and therefore may
+not have the same PATH env variable.
 
-Swish-e related adjustments to the template:
---------------------------------------------
+=back
+
+=head1 Swish-e related adjustments to the templates
+
+=item *
+
+Since we want to index only the real content, we use:
 
-- since we want to index only the real content, we use:
   <!-- Swishcommand index -->, only content here will indexed
   <!-- Swishcommand noindex -->,
 
+=item *
+
+Since we want to be able to search any sub-section of the site, the
+search form includes the hidden variable C<sbm> (mnemonics: 'search by
+meta'). For example:
+
+  <input type="checkbox" name="sbm" value="docs/1.0/guide" />
+
+will search all the documents under the I<docs/1.0/guide> directory.
+The correct values for the C<sbm> variable are set in the template when
+the site is created.
 
-How does indexing work?
------------------------
+The main search page, I</search/swish.cgi>, has multiple checkboxes
+for the C<sbm> variable for searching only certain parts of the site.
 
-Swish is run with a config file, and is run in a mode that says
-to use an external program to fetch documents. That external program
-is called spider.pl (part of the swish-e distribution).
-
-spider.pl uses a config file (by default) of SwishSpiderConfig.pl. This file
-builds an array of hashes (in this case a sinlge hash in the array). This hash
-is the config.
-
-Part of the config are call-back functions that spider.pl will call while spidering.
-One says to skip image files. Another one is a bit more tricky. It splits a document into
-sections, creates new "sub-pages" that are complete HTML pages, and calls the function in spider.pl
-that sends those off to swish for indexing. (That function then returns false to tell swish not to
-index that document since the sections have already been indexed.)
+=back
+
+
+=head1 How does indexing work
+
+Swish is run with a config file, and is run in a mode that says to use
+an external program to fetch documents. That external program is
+called I<spider.pl> (part of the swish-e distribution).
+
+I<spider.pl> uses a config file (by default) of
+I<SwishSpiderConfig.pl>. This file builds an array of hashes (in this
+case a single hash in the array). This hash is the config.
+
+Part of the config are call-back functions that spider.pl will call
+while spidering. One says to skip image files. Another one is a bit
+more tricky. It splits a document into sections, creates new
+"sub-pages" that are complete HTML pages, and calls the function in
+spider.pl that sends those off to swish for indexing. (That function
+then returns false to tell swish not to index that document since the
+sections have already been indexed.)
 
 That's about it.
@@ -116,5 +163,6 @@
 
    ./spider.pl > bigfile.out
 
-Another trick, you can send SIGHUP to spider.pl while indexing and
+Another trick, you can send SIGHUP to I<spider.pl> while indexing and
 it will stop spidering, but let swish index what's been read so far.
+
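
The SwishSpiderConfig.pl file itself is not part of this diff, so below
is a minimal, hypothetical sketch of what such an array-of-hashes spider
configuration with callbacks can look like. It assumes the swish-e
2.1-dev spider.pl conventions (@servers, test_url, filter_content) and
the MODPERL_SITE variable set during indexing; split_into_sections()
and send_sub_page() are made-up stand-ins for the section-splitting
code and for the spider.pl routine that hands sub-pages to swish, whose
real names and signatures may differ.

  # SwishSpiderConfig.pl -- illustrative sketch only, not the file used
  # by the site; assumes the swish-e 2.1-dev spider.pl callback interface.
  use strict;

  my $site = $ENV{MODPERL_SITE} || 'http://perl.apache.org';

  # spider.pl expects @servers: an array of hashes, one hash per spidered
  # site.  Here there is a single hash in the array -- that hash is the
  # whole config.
  our @servers = (
      {
          base_url => $site,
          email    => 'nobody@example.com',   # hypothetical contact address

          # callback: skip image files so they never reach swish
          test_url => sub {
              my $uri = shift;
              return $uri->path !~ /\.(gif|jpe?g|png|ico)$/i;
          },

          # callback: split a page into per-section "sub-pages", hand each
          # one off for indexing, then return false so the page itself is
          # not indexed a second time.
          filter_content => sub {
              my ( $uri, $server, $response, $content_ref ) = @_;

              for my $sub_page ( split_into_sections( $$content_ref ) ) {
                  # send_sub_page() stands in for the spider.pl routine
                  # that sends a document off to swish; the real name and
                  # signature may differ.
                  send_sub_page( $server, \$sub_page, $uri, $response );
              }
              return 0;   # sections already indexed -- skip the whole page
          },
      },
  );

  # hypothetical helper: the real config would parse the HTML and wrap
  # each section in a complete HTML page; this stub returns the page as is.
  sub split_into_sections { return ( $_[0] ) }

  1;

With a config along these lines, the C<% swish-e -S prog -c swish.conf>
run shown in the Indexing section would (presumably via swish.conf)
invoke spider.pl, which loads the config and feeds swish one complete
HTML sub-page per section of each spidered document.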