stas 02/03/21 22:40:23
Modified: src/search README
Log:
podify the README page and add the info about the sub-section search
Revision Changes Path
1.4 +86 -38 modperl-docs/src/search/README
Index: README
===================================================================
RCS file: /home/cvs/modperl-docs/src/search/README,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- README 22 Mar 2002 02:02:15 -0000 1.3
+++ README 22 Mar 2002 06:40:23 -0000 1.4
@@ -1,18 +1,34 @@
+=head1 NAME
+
+perl.apache.org Site Indexing and Search Setup
+
+=head1 Description
+
This document explains how to set up swish-e, and how to index and
search the perl.apache.org site.
-Setting up swish-e:
--------------------
+=head1 Setting up swish-e
+
+=over
+
+=item 1
+
+Install the dev version of swish-e. Currently we use SWISH-E 2.1-dev-25.
+
+=item 2
-- Install the dev version of swish-e. Currently we use SWISH-E 2.1-dev-25.
+Make sure that swish-e is in the PATH, so the apps will be able to
+find it.
-- make sure that swish-e is in the PATH, so the apps will be able to
- find it
+=back
-Indexing:
----------
+=head1 Indexing
-1. Set an environment variable to the path of the site:
+=over
+
+=item 1
+
+Set an environment variable to the path of the site:
export MODPERL_SITE='http://perl.apache.org'
@@ -27,14 +43,17 @@
This is used as the base for spidering, plus is used to determine
the sections of the site (for limiting searches to those sections).
+=item 2
-2. normally build the site:
+Build the site as usual:
% bin/build -f (-d to build pdfs)
which among other things creates the dir: dst_html/search
-3. Index the site
+=item 3
+
+Index the site
% cd dst_html/search
% swish-e -S prog -c swish.conf
@@ -67,48 +86,76 @@
Elapsed time: 00:00:20 CPU time: 00:00:02
Indexing done!
+=back
+
Now you can search...
-Searching:
-----------
+=head1 Searching
+
+=over
+
+=item 1
-1. Go to the search page: ..../search/search.html
+Go to the search page: ..../search/search.html
-2. Search
+=item 2
-If something doesn't work check the error_log file on the server the
-swish.cgi is running on. The most common error is that the swish-e
-binary cannot be found by the swish.cgi script. Remember that CGI may
-be running under a different username and therefore may not have the
-same PATH env variable.
+Search
+If something doesn't work, check the I<error_log> file on the server
+the I<swish.cgi> is running on. The most common error is that the
+swish-e binary cannot be found by the I<swish.cgi> script. Remember
+that CGI may be running under a different username and therefore may
+not have the same PATH env variable.
-Swish-e related adjustments to the template:
---------------------------------------------
+=back
+
+=head1 Swish-e related adjustments to the templates
+
+=over
+
+=item *
+
+Since we want to index only the real content, we use:
-- since we want to index only the real content, we use:
<!-- Swishcommand index -->,
     only content here will be indexed
<!-- Swishcommand noindex -->,
+=item *
+
+Since we want to be able to search any sub-section of the site, the
+search form includes the hidden variable C<sbm> (mnemonic: 'search by
+meta'). For example:
+
+ <input type="checkbox" name="sbm" value="docs/1.0/guide" />
+
+will search all the documents under the I<docs/1.0/guide> directory.
+The correct values for the C<sbm> variable are set in the templates
+when the site is created.
-How does indexing work?
------------------------
+The main search page I</search/swish.cgi> has multiple checkboxes for
+the C<sbm> variable, for searching only certain parts of the site (see
+the example form fragment after this list).
-Swish is run with a config file, and is run in a mode that says
-to use an external program to fetch documents. That external program
-is called spider.pl (part of the swish-e distribution).
-
-spider.pl uses a config file (by default) of SwishSpiderConfig.pl. This file
-builds an array of hashes (in this case a sinlge hash in the array). This hash
-is the config.
-
-Part of the config are call-back functions that spider.pl will call while spidering.
-One says to skip image files. Another one is a bit more tricky. It splits a document into
-sections, creates new "sub-pages" that are complete HTML pages, and calls the function in spider.pl
-that sends those off to swish for indexing. (That function then returns false to tell swish not to
-index that document since the sections have already been indexed.)
+=back
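+
+For illustration only, a form that limits the search to two
+sub-sections might look something like this (the section paths and the
+field names other than C<sbm> are just examples, not taken from the
+real site templates):
+
+  <form method="get" action="/search/swish.cgi">
+    <input type="text" name="query" />
+    <input type="checkbox" name="sbm" value="docs/1.0/guide" />
+    <input type="checkbox" name="sbm" value="docs/1.0/api" />
+    <input type="submit" value="Search" />
+  </form>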
+
+
+=head1 How does indexing work?
+
+Swish is run with a config file, in a mode that tells it to use an
+external program to fetch documents. That external program is called
+I<spider.pl> (part of the swish-e distribution).
+
+I<spider.pl> uses a config file, I<SwishSpiderConfig.pl> by
+default. This file builds an array of hashes (in this case a single
+hash in the array). This hash is the config.
+
+Part of the config is a set of call-back functions that I<spider.pl>
+will call while spidering. One says to skip image files. Another one
+is a bit more tricky: it splits a document into sections, creates new
+"sub-pages" that are complete HTML pages, and calls the function in
+I<spider.pl> that sends those off to swish for indexing. (That
+function then returns false to tell swish not to index the original
+document, since its sections have already been indexed.)
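+
+Purely as an illustration of that shape, such a config might look
+roughly as follows. The key names and callback signatures here follow
+the sample config shipped with the swish-e 2.x spider (check the
+I<SwishSpiderConfig.pl> in your distribution for the real ones), and
+the section-splitting helpers are made-up placeholders, not the real
+modperl-docs code:
+
+  # SwishSpiderConfig.pl -- illustrative sketch only
+
+  @servers = (
+      {
+          base_url => $ENV{MODPERL_SITE} || 'http://perl.apache.org',
+          email    => 'swish@localhost',
+
+          # callback: skip image files
+          test_url => sub { $_[0]->path !~ /\.(gif|jpe?g|png)$/i },
+
+          # callback: split a page into sections, hand each section
+          # off for indexing as its own "sub-page", then return false
+          # so the full page itself is not indexed a second time
+          filter_content => sub {
+              my ($uri, $server, $response, $content_ref) = @_;
+              for my $section ( split_sections($$content_ref) ) {
+                  index_sub_page($uri, $section);
+              }
+              return 0;
+          },
+      },
+  );
+
+  # toy stand-ins for what the real config does via spider.pl
+  sub split_sections { return ($_[0]) }
+  sub index_sub_page { my ($uri, $html) = @_; return 1 }
+
+  1;
+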
That's about it.
@@ -116,5 +163,6 @@
./spider.pl > bigfile.out
-Another trick, you can send SIGHUP to spider.pl while indexing and
+Another trick: you can send SIGHUP to I<spider.pl> while indexing and
it will stop spidering, but let swish index what's been read so far.
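+
+For example, from another shell (how you find the spider's pid is up
+to you; this is just an illustration):
+
+  % kill -HUP <pid of spider.pl>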
+