I haven't tried this out in detail, so I don't have any specific opinion.
But Bill has spent some time sorting out searching on apache.org, so we
may want to consider retiring our local search engine.

Joshua.


---------- Forwarded message ----------
Date: Wed, 30 Apr 2003 14:45:32 -0700
From: Bill Moseley <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: search.apache.org

http://search.apache.org has been updated with the new index, as was
described here a week or so ago.

The new index is the result of spidering instead of indexing the file
system -- the index files are about 11MB instead of 200MB.  That frees a
little space on /x2, which is up at 95% full again.

There may be things that don't need to be indexed (cvs update archives?),
so let me know if anything else should be excluded.  Or make use of
robots.txt.
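
As a rough illustration of the robots.txt route, here is a minimal Python
sketch of how a spider that honors robots.txt would decide what to skip.
The /cvs-archives/ path, the example URLs and the "example-spider" agent
name are all made up for the example, and it assumes the spider actually
checks robots.txt before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the /cvs-archives/ path is invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cvs-archives/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_indexable(parser: RobotFileParser, url: str,
                 agent: str = "example-spider") -> bool:
    """Return True if the rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
```

With rules like these, anything under the disallowed prefix is skipped and
everything else is still spidered.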

Site-specific searches can be done by setting the "what" CGI parameter.
For example:

  http://search.apache.org/index.cgi?what=httpd&keyword=installation
or
  http://search.apache.org/index.cgi?what=docs2&keyword=installation

limit the search to just httpd.apache.org or the 2.0 docs, respectively.
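
For anyone linking to these searches from their own pages, a small Python
sketch of building such URLs follows.  Only the "what" and "keyword"
parameters and the two values shown above come from this message; the
function name is made up for the example.

```python
from urllib.parse import urlencode

SEARCH_CGI = "http://search.apache.org/index.cgi"


def build_search_url(what: str, keyword: str) -> str:
    """Build a site-specific search URL.

    'what' selects the sub-site (for example "httpd" or "docs2"),
    'keyword' is the search term.  urlencode escapes the values, so
    multi-word terms are safe to pass in.
    """
    query = urlencode({"what": what, "keyword": keyword})
    return f"{SEARCH_CGI}?{query}"
```

Calling build_search_url("httpd", "installation") reproduces the first
example URL above.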

The "advanced" form is just:

  http://search.apache.org/index.cgi

which allows searching over the entire site, searching by field, and
setting the sort order.

There are two other features that are not shown by default.  One is to
select "fuzzy" searching, and the other is to limit searches by a date
range.  I'm not sure they need to be enabled at all.

Those features can be tested by:

  http://search.apache.org/index.cgi?full=1



And to ramble a bit...

This is all in Perl CGI, which is slow.  Plus, the highlighting code is
set to the most aggressive mode, and that's where most of the time is
spent.  It's brute-force highlighting.

The CGI script runs the swish-e binary for searches, but only if no more
than 4 swish-e binaries are already running, as found by grepping the
output of /bin/ps -Unobody -ocommand.  Still, hitting the CGI script hard
will load the server, no doubt.
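
The process-count check described above can be sketched roughly as
follows.  This is a Python illustration of the idea, not the actual Perl
code in the CGI script; the function names are made up, and only the ps
command line and the limit of 4 come from this message.

```python
import subprocess

MAX_SEARCHERS = 4  # limit quoted in the message


def count_swish_processes(ps_output: str) -> int:
    """Count lines of ps output that mention the swish-e binary."""
    return sum(1 for line in ps_output.splitlines() if "swish-e" in line)


def may_start_search(ps_output: str, limit: int = MAX_SEARCHERS) -> bool:
    """Allow a new search only if no more than 'limit' are running."""
    return count_swish_processes(ps_output) <= limit


def current_ps_output() -> str:
    """Run the ps command from the message and return its output."""
    result = subprocess.run(
        ["/bin/ps", "-Unobody", "-ocommand"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

A check like this is only a soft limit: several requests arriving at the
same moment can all pass it before any of them has started swish-e.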

Running under mod_perl would help, especially with highlighting turned
off or down, and with the swish-e C library (via the SWISH::API module)
used instead of the swish-e binary.

Here are some rough requests-per-second figures on an Athlon XP 1800+
with 1/2GB RAM, Linux 2.4.20 and Apache/1.3.26 mod_perl/1.26, measured
using ab.

                Requests per Second, by Highlighting Mode

                           Off      Phrase    Default    Simple
   -------------------------------------------------------------
   Using SWISH::API        45        1.5        2          12
   Using swish-e binary    12        1.3        1.8        7.5

As you can see, the highlighting code is the limiting factor.  I have
search.apache.org set up for the swish-e binary and "Phrase"
highlighting.  The worst combination. ;)
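
To put rough numbers on that, the ratios can be worked out from the table
above.  The Python sketch below only does arithmetic on the figures
already quoted; the function names are made up for the example.

```python
# Requests per second, copied from the table above.
RATES = {
    "SWISH::API": {"Off": 45.0, "Phrase": 1.5, "Default": 2.0, "Simple": 12.0},
    "swish-e binary": {"Off": 12.0, "Phrase": 1.3, "Default": 1.8, "Simple": 7.5},
}


def api_speedup(mode: str) -> float:
    """How many times faster SWISH::API is than the binary in one mode."""
    return RATES["SWISH::API"][mode] / RATES["swish-e binary"][mode]


def highlighting_slowdown(backend: str, mode: str) -> float:
    """How many times slower a mode is than highlighting turned off."""
    return RATES[backend]["Off"] / RATES[backend][mode]


for mode in ("Off", "Phrase", "Default", "Simple"):
    print(f"{mode:8s} SWISH::API is {api_speedup(mode):.2f}x the binary")
```

With highlighting off, SWISH::API is 3.75 times faster than the binary,
but with "Phrase" highlighting the gap shrinks to about 1.15 times, since
"Phrase" mode alone costs SWISH::API a factor of 30.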


-- 
Bill Moseley
[EMAIL PROTECTED]


