I haven't tried this out in detail, so I don't have any specific opinion. But Bill has spent some time sorting out searching on apache.org, so we may want to consider returning our local search engine.
Joshua.

---------- Forwarded message ----------
Date: Wed, 30 Apr 2003 14:45:32 -0700
From: Bill Moseley <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: search.apache.org

http://search.apache.org has been updated with the new index as was described here a week or so ago. The new index is a result of spidering instead of indexing the file system -- the index files are about 11MB instead of 200MB. That frees a little space on /x2, which is up at 95% full again.

There may be things that don't need to be indexed (cvs update archives?), so let me know if anything else should be excluded. Or make use of robots.txt.

Site-specific searches can be done by setting the "what" CGI parameter. For example:

    http://search.apache.org/index.cgi?what=httpd&keyword=installation

or

    http://search.apache.org/index.cgi?what=docs2&keyword=installation

limit the search to just httpd.apache.org or the 2.0 docs, respectively.

The "advanced" form is just:

    http://search.apache.org/index.cgi

which allows searching over the entire site, searching by field, and setting the sort order.

There are two other features that are not shown by default. One is to select "fuzzy" searching, and the other is to limit searches by a date range. I'm not sure they need to be enabled at all. Those features can be tested by:

    http://search.apache.org/index.cgi?full=1

And to ramble a bit...

This is all in Perl CGI, which is slow. Plus, the highlighting code is set in the most aggressive mode, and that's where most of the time is spent. It's brute-force highlighting.

The CGI script runs the swish-e binary for searches, but only if there are not more than 4 swish-e binaries already running, as found by grepping the output from /bin/ps -Unobody -ocommand. Still, hitting the CGI script hard will load the server, no doubt.

Running under mod_perl would help, especially with highlighting turned off or down, and using the swish-e C library (via the SWISH::API module) instead of the swish-e binary.
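
The process-count check described above can be sketched roughly as follows. This is only an illustration in Python, not the Perl used by the actual CGI script: the function names, the way the command column is matched, and the sample ps output are all assumptions. Only the limit of 4 and the ps invocation come from the message.

```python
import subprocess

MAX_SWISH = 4  # limit quoted in the message


def count_swish_processes(ps_output, name="swish-e"):
    """Count ps output lines whose executable is named `name`.

    Matching on the basename of the first field is an assumption;
    the real script's grep pattern is not shown in the message.
    """
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if fields and fields[0].rsplit("/", 1)[-1] == name:
            count += 1
    return count


def may_start_search(ps_output, limit=MAX_SWISH):
    """Allow a new search only if not more than `limit` are running."""
    return count_swish_processes(ps_output) <= limit


def current_ps_output():
    """Run the ps command quoted in the message and return its output."""
    result = subprocess.run(
        ["/bin/ps", "-Unobody", "-ocommand"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

A CGI handler built this way would call may_start_search(current_ps_output()) before launching swish-e and return a "server busy" page otherwise. Note that the check and the launch are not atomic, so a burst of simultaneous requests can still briefly exceed the limit.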

Here are some rough requests-per-second figures on an Athlon XP 1800+ with 1/2GB RAM, Linux 2.4.20 and Apache/1.3.26 mod_perl/1.26, measured using ab.

                        Requests per Second
                         Highlighting Mode
                      Off    Phrase   Default   Simple
  Using SWISH::API    45     1.5      2         12
  -----------------------------------------------------
  Using swish-e       12     1.3      1.8       7.5
  binary

As you can see, the highlighting code is the limiting factor.

I have search.apache.org set up for the swish-e binary and "Phrase" highlighting. The worst combination. ;)

--
Bill Moseley [EMAIL PROTECTED]
