Good evening, or rather morning I guess.

I have been working with Chris (#apache - arryeder) on setting up a test environment on httpd.zones.apache.org that lets us use Solr & Lucene as the search engine for the httpd docs, with a view to possibly replacing the current Google implementation.

We have got this working, using the following components:

Java JDK - 1.5 or higher
Nightly snapshot of Solr (currently the snapshot from Nov 2nd 2007)
Perl 5.8.8
XML::Parser   (XML-Parser-2.34)
   CPAN -> XML::XPath
   CPAN -> File::Find
   CPAN -> Cwd
expat-2.0.1 (http://sourceforge.net/projects/expat/)
svn (Only the client is needed)

We now have an index of the 2.2.x documents, and it can be queried using fajita (the #apache bot). We don't have a web form ready yet, but if someone wants to help and contribute one, it will be gratefully received, I can assure you :-)

< pctony> fajita:  newds  mod_rewrite
< fajita> [33] (1)http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html (2)http://httpd.apache.org/docs/2.2/misc/rewriteguide.html#ToC1 (3)http://httpd.apache.org/docs/2.2/howto/access.html#rewrite (4)http://httpd.apache.org/docs/2.2/rewrite/index.html (5)http://httpd.apache.org/docs/2.2/vhosts/mass.html#homepages.rewrite

This is an example query in IRC. [33] means there are 33 matching results, but for the purposes of IRC we only return the top 5, sorted by relevance. A web form won't need that limit. I am currently documenting this 'docsearch' tool, and I already have a partial runbook for infra@.
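For anyone thinking of contributing the web form (or curious how a reply like the one above gets put together), here is a minimal Python sketch of the query side. It assumes Solr's standard /select handler returning JSON (wt=json); the host/port, the "url" field name and the exact formatting are guesses for illustration, not necessarily what fajita actually does.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr location: 8983 is Solr's example port, but the
# actual host/path used on httpd.zones.apache.org is not stated.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(terms, rows=5):
    """Build a /select URL asking for the top `rows` hits as JSON.

    Solr sorts by relevance score (descending) by default, so no sort
    parameter is needed. The "url" field is an assumed schema field.
    """
    params = {"q": terms, "rows": rows, "fl": "url,score", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def format_for_irc(response_body):
    """Turn a Solr JSON response into a one-line, fajita-style reply:
    "[numFound] (1)url1 (2)url2 ..."."""
    result = json.loads(response_body)["response"]
    links = " ".join(
        "({}){}".format(i, doc["url"])
        for i, doc in enumerate(result["docs"], start=1)
    )
    return "[{}] {}".format(result["numFound"], links)
```

A web form would simply use a larger rows value (or Solr's start parameter to page through results) instead of stopping at 5.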

The index is built from the latest svn checkout of the docs, which makes it much easier to maintain: to update it, all that is needed is to check out (or update) the latest .xml files and re-run the indexer. That's all.
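Roughly, the re-index step walks the checkout and turns each .xml page into a Solr document. Here is a minimal Python sketch of that idea, assuming a hypothetical schema with id, url, title and text fields (the real schema isn't described here) and the 2.2 base URL seen in the results above. It only builds Solr's XML <add> update message; posting it to the /update handler followed by a <commit/>, skipping non-page files, and any entity handling the doc DTDs might need are all left out.

```python
import os
import xml.etree.ElementTree as ET

# Base URL taken from the search results above (2.2 docs only).
BASE_URL = "http://httpd.apache.org/docs/2.2/"


def doc_fields(docs_root, xml_path):
    """Extract hypothetical index fields (id, url, title, text) from one file."""
    rel = os.path.relpath(xml_path, docs_root)
    # e.g. mod/mod_rewrite.xml -> .../docs/2.2/mod/mod_rewrite.html
    url = BASE_URL + os.path.splitext(rel)[0].replace(os.sep, "/") + ".html"
    root = ET.parse(xml_path).getroot()
    title_el = root.find(".//title")
    title = title_el.text.strip() if title_el is not None and title_el.text else rel
    # Join text nodes with spaces, then collapse runs of whitespace.
    text = " ".join(" ".join(root.itertext()).split())
    return {"id": url, "url": url, "title": title, "text": text}


def build_add_message(docs_root):
    """Walk the checkout and build one Solr <add> message for all *.xml files."""
    add = ET.Element("add")
    for dirpath, _dirnames, filenames in os.walk(docs_root):
        for name in sorted(filenames):
            if not name.endswith(".xml"):
                continue
            fields = doc_fields(docs_root, os.path.join(dirpath, name))
            doc = ET.SubElement(add, "doc")
            for field_name, value in fields.items():
                field = ET.SubElement(doc, "field", name=field_name)
                field.text = value
    return ET.tostring(add, encoding="unicode")
```

Matching only names ending in .xml should also skip the translated copies (named like mod_rewrite.xml.fr in the docs tree, as far as I know), which fits with multi-language support coming later.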

The next version will index all versions of the docs, with multi-language support hot on its heels.




Cheers,
Tony


