Update on Solr & Lucene search for HTTPd docs

Tony Stevenson Fri, 02 Nov 2007 19:55:11 -0800

Good evening, or rather morning I guess.

I have been working with Chris (#apache - arryeder) on setting up a test environment on httpd.zones.apache.org to allow us to use Solr & Lucene as the HTTPd docs search engine, with a view to possibly replacing the current Google implementation.


We have got this working, using the following components:

Java JDK - 1.5 or higher
Nightly snapshot of Solr, currently using snapshot from Nov 2nd 2007
Perl 5.8.8
XML::Parser   (XML-Parser-2.34)
   CPAN -> XML::XPath
   CPAN -> File::Find
   CPAN -> Cwd
expat-2.0.1 (http://sourceforge.net/projects/expat/)
svn (Only the client is needed)

We now have an index of the 2.2.x documents, and these can be queried using fajita (the #apache bot). We don't have a web form ready yet, but if someone wants to help and contribute one, it will be gratefully received, I can assure you :-)


< pctony> fajita:  newds  mod_rewrite

< fajita> [33] (1) http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html (2) http://httpd.apache.org/docs/2.2/misc/rewriteguide.html#ToC1 (3) http://httpd.apache.org/docs/2.2/howto/access.html#rewrite (4) http://httpd.apache.org/docs/2.2/rewrite/index.html (5) http://httpd.apache.org/docs/2.2/vhosts/mass.html#homepages.rewrite

This is an example query in IRC. [33] means there are 33 related results, but for the purposes of IRC we only return the top 5 results sorted by relevance. With a web form this won't be necessary.

I am currently in the process of documenting this 'docsearch' tool. I already have a partial runbook for infra@.
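For anyone curious how the IRC lookup above might map onto Solr, here is a rough sketch in Python. It is not the code the bot actually runs. The Solr address and the `url` field name are assumptions made for illustration; `rows`, `fl` and `wt` are standard Solr query parameters, and `numFound` is the total hit count in Solr's JSON response.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; the real host and path may differ.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(term, rows=5):
    """Build a select URL asking for the top `rows` hits.

    Solr sorts by relevance score by default, so no sort parameter is given.
    The stored field name "url" is hypothetical.
    """
    params = {"q": term, "rows": rows, "fl": "url", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


def format_irc_reply(response):
    """Turn a decoded Solr JSON response into a one-line IRC reply.

    The bracketed number is the total hit count (numFound); only the
    documents actually returned are listed after it.
    """
    body = response["response"]
    links = " ".join(
        "({}) {}".format(position, doc["url"])
        for position, doc in enumerate(body["docs"], start=1)
    )
    return "[{}] {}".format(body["numFound"], links)
```

A web form could use the same query with a larger `rows` value, which is why the limit of 5 only matters for IRC.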

The index is built from the latest svn checkout of the docs, so it can be maintained much more easily. All that is needed for the index to update is for the latest .xml files to be checked out or updated and the re-index to be run. That's all.
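To give a feel for what a re-index pass over the checkout involves, here is a minimal sketch in Python. It is not the actual tooling (which is built on the Perl modules listed above), and the field names `id` and `title` are assumptions. The `<add><doc><field name="...">` layout is Solr's standard XML update message format.

```python
import os
import xml.etree.ElementTree as ET


def find_xml_files(root):
    """Return the sorted paths of every .xml file below `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".xml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def build_add_message(docs):
    """Build a Solr XML update message from a list of field dictionaries.

    Each dictionary becomes one <doc>; each key/value pair becomes one
    <field name="key">value</field>. The field names are up to the caller
    and must match the Solr schema in use.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for field, value in doc.items():
            field_el = ET.SubElement(doc_el, "field", name=field)
            field_el.text = value
    return ET.tostring(add, encoding="unicode")
```

In a real run the resulting message would be POSTed to Solr's update handler and followed by a `<commit/>` message so the changes become searchable; extracting the title and body text from each docs file is left out here.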

The next version will index all versions of the docs, with multi-language support hot on its heels.





Cheers,
Tony


