Hi

I did the attached writeup of the fulltext interface for the wiki. 
I have tested it using a local MoinMoin installation, but glitches may occur.

Have fun
  Søren
-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail                       Phone:  +45 25481225
Pilevænget 41           Email:  [EMAIL PROTECTED]
DK-8961  Allingåbro     Web:    www.widetrail.dk
== Fulltext Indexing and Searching ==

CouchDB provides an interface for integrating full-text search engines.
In addition, CouchDB supplies a reference implementation of this interface
using [http://lucene.apache.org Lucene].

=== Index interface ===

CouchDB uses stdio to interface with the search engine. Whenever a document is
changed, the name of the database containing the document is sent to stdout.

CouchDB does not expect to receive anything on stdin (read: it will crash if it
does).
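
As an illustration of the indexer side of this protocol, here is a minimal Java sketch. It is not the actual Lucene indexer source. The class and method names are hypothetical, and it assumes that each database name arrives on its own newline-terminated line and that stderr is not forwarded back to CouchDB.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical skeleton of an update listener; names are illustrative only.
class UpdateListenerSketch {

    // Reads one database name per line (assumed framing) and hands each
    // non-empty name to the callback until the input ends.
    static void listen(Reader source, Consumer<String> onChange) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            String name = line.trim();
            if (!name.isEmpty()) {
                onChange.accept(name);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Nothing is written to stdout, since CouchDB does not expect any
        // reply. Diagnostics go to stderr (assumed not to reach CouchDB).
        listen(new InputStreamReader(System.in, StandardCharsets.UTF_8),
               db -> System.err.println("would re-index: " + db));
    }
}
```

A real indexer would replace the callback with code that re-reads the named database and updates its index.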

==== setup ====

The indexer is started by CouchDB using the command line specified in the
couch.ini configuration parameter:

{{{
DbUpdateNotificationProcess
}}}


=== Search interface ===

CouchDB again uses stdio to interface with the searcher.

Currently this interface is not exposed through Futon, so to try it out you 
need to start CouchDB with the 
interactive option -i to get an Erlang shell.

From there you can write search queries like:

{{{
couch_ft_query:execute("database", "+ query +string").
}}}


For this example, the string "database\n" followed by "+ query +string\n" is
transmitted to stdout.

The result of the search is received through stdin and displayed in the shell.
Currently the format of the result is not specified and is left to the
particular search engine.
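
The exchange described above can be sketched from the searcher's point of view as follows. This is a hypothetical Java skeleton, not the reference implementation. The `Backend` interface and the placeholder reply text are invented for illustration, and since the result format is unspecified, the sketch simply writes whatever string the backend returns, followed by a newline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical searcher loop; names are illustrative only.
class SearcherSketch {

    // Stand-in for a real search engine (an assumption, not a CouchDB API).
    interface Backend {
        String search(String database, String query);
    }

    // Reads pairs of lines (database name, then query string) and writes
    // one reply per pair, flushing so the caller sees it immediately.
    static void serve(Reader in, Writer out, Backend backend) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String database;
        while ((database = reader.readLine()) != null) {
            String query = reader.readLine();
            if (query == null) {
                break; // input ended between the two lines of a request
            }
            out.write(backend.search(database, query));
            out.write('\n');
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        serve(new InputStreamReader(System.in, StandardCharsets.UTF_8),
              new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
              (db, q) -> "no results for '" + q + "' in " + db);
    }
}
```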


==== setup ====

The searcher is started by CouchDB using the command line specified in the
couch.ini configuration parameter:

{{{
FullTextSearchQueryServer
}}}


=== Lucene reference implementation ===

==== Use of special design document ====

A database to be indexed must contain a special design document in this format:

{{{
{
  "_id":"_design/fulltextsearch",
  "_rev":"123",
  "fulltext_options": {
    "views": {
      "names" : {"index":"view-value", "return":"document"},
      "cities": {"index":"view-key", "return":"view"}
    }
  }
}
}}}

The Lucene indexer uses the views defined in this document to guide the
indexing process.

In this example, the views "names" and "cities" must also be defined in the
database. For the "names" view, Lucene will index the "view-value" and return
documents as search results; for the "cities" view, it will index the
"view-key" and return the view in search results.

For info on views in CouchDB see: Self:Views
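
To make the "index" option concrete, here is a small Java sketch of how an indexer might choose the text to index for one view row. It is an illustration only: the class and method names are hypothetical, and it assumes that "view-key" and "view-value" are the only values in use, since those are the only two shown in the example above.

```java
// Hypothetical helper; not taken from the Lucene indexer source.
class ViewOptionSketch {

    // Returns the part of a view row that should be indexed, based on the
    // "index" option from the design document.
    static String textToIndex(String indexOption, String viewKey, String viewValue) {
        if ("view-key".equals(indexOption)) {
            return viewKey;
        }
        if ("view-value".equals(indexOption)) {
            return viewValue;
        }
        // Other values are not covered by the example in the text.
        throw new IllegalArgumentException("unknown index option: " + indexOption);
    }

    public static void main(String[] args) {
        // A made-up view row with key "Aarhus" and value "Jane Doe".
        System.out.println(textToIndex("view-value", "Aarhus", "Jane Doe"));
        System.out.println(textToIndex("view-key", "Aarhus", "Jane Doe"));
    }
}
```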


==== Dependencies ====

The Lucene indexer depends on the following .jar files:
 * couchdb4j.jar (see below)
   * commons-beanutils.jar
   * commons-codec-1.3.jar
   * commons-collections.jar
   * commons-httpclient-3.1.jar
   * commons-lang.jar
   * commons-logging-1.1.jar
   * ezmorph-1.0.3.jar
   * json-lib-2.0-jdk15.jar
 * lucene-core-2.3.1.jar

Note: all the couchdb4j dependencies (as you can see, some have no version
information) can probably be obtained by checking out the couchdb4j
repository (see below).

Note: at the time of writing, couchdb4j needs to be patched using the patches
specified in issues 6 and 8 on the couchdb4j issue tracker:
http://code.google.com/p/couchdb4j/issues/list
So check out trunk, apply the patches, and build.

At least Java version 5 is needed.

==== Compiling ====

The Lucene indexer is not built as part of CouchDB.

You need to:
 * Set up a Java development environment (at least version 5).
 * Check out the CouchDB source.
 * Change directory to src/fulltext/lucene
 * Compile using javac, with a CLASSPATH containing the needed dependencies (listed above)
 * Do: jar cf !CouchLucene.jar *.class

As a result you should get a file !CouchLucene.jar to include in your CLASSPATH
at runtime.

==== Runtime setup ====

You need a path to your Java runtime (at least version 5).
You have to set up your Java CLASSPATH to contain all the .jar files listed in
the dependency list. Alternatively, you can specify the classpath on the
command line defined for the .ini options, like:

{{{
FullTextSearchQueryServer=java -cp /path/to/couchdb4j/lib/couchdb4j.jar:... LuceneSearcher
DbUpdateNotificationProcess=java -cp /path/to/couchdb4j/lib/couchdb4j.jar:... LuceneIndexer
}}}

Note: the above example works on Unix-like operating systems (on Windows the
classpath separator is ; rather than :).
