Re: Setting Similarity in IndexWriter and IndexSearcher

Doug Cutting Tue, 08 Jun 2004 14:33:17 -0700

David Spencer wrote:

Does it ever make sense to set the Similarity object in only one of IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter, can I avoid setting it in IndexSearcher? Also, can I avoid setting it in IndexWriter and only set it in IndexSearcher? I noticed Nutch sets it in both places and was wondering what's going on behind the scenes...

No, it probably doesn't make sense to use a different Similarity implementation when indexing than when searching. Ideally, perhaps, we'd have a LuceneConfiguration object which encapsulates the Similarity, Analyzer and Directory implementations, and perhaps other parameters as well. This could even be stored with the index, using Java object serialization. However, I worry that this could cause more confusion than it solves. For example, one might not easily be able to search an index if a class used when it was indexed is no longer available when searching. Tools like Luke could also become more difficult to write and use.
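To make the "behind the scenes" part concrete, here is a minimal, Lucene-free sketch of how scoring work is split between the writer and the searcher. It assumes the formulas of Lucene 1.4's DefaultSimilarity (lengthNorm = 1/sqrt(numTerms), tf = sqrt(freq), idf = log(numDocs/(docFreq+1)) + 1). The type names (SimpleSimilarity, DefaultLike, NoLengthNorm) are hypothetical stand-ins, not Lucene classes, and the score is a simplified single-term tf * idf * norm with boosts and query normalization left out. The point it shows: the norm is computed by the writer's Similarity and frozen into the index, so a custom lengthNorm set only on the searcher is never consulted.

```java
// Lucene-free sketch; all type names are hypothetical stand-ins.
// Formulas mirror Lucene 1.4's DefaultSimilarity; scoring is simplified.
public class SimilaritySplit {

    interface SimpleSimilarity {
        float lengthNorm(int numTerms);        // called by the writer; result is stored in the index
        float tf(float freq);                  // called by the searcher
        float idf(int docFreq, int numDocs);   // called by the searcher
    }

    static class DefaultLike implements SimpleSimilarity {
        public float lengthNorm(int numTerms) { return (float) (1.0 / Math.sqrt(numTerms)); }
        public float tf(float freq) { return (float) Math.sqrt(freq); }
        public float idf(int docFreq, int numDocs) {
            return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }
    }

    // Hypothetical custom Similarity that turns off length normalization.
    static class NoLengthNorm extends DefaultLike {
        @Override public float lengthNorm(int numTerms) { return 1.0f; }
    }

    // Index time: the writer's Similarity computes the norm once; it is then frozen.
    static float indexNorm(SimpleSimilarity writerSim, int numTerms) {
        return writerSim.lengthNorm(numTerms);
    }

    // Search time: tf and idf come from the searcher's Similarity, but the norm
    // is read back from the index unchanged; searcherSim.lengthNorm is never called.
    static float score(SimpleSimilarity searcherSim, float storedNorm,
                       int freq, int docFreq, int numDocs) {
        return searcherSim.tf(freq) * searcherSim.idf(docFreq, numDocs) * storedNorm;
    }

    public static void main(String[] args) {
        SimpleSimilarity custom = new NoLengthNorm();
        // Same 100-term document, same query term (freq 4, found in 9 of 1000 docs).
        float searcherOnly = score(custom, indexNorm(new DefaultLike(), 100), 4, 9, 1000);
        float bothPlaces = score(custom, indexNorm(custom, 100), 4, 9, 1000);
        System.out.println("custom Similarity on searcher only: " + searcherOnly);
        System.out.println("custom Similarity on writer and searcher: " + bothPlaces);
    }
}
```

With the custom Similarity set only on the searcher, the score is ten times smaller than intended, because the 1/sqrt(100) norm baked in at index time still applies. Setting the same Similarity in both places, as Nutch does, avoids the mismatch. (Real Lucene also encodes the stored norm lossily into a single byte, which this sketch omits.)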

By design, one does not have to declare things up-front with Lucene. For example, one never has to declare the set of fields and their types. Different documents in the same index can use different fields, or even use the same field name differently. Saving analyzers and similarity implementations with the index would reduce this sort of flexibility somewhat. If you rename your analyzer or similarity class, does your index become invalid? Lucene currently avoids such issues, at the expense of potential confusion about using different analyzers and similarities at index and search time. But I don't think the latter is, in practice, a problem that needs more than a little documentation.
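The rename concern can be illustrated with a small sketch of the hypothetical LuceneConfiguration idea: implementation class names are recorded with the index via Java object serialization and resolved again later. LuceneConfiguration, MySimilarity and the name "com.example.OldAnalyzer" are all made up for illustration; nothing here is part of Lucene. The config itself round-trips fine, but a class that was renamed after indexing can no longer be loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch only: a hypothetical config stored with the index, not a Lucene API.
public class StoredConfigSketch {

    // Hypothetical; records which implementation classes were used at index time.
    static class LuceneConfiguration implements Serializable {
        private static final long serialVersionUID = 1L;
        final String similarityClass;
        final String analyzerClass;

        LuceneConfiguration(String similarityClass, String analyzerClass) {
            this.similarityClass = similarityClass;
            this.analyzerClass = analyzerClass;
        }
    }

    // Stand-in for a user's custom Similarity class that still exists.
    static class MySimilarity {}

    static byte[] save(LuceneConfiguration config) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static LuceneConfiguration load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (LuceneConfiguration) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the recorded class can still be loaded on the current classpath.
    static boolean canResolve(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] stored = save(new LuceneConfiguration(
                MySimilarity.class.getName(), "com.example.OldAnalyzer"));
        LuceneConfiguration config = load(stored);
        System.out.println(config.similarityClass + " resolvable: " + canResolve(config.similarityClass));
        // Pretend this analyzer was renamed after indexing: its recorded name no longer loads.
        System.out.println(config.analyzerClass + " resolvable: " + canResolve(config.analyzerClass));
    }
}
```

Recording names rather than the objects themselves keeps the config loadable, but the index is still tied to those names. Serializing the Similarity or Analyzer objects directly would fail even earlier, with a ClassNotFoundException from readObject.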

Sorry for the long-winded answer!

Doug
