Re: Setting Similarity in IndexWriter and IndexSearcher

Grant Ingersoll Tue, 08 Jun 2004 14:46:59 -0700

I do these kind of things as part of a layer between Lucene and my application, but 
often have thought it would be nice to have a metadata layer available that wasn't 
part of the Lucene core, but was packaged w/ Lucene.  It could provide the information 
necessary and have tools for updating with out messing w/ the index.

For instance, one of the things I store is the name of the field that is the "true" 
document identifier (a unique String that won't change across Indexing) for that 
Document (which, as Doug pointed out, can vary from Document to Document w/in the 
Index).

Cheers,
Grant

>>> [EMAIL PROTECTED] 06/08/04 03:44PM >>>
David Spencer wrote:
> Does it ever make sense to set the Similartity obj in either (only one 
> of..) IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter can 
> I avoid setting it in IndexSearcher? Also, can I avoid setting it in 
> IndexWriter and only set it in IndexSearcher? I noticed Nutch sets it in 
> both places and was wondering about what's going on behind the scenes...

No, it probably doesn't make sense to use a different Similarity 
implementation when indexing than when searching.  Ideally perhaps we'd 
have a LuceneConfiguration object, which encapsulates the Similarity, 
Analysis and Directory implementations, as well as perhaps other 
parameters.  And perhaps this could even be stored with the index, using 
Java object serialization.  However I worry that this could cause more 
confusion than it solves.  For example, one might not easily be able to 
search and index if a class used when it was indexed is no longer 
available when searching.  Tools like Luke could become more difficult 
to write and use.

By design, one does not have to declare things up-front with Lucene. 
For example, one never has to declare the set of fields and their types. 
  Different documents in the same index can use different fields, or 
even use the same field name differently.  Saving analyzers and 
similarity implementations with the index reduces this sort of 
flexibility somewhat.  If you rename your analysis or similarity class, 
does your index become invalid?  Lucene currently avoids such issues, at 
the expense of potential confusion about using different analyzers and 
similarity at index and search time.  But I don't think the latter is in 
practice a problem that needs more than a little documentation.

Sorry for the long-winded answer!

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED] 
For additional commands, e-mail: [EMAIL PROTECTED] 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Setting Similarity in IndexWriter and IndexSearcher

Reply via email to