I do these kind of things as part of a layer between Lucene and my application, but often have thought it would be nice to have a metadata layer available that wasn't part of the Lucene core, but was packaged w/ Lucene. It could provide the information necessary and have tools for updating with out messing w/ the index.
For instance, one of the things I store is the name of the field that is the "true" document identifier (a unique String that won't change across Indexing) for that Document (which, as Doug pointed out, can vary from Document to Document w/in the Index). Cheers, Grant >>> [EMAIL PROTECTED] 06/08/04 03:44PM >>> David Spencer wrote: > Does it ever make sense to set the Similartity obj in either (only one > of..) IndexWriter or IndexSearcher? i.e. If I set it in IndexWriter can > I avoid setting it in IndexSearcher? Also, can I avoid setting it in > IndexWriter and only set it in IndexSearcher? I noticed Nutch sets it in > both places and was wondering about what's going on behind the scenes... No, it probably doesn't make sense to use a different Similarity implementation when indexing than when searching. Ideally perhaps we'd have a LuceneConfiguration object, which encapsulates the Similarity, Analysis and Directory implementations, as well as perhaps other parameters. And perhaps this could even be stored with the index, using Java object serialization. However I worry that this could cause more confusion than it solves. For example, one might not easily be able to search and index if a class used when it was indexed is no longer available when searching. Tools like Luke could become more difficult to write and use. By design, one does not have to declare things up-front with Lucene. For example, one never has to declare the set of fields and their types. Different documents in the same index can use different fields, or even use the same field name differently. Saving analyzers and similarity implementations with the index reduces this sort of flexibility somewhat. If you rename your analysis or similarity class, does your index become invalid? Lucene currently avoids such issues, at the expense of potential confusion about using different analyzers and similarity at index and search time. But I don't think the latter is in practice a problem that needs more than a little documentation. Sorry for the long-winded answer! Doug --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
