Sounds like the right approach!

Perhaps both ways should be allowed: the analyzer from the index is used by default 
but can be overridden explicitly in the API (not sure about the query parser though). 
The easiest usage pattern would then be to specify the analyzer once and use it going 
forward (for adding documents and for querying). For special needs one could 
specify a different analyzer, in which case the programmer takes responsibility 
for keeping the index and query results consistent. 
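
To make the "default, but overridable" idea concrete, here is a rough sketch. None of these names exist in Lucene: SketchAnalyzer, LowercaseAnalyzer, KeepCaseAnalyzer and SketchIndex are all made up for illustration, and the "index" is just a set of terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer: turns text into tokens.
interface SketchAnalyzer {
    List<String> tokenize(String text);
}

// Splits on whitespace and lowercases each token.
class LowercaseAnalyzer implements SketchAnalyzer {
    public List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Splits on whitespace but keeps the original case.
class KeepCaseAnalyzer implements SketchAnalyzer {
    public List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }
}

// Hypothetical index that remembers the analyzer it was created with.
class SketchIndex {
    private final SketchAnalyzer defaultAnalyzer;
    private final Set<String> terms = new HashSet<>();

    SketchIndex(SketchAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Easy path: the analyzer is specified once, at creation time.
    void addDocument(String text) {
        addDocument(text, defaultAnalyzer);
    }

    // Explicit override: the caller is now responsible for consistency.
    void addDocument(String text, SketchAnalyzer override) {
        terms.addAll(override.tokenize(text));
    }

    boolean matches(String query) {
        return matches(query, defaultAnalyzer);
    }

    // True only if every query token is present in the index.
    boolean matches(String query, SketchAnalyzer override) {
        List<String> tokens = override.tokenize(query);
        return !tokens.isEmpty() && terms.containsAll(tokens);
    }
}
```

With an index created as new SketchIndex(new LowercaseAnalyzer()) and the document "Hello World" added, matches("HELLO") succeeds, while matches("HELLO", new KeepCaseAnalyzer()) does not. That is the kind of inconsistency the programmer would be signing up for when overriding.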

Another question is whether to serialize the analyzer, or to provide a factory that 
instantiates one by name and store only the name. Serializing would make index stores 
more portable across Lucene installations, since the analyzer class does not need to 
be present. But instantiating by name would allow analyzers that have non-serializable 
dependencies (for example, an analyzer that calls a native WordNet API to expand 
synonyms).
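
A minimal sketch of the two storage options, using only the JDK. NamedAnalyzer, UpperAnalyzer and AnalyzerStore are hypothetical names, not Lucene classes. One caveat worth checking against the portability argument: as far as I know, standard Java serialization writes field state rather than bytecode, so readObject still needs the analyzer class on the class path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical analyzer contract, for illustration only.
interface NamedAnalyzer {
    String analyze(String text);
}

// A trivially serializable analyzer with a no-argument constructor.
class UpperAnalyzer implements NamedAnalyzer, Serializable {
    private static final long serialVersionUID = 1L;

    public UpperAnalyzer() {}

    public String analyze(String text) {
        return text.toUpperCase(Locale.ROOT);
    }
}

class AnalyzerStore {
    // Option A: store the serialized analyzer itself.
    // Fails with NotSerializableException (wrapped) if the analyzer
    // or any of its fields is not Serializable.
    static byte[] serialize(NamedAnalyzer analyzer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(analyzer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static NamedAnalyzer deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (NamedAnalyzer) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("analyzer class not on class path", e);
        }
    }

    // Option B: store only the class name and instantiate it reflectively.
    // Requires a no-argument constructor on the named class.
    static NamedAnalyzer byName(String className) {
        try {
            return (NamedAnalyzer) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

Option B stores nothing but a string, so the analyzer is free to hold native handles or other non-serializable state that it rebuilds in its constructor.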

In our use, I don't see us moving index stores between Lucene installations that are 
configured with different classes in the class path. We do move them between similarly 
configured installations though.

What about class versioning? I can't think of clear advantages one way or another, but 
it seems that it would be an issue to consider. 
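
For what it's worth, here is a small sketch of how versioning could surface in either scheme. VersionedAnalyzer and VersionCheck are hypothetical. With serialization, the JDK compares serialVersionUID values when reading an object back; with store-by-name, a version tag would have to be stored and checked by hand, which this sketch does by reusing the same number.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical analyzer class, used only for illustration.
class VersionedAnalyzer implements Serializable {
    // Pinned explicitly; if omitted, the JVM derives a value from the class shape,
    // so recompiling a changed class can silently change it.
    private static final long serialVersionUID = 2L;

    int minTokenLength = 2;
}

class VersionCheck {
    // Reads the stream version the JDK would use for this class.
    static long streamVersion(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            throw new IllegalArgumentException(cls.getName() + " is not serializable");
        }
        return desc.getSerialVersionUID();
    }

    // For store-by-name: compare a version recorded in the index with the
    // version reported by the class installed on this machine.
    static boolean compatible(long storedVersion, Class<?> installed) {
        return storedVersion == streamVersion(installed);
    }
}
```

Either way the index would need to record some version information, so this may be less a point in favor of one approach than a cost shared by both.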



================================================

1 (complicated way): When the index store is created, register an
analyzer for each field (could be the same one.)  A serialized copy of
the analyzer is stored in the index base, and queries on that field
are automatically processed with it.

2 (simpler, less complete way): Have a way of telling the query parser
that "these fields use these analyzers", or at the very least, "these
fields don't get tokenized with an analyzer."
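
Option 2 could look roughly like the sketch below. FieldAnalyzerRegistry is a hypothetical helper, not an existing Lucene class, and an "analyzer" is reduced to a function from text to tokens.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical registry a query parser could consult per field.
class FieldAnalyzerRegistry {
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Set<String> untokenized = new HashSet<>();

    // "These fields use these analyzers."
    void register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // "These fields don't get tokenized with an analyzer."
    void markUntokenized(String field) {
        untokenized.add(field);
    }

    // Returns the tokens the parser should search for in the given field.
    List<String> analyzeQueryText(String field, String text) {
        if (untokenized.contains(field)) {
            return Collections.singletonList(text); // passed through verbatim
        }
        Function<String, List<String>> analyzer = byField.get(field);
        if (analyzer == null) {
            throw new IllegalArgumentException("no analyzer registered for field: " + field);
        }
        return analyzer.apply(text);
    }
}
```

Unlike option 1, nothing here is stored in the index, so the registry has to be rebuilt the same way every time the index is opened.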


_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev
