Many things would be cleaner in Lucene if fields had global semantics, i.e., if properties like text vs. binary, Index, Store, TermVector, the appropriate Analyzer, the assignment of Directory in ParallelReader (or ParallelWriter), etc. were a function of just the field name and the index. This approach would naturally admit a class, say IndexFieldSet, that would hold the global field semantics for an index.
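To make the idea a bit more concrete, here is a rough sketch of what such a class might look like. Everything in it is hypothetical: IndexFieldSet does not exist in Lucene, FieldSemantics is a name invented for illustration, and only two properties (tokenized, stored) are modeled as plain booleans rather than Lucene's own Index/Store/TermVector settings. The point is only that a field's properties are looked up by name, and a conflicting redefinition is rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class: properties fixed once per field name for the whole index.
// Only two properties are modeled here; a real design would cover more.
final class FieldSemantics {
    final boolean tokenized;
    final boolean stored;

    FieldSemantics(boolean tokenized, boolean stored) {
        this.tokenized = tokenized;
        this.stored = stored;
    }

    boolean sameAs(FieldSemantics other) {
        return tokenized == other.tokenized && stored == other.stored;
    }
}

// Hypothetical registry (not an existing Lucene class): field name -> semantics.
final class IndexFieldSet {
    private final Map<String, FieldSemantics> fields = new HashMap<>();

    // Registers a field. Re-registering with identical semantics is allowed;
    // a conflicting redefinition throws, which is what makes the semantics global.
    void define(String name, FieldSemantics semantics) {
        FieldSemantics existing = fields.get(name);
        if (existing != null && !existing.sameAs(semantics)) {
            throw new IllegalArgumentException("Conflicting semantics for field: " + name);
        }
        fields.put(name, semantics);
    }

    boolean contains(String name) {
        return fields.containsKey(name);
    }

    // Returns the semantics for a known field; throws for an unknown name.
    FieldSemantics lookup(String name) {
        FieldSemantics semantics = fields.get(name);
        if (semantics == null) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return semantics;
    }
}
```

With something like this, any component that needs to know how a field is treated could ask the one registry by field name instead of inspecting individual Field instances.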
Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on the same or a different Document. Does anybody know how often this flexibility is used? Are there interesting use cases for which it is important? It seems to me this functionality is already problematic and not fully supported; e.g., indexing can manage tokenization-variant fields, but query parsing cannot. Various extensions to Lucene exacerbate this kind of problem.

Perhaps more controversially, the notion of global field semantics would be even stronger if the set of fields were closed. This would allow, for example, QueryParser to validate field names. This has a number of benefits, including, for example, avoiding false-negative "no results" responses caused by a misspelled field name.

Has this been considered before? Are there good reasons this path has not been followed?

Thanks for any info,

Chuck