Chris Hostetter wrote on 07/10/2006 02:06 AM:
> As near as I can tell, the large issue can be summarized with the following
> sentiment:
>
>    Performance gains could be realized if Field
>    properties were made fixed and homogeneous for
>    all Documents in an index.
>
This is certainly a large issue, as David says he has achieved a 5x
performance gain. My interest in global field semantics originally sprang
from functionality considerations, not performance considerations. I've got
many features that require reasoning about field semantics. I previously
mentioned a very simple one: validating fields in the query parser. More
interesting examples are:

  1. Multiple inheritance on the fields of documents, recording the source
     of each inherited value to support efficient incremental maintenance.
  2. "Record-valued fields" that store facets with values (e.g., time and
     user information for who set that value). These cannot easily be
     broken into multiple fields because the fields in question are
     multi-valued.
  3. "Join fields" that reference IDs of objects stored in separate indices
     (supporting queries that reference the fields in the joined index).

Managing these kinds of rich semantic features in query parsing and indexing
is greatly facilitated by a global field model. I've built this into my app,
and then started thinking about the benefits to Lucene generally from such a
model.

> 1) all Fields and their properties must be predeclared before any
>    document is ever added to the index, and any Field not declared is
>    illegal.
> 2) a Field springs into existence the first time a Document is added
>    with a value for it -- but after that all newly added Documents with
>    a value for that field must conform to the Field properties initially
>    used.
>
> (have I missed any general approaches?)
>

Yes. Here is (an elaboration of) the "global model with exceptions" idea we
reached:

  3) There is a global field model in Lucene that contains the list of all
     known fields and their "default semantics". The class that contains
     this model supports a number of implicit and explicit methods to
     construct and query the model. The model can be evolved. The model is
     used in many places in Lucene, in some cases according to
     application-settable properties.
     E.g.:

     a) Creating a Field uses the properties of the model, so they need not
        be specified at each construction. A global model property
        determines whether or not field properties may be overridden, and
        whether or not fields may be created that are not in the model (in
        which case, they are automatically added to the model).
     b) The query parser has hooks that affect Query generation based on
        the model properties of the field (not just for certain special
        query types like Term and RangeQuery). The application can easily
        provide methods to implement these hooks. This is essential for
        features 2 and 3 above (and beneficial for 1).

> How would something like this work?
>
>   docA.add(new Field(f, "bar", Store.YES, Index.UN_TOKENIZED));
>   docA.add(new Field(f, "foo", Store.NO,  Index.TOKENIZED));
>
>   docB.add(new Field(f, "x y", Store.YES, Index.TOKENIZED));
>   docB.add(new Field(f, "z",   Store.NO,  Index.UN_TOKENIZED));
>

The application could determine whether or not this kind of operation is
supported, according to the global enforcement properties of the model. If
this is needed, the ability to have exceptions at the Field level would
permit it.

Hoss, do you have a use case requiring Store and Index variance like this?

The impact of this flexibility on David's 5x is another question...

Chuck
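
P.S. To make the "global model with exceptions" idea concrete, here is a minimal sketch of how the two enforcement properties might behave. It is only an illustration under stated assumptions: FieldModel, Spec, allowOverrides, allowNewFields and resolve are hypothetical names, and the Store and Index enums are stand-ins for the options in the quoted example. None of this is existing Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical global field model; none of these names are Lucene API. */
final class FieldModel {
    // Stand-ins for the Store / Index options used in the quoted example.
    enum Store { YES, NO }
    enum Index { TOKENIZED, UN_TOKENIZED, NO }

    /** Default semantics recorded for one field name. */
    static final class Spec {
        final Store store;
        final Index index;
        Spec(Store store, Index index) { this.store = store; this.index = index; }
        boolean sameAs(Spec other) { return store == other.store && index == other.index; }
        @Override public String toString() { return store + "/" + index; }
    }

    private final Map<String, Spec> defaults = new LinkedHashMap<>();
    private final boolean allowOverrides;  // may a Field deviate from the model default?
    private final boolean allowNewFields;  // may an undeclared field be added on first use?

    FieldModel(boolean allowOverrides, boolean allowNewFields) {
        this.allowOverrides = allowOverrides;
        this.allowNewFields = allowNewFields;
    }

    /** Explicitly declare a field and its default semantics. */
    void declare(String name, Store store, Index index) {
        defaults.put(name, new Spec(store, index));
    }

    /** Look up the model default, so callers need not repeat the properties. */
    Spec resolve(String name) {
        Spec spec = defaults.get(name);
        if (spec == null) throw new IllegalArgumentException("Unknown field: " + name);
        return spec;
    }

    /** Check explicitly requested properties against the model. */
    Spec resolve(String name, Store store, Index index) {
        Spec requested = new Spec(store, index);
        Spec known = defaults.get(name);
        if (known == null) {
            if (!allowNewFields) throw new IllegalArgumentException("Undeclared field: " + name);
            defaults.put(name, requested);  // first use defines the default
            return requested;
        }
        if (!known.sameAs(requested) && !allowOverrides) {
            throw new IllegalStateException(
                "Field " + name + " is " + known + " in the model, got " + requested);
        }
        return requested;  // either matches the default or is a permitted exception
    }

    public static void main(String[] args) {
        // Lenient model: the mixed Store/Index usage from the quoted example is accepted.
        FieldModel lenient = new FieldModel(true, true);
        System.out.println(lenient.resolve("f", Store.YES, Index.UN_TOKENIZED));
        System.out.println(lenient.resolve("f", Store.NO, Index.TOKENIZED));

        // Strict model: the second, conflicting use of "f" is rejected.
        FieldModel strict = new FieldModel(false, true);
        strict.resolve("f", Store.YES, Index.UN_TOKENIZED);
        try {
            strict.resolve("f", Store.NO, Index.TOKENIZED);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With both switches off this reduces to approach 1 (everything predeclared), and with only allowNewFields on it reduces to approach 2 (first use fixes the properties). Whether a per-Field exception is compatible with the performance gains David reports is not something this sketch addresses.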