: > Are there good reasons this path has not been followed?
:
: Hoss, that's your cue.

I must admit, I haven't been able to fully follow this thread. Perhaps it's just because it's late (no, that can't be it ... I started reading it at 3:30 this afternoon and then stopped because it was making my head hurt). In all honesty, I probably would have skimmed the whole thing without commenting if Marvin hadn't called me out onto the mat -- so I'll do my best to make sense of it.

As near as I can tell, the larger issue can be summarized with the following sentiment:

   Performance gains could be realized if Field properties were made fixed and homogeneous for all Documents in an index.

...I've left this sentiment vague, and I'll ignore the implementation specifics since I don't understand them -- but there seem to be two high level approaches involved, which are advocated to varying degrees by varying folks...

  1) all Fields and their properties must be predeclared before any document is ever added to the index, and any Field not declared is illegal.

  2) a Field springs into existence the first time a Document is added with a value for it -- but after that, all newly added Documents with a value for that field must conform to the Field properties initially used.

(have I missed any general approaches?)

The questions (in my mind at least) are:

  a) How much performance gain can be realized by these limitations?

  b) Would it be possible to implement these limitations in such a way that they are "optional" for people willing to accept the trade off?

  c) if the answer to (b) is no, then is (a) great enough to warrant changing Lucene anyway? What exactly is sacrificed?

I can't speak to (a) or (b) ... but I'll throw out some examples for (c).

Regarding #1... If Fields must be predeclared, Lucene would lose two of the biggest advantages it has, in my opinion:

  * The ability to evolve an index: to have an extremely large index, and to add a field to it that is only used by "new" documents. This is useful not only when the nature of your data changes (TPS Reports didn't use to have a "cover_sheet" field, and now they do) but also when the usage of an existing field changes and you don't want to rebuild from scratch (you've always had an indexed "cover_sheet" field, and now you want it to be stored too ... so you change your index building code, let it run for a little while, and then go back and reindex the old stuff later).

  * The ability to have dynamically named fields. At CNET we have "attributes" for products; those attributes are defined in a database, and the list of valid attributes differs based on the type of product. I don't know what they all are, and that list could change tomorrow -- and I don't want to have to rebuild my index from scratch just because someone decided that laptops need a new attribute called "heat dissipation factor" (note:

Regarding #2... This approach wouldn't necessarily conflict with the dynamically named fields example above, but it would suffer from the same "evolving index" problems.

Last but not least is the high level issue of "homogeneous" Fields and Field properties for all documents. As has been pointed out, in many cases this is not that big of a deal: even if you want heterogeneous documents stored in a single index, you can construct a list of Fields which is the union of the Fields from your heterogeneous Documents and use that -- hopefully no new requirement is added that all Documents must have a value for all fields.

But what about complex interactions between multi-valued, stored, indexed fields? How would something like this work?

   docA.add(new Field(f, "bar", Store.YES, Index.UN_TOKENIZED));
   docA.add(new Field(f, "foo", Store.NO, Index.TOKENIZED));
   docB.add(new Field(f, "x y", Store.YES, Index.TOKENIZED));
   docB.add(new Field(f, "z", Store.NO, Index.UN_TOKENIZED));

...both docs have two "Fields" for field name "f", both have a stored value for f, both have some indexed terms for f, both have some tokenized terms and one untokenized term for f ... but do these two docs both conform to the same "Global field semantics"?

-Hoss
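
One way to make the closing question concrete is to check the docA/docB example against two possible readings of "global field semantics". The sketch below is plain Java and uses no Lucene classes. The names `Prop`, `FieldInstance`, `everyInstanceMatches`, and `perDocumentUnionMatches` are hypothetical, invented for this illustration, and neither rule is claimed to be what anyone in the thread actually proposed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class FieldSemanticsSketch {

    // Hypothetical property flags standing in for the Store/Index options.
    enum Prop { STORED, INDEXED_TOKENIZED, INDEXED_UNTOKENIZED }

    // One field instance added to a document: a field name plus its properties.
    static final class FieldInstance {
        final String name;
        final Set<Prop> props = EnumSet.noneOf(Prop.class);
        FieldInstance(String name, Prop... props) {
            this.name = name;
            Collections.addAll(this.props, props);
        }
    }

    // Reading 1: every single instance of the field must carry identical properties.
    static boolean everyInstanceMatches(String name, List<List<FieldInstance>> docs) {
        Set<Prop> first = null;
        for (List<FieldInstance> doc : docs) {
            for (FieldInstance f : doc) {
                if (!f.name.equals(name)) continue;
                if (first == null) first = f.props;
                else if (!first.equals(f.props)) return false;
            }
        }
        return true;
    }

    // Reading 2: only the per-document union of properties for the field must match.
    static boolean perDocumentUnionMatches(String name, List<List<FieldInstance>> docs) {
        Set<Prop> first = null;
        for (List<FieldInstance> doc : docs) {
            Set<Prop> union = EnumSet.noneOf(Prop.class);
            for (FieldInstance f : doc) {
                if (f.name.equals(name)) union.addAll(f.props);
            }
            if (union.isEmpty()) continue; // document has no value for this field
            if (first == null) first = union;
            else if (!first.equals(union)) return false;
        }
        return true;
    }

    // docA from the example: stored + untokenized "bar", unstored + tokenized "foo".
    static List<FieldInstance> docA() {
        List<FieldInstance> doc = new ArrayList<>();
        doc.add(new FieldInstance("f", Prop.STORED, Prop.INDEXED_UNTOKENIZED));
        doc.add(new FieldInstance("f", Prop.INDEXED_TOKENIZED));
        return doc;
    }

    // docB from the example: stored + tokenized "x y", unstored + untokenized "z".
    static List<FieldInstance> docB() {
        List<FieldInstance> doc = new ArrayList<>();
        doc.add(new FieldInstance("f", Prop.STORED, Prop.INDEXED_TOKENIZED));
        doc.add(new FieldInstance("f", Prop.INDEXED_UNTOKENIZED));
        return doc;
    }

    public static void main(String[] args) {
        List<List<FieldInstance>> docs = Arrays.asList(docA(), docB());
        System.out.println("every instance identical: " + everyInstanceMatches("f", docs));
        System.out.println("per-document union identical: " + perDocumentUnionMatches("f", docs));
    }
}
```

Under the first reading the two docs are rejected, since the second "f" instance in docA already differs from the first. Under the second reading they are indistinguishable, even though the stored value is tokenized in one doc and untokenized in the other. Which reading (if either) a fixed-schema design would adopt is exactly the open question.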