On Jun 6, 2006, at 11:37 AM, Jan Prill wrote:

> this statement tempted me to jump in, even without using something
> like dynamic field creation myself __right now__. But I have been -
> especially on cms like projects badly in need for dynamic fields.
>
> That something isn't common in sql doesn't mean that there is no
> need for this "something". This limitation of sql is the reason for
> doing things like storing xml in relational dbs as well as the
> reason for people using object dbs. I don't know if you had a look
> at dabble db, but imagine something like this with a relational
> dbms. not funny! Because of this they haven't even thought about
> using sql for dabble db. So maybe it's just me but the argument:
> you can't do this in sql either doesn't sound too convincing...
Jan,

I don't understand the requirement, and I'm not familiar with either dabble db or Rails, so neither that example nor the "models" example Dave cited earlier has spoken to me. I asked the question because I honestly wanted to see a concrete example of an application that couldn't be handled within the constraint of predefined fields.

Behind the scenes in Lucene is an elaborate, expensive apparatus for dealing with dynamic fields. Each document gets turned into its own miniature inverted index, complete with its own FieldInfos, FieldsWriter, DocumentWriter, TermInfosWriter, and so on. When these mini-indexes get merged, field definitions have to be reconciled. This merge stage is one of the bottlenecks which slow down interpreted-language ports of Lucene so severely, because there's a lot of object creation and destruction and a lot of method calls.

KinoSearch uses a fixed-field-definition model. Before you add any documents to an index, you have to tell the index writer about all the possible fields you might use. When you add the first document, it creates the FieldInfos, FieldsWriter, etc, which persist throughout the life of the index writer. Instead of reconciling field definitions each time a document gets added, the field defs are defined as invariant for that indexing session. This is much faster, because there is far less object creation and destruction, and far less disk shuffling as well -- no segment merging, therefore no movement of stored fields, term vectors, etc.

There are several possible ways to add dynamic fields back into the fixed-field-def model. My main priority in doing so, if it proves to be necessary, is to keep table-alteration logic separate from insertion operations. Having the two conflated introduces needless complexity and computational expense at the back end. It's also just plain confusing -- if you accidentally forget to set OMIT_NORMS just once, all of a sudden that field is going to have norms for ever and ever amen.
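To make the separation between table alteration and insertion concrete, here is a minimal sketch in Python. It is not KinoSearch's or Lucene's actual API: `FieldSpec`, `Schema`, `SchemaError`, `IndexWriter`, and the `omit_norms` attribute are hypothetical names invented for illustration, and the "index" is just a list. The only point is that field definitions change through one explicit call, while adding a document merely validates against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical field definition; attribute names are illustrative only."""
    indexed: bool = True
    stored: bool = True
    omit_norms: bool = False


class SchemaError(Exception):
    """Raised for an undeclared field or a conflicting redefinition."""


class Schema:
    """Holds field definitions; altered only via define(), never by inserts."""

    def __init__(self):
        self._fields = {}

    def define(self, name, spec):
        existing = self._fields.get(name)
        # Re-declaring an identical spec is harmless; a different one is an error.
        if existing is not None and existing != spec:
            raise SchemaError(f"conflicting definition for field {name!r}")
        self._fields[name] = spec

    def check(self, doc):
        unknown = sorted(set(doc) - set(self._fields))
        if unknown:
            raise SchemaError(f"undeclared fields: {unknown}")


class IndexWriter:
    """Toy writer: field defs are treated as invariant for the session."""

    def __init__(self, schema):
        self.schema = schema
        self.docs = []

    def add_doc(self, doc):
        # Validation only: no per-document reconciliation of definitions.
        self.schema.check(doc)
        self.docs.append(dict(doc))
```

Under these assumptions, a caller would declare `title` and `body` once with `schema.define(...)`, then call `writer.add_doc(...)` repeatedly. A document carrying an undeclared field, or a later `define()` that flips `omit_norms` on an existing field, raises `SchemaError` instead of silently changing the field for good.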
I think the user ought to have absolute control over field definitions. Inserting a field with a conflicting definition ought to be an error.

Lucy is going to start with the KinoSearch merge model. I will do a better job of adding dynamic capabilities to it if you or someone else can articulate some specific examples of situations where static definitions would not suffice. I can think of a few tasks which would be slightly more convenient if new fields could be added on the fly, but maybe you can go one better and illustrate why dynamic field defs are essential.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk

