>> I asked the question because I honestly wanted to see a concrete 
>> example of an application that couldn't be handled within the 
>> constraint of pre-defined fields.

My current project is a web application which can search a Ferret
index built from a SQL database.

The idea is that the customer supplies SQL queries for, say, customers,
suppliers, sales, purchases, etc. The app then retrieves the rows from
the data source and indexes them using Ferret. The app provides both an
HTML website as an interface to the index and an XML API which can be
used by non-browser clients.

The field set is quite different for each SQL query [and is essentially
out of our control].
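
To make that concrete, here is a rough sketch of how the field set falls
out of whatever query the customer supplies. It is Python with an
in-memory SQLite database standing in for the customer's data source;
the table names, columns, and the rows_as_documents helper are all made
up for illustration, and the Ferret indexing step itself is left out.

```python
import sqlite3

def rows_as_documents(conn, sql):
    """Run one customer-supplied query and yield each row as a
    field-name -> value dict. The field names are read from the
    result set, so they are not known until the query runs."""
    cur = conn.execute(sql)
    fields = [col[0] for col in cur.description]
    for row in cur:
        yield dict(zip(fields, row))

# Hypothetical stand-in for the customer's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE sales (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'Sydney');
    INSERT INTO sales VALUES (10, 1, 99.5);
""")

# Hypothetical customer-supplied queries, one per document type.
queries = {
    "customers": "SELECT id, name, city FROM customers",
    "sales": "SELECT id, customer_id, total FROM sales",
}

docs = {name: list(rows_as_documents(conn, sql))
        for name, sql in queries.items()}
# Each query produces documents with its own set of field names.
field_sets = {name: sorted(rows[0].keys()) for name, rows in docs.items()}
```

Each dict would then be handed to the indexer as one document, with
fields we only learn about once the query has run.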

HTH,

Neville

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marvin Humphrey
Sent: Wednesday, 7 June 2006 7:08 AM
To: [email protected]
Subject: Re: [Ferret-talk] Proposal of some radical changes to API


On Jun 6, 2006, at 11:37 AM, Jan Prill wrote:

> this statement tempted me to jump in, even without using something 
> like dynamic field creation myself __right now__. But I have been, 
> especially on CMS-like projects, badly in need of dynamic fields.
>
> That something isn't common in sql doesn't mean that there is no need 
> for this "something". This limitation of sql is the reason for doing 
> things like storing xml in relational dbs as well as the reason for 
> people using object dbs. I don't know if you had a look at dabble db, 
> but imagine something like this with a relational dbms. not funny! 
> Because of this they haven't even thought about using sql for dabble 
> db. So maybe it's just me but the argument:
> you can't do this in sql either doesn't sound too convincing...

Jan, I don't understand the requirement, and I'm not familiar with
either dabble db or Rails, so neither that example nor the "models"
example Dave cited earlier has spoken to me.  I asked the question
because I honestly wanted to see a concrete example of an application
that couldn't be handled within the constraint of pre-defined fields.

Behind the scenes in Lucene is an elaborate, expensive apparatus for
dealing with dynamic fields.  Each document gets turned into its own
miniature inverted index, complete with its own FieldInfos,
FieldsWriter, DocumentWriter, TermInfosWriter, and so on.  When these  
mini-indexes get merged, field definitions have to be reconciled.   
This merge stage is one of the bottlenecks which slow down
interpreted-language ports of Lucene so severely, because there's a lot
of object creation and destruction and a lot of method calls.

KinoSearch uses a fixed-field-definition model.  Before you add any
documents to an index, you have to tell the index writer about all the
possible fields you might use.  When you add the first document, it
creates the FieldInfos, FieldsWriter, etc, which persist throughout the
life of the index writer.  Instead of reconciling field definitions each
time a document gets added, the field defs are defined as invariant for
that indexing session.  This is much faster, because there is far less
object creation and destruction, and far less disk shuffling as well --
no segment merging, therefore no movement of stored fields, term
vectors, etc.

There are several possible ways to add dynamic fields back into the
fixed-field-def model.  My main priority in doing so, if it proves to be
necessary, is to keep table-alteration logic separate from insertion
operations.  Having the two conflated introduces needless complexity and
computational expense at the back end.  It's also just plain confusing
-- if you accidentally forget to set OMIT_NORMS just once, all of a
sudden that field is going to have norms for ever and ever amen.  I
think the user ought to have absolute control over field definitions.
Inserting a field with a conflicting definition ought to be an error.
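
A minimal sketch of that separation, in Python rather than anything
resembling the KinoSearch or Lucy API: the class, method, and property
names below (FixedFieldWriter, define_field, add_doc, omit_norms) are
hypothetical and only illustrate the idea that field definitions change
through one explicit call, while insertion can only fail, never alter.

```python
class FieldConflict(Exception):
    """Raised when a field is redefined differently or used undeclared."""

class FixedFieldWriter:
    def __init__(self):
        self.fields = {}  # field name -> dict of properties
        self.docs = []

    def define_field(self, name, **props):
        # "Table alteration": the only place definitions can change.
        existing = self.fields.get(name)
        if existing is not None and existing != props:
            raise FieldConflict(
                f"field {name!r} already defined as {existing}, got {props}")
        self.fields[name] = props

    def add_doc(self, doc):
        # Insertion never touches definitions; unknown fields are an error.
        unknown = set(doc) - set(self.fields)
        if unknown:
            raise FieldConflict(f"undeclared fields: {sorted(unknown)}")
        self.docs.append(dict(doc))

writer = FixedFieldWriter()
writer.define_field("title", omit_norms=True)
writer.define_field("body", omit_norms=False)
writer.add_doc({"title": "hello", "body": "world"})
```

Under this scheme, forgetting omit_norms on a later call raises an
error instead of silently changing the field for the rest of the index.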

Lucy is going to start with the KinoSearch merge model.  I will do a
better job of adding dynamic capabilities to it if you or someone else
can articulate some specific examples of situations where static
definitions would not suffice.  I can think of a few tasks which would
be slightly more convenient if new fields could be added on the fly, but
maybe you can go one better and illustrate why dynamic field defs are
essential.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk