I like this idea Hoss.  Pure speculation to follow:

One idea I have been playing with is implementing the Fieldable interface as what I call a DBFieldable. The idea is that the Field is backed by a database (with the appropriate select statements, etc.) behind the scenes. When you go to add Fieldables to your document, you would just construct the appropriate DBFieldable and pass it to add. I think this would solve a lot of people's issues with combining DBs and Lucene, or at least make it possible to index the contents of a DB w/o first extracting the content to some intermediate form, and also to search and get back things tied to the DB. And I think it would give a nice, seamless connection between Lucene and a DB.
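
To make the idea a bit more concrete, here is a minimal sketch in Java. It does not use the real Lucene API: SimpleFieldable is a hypothetical, cut-down stand-in for Fieldable (the real interface has many more methods), and the lookup function is an assumed placeholder for a select statement run over a live connection.

```java
import java.util.function.Function;

// Hypothetical, cut-down stand-in for Lucene's Fieldable (assumption, not the real API).
interface SimpleFieldable {
    String name();
    String stringValue();
}

// A field whose value is fetched from an external store on first access.
final class DBFieldable implements SimpleFieldable {
    private final String name;
    private final String rowKey;
    // Stands in for "SELECT <column> FROM <table> WHERE id = ?" (assumption).
    private final Function<String, String> lookup;
    private String cached;
    private boolean loaded;

    DBFieldable(String name, String rowKey, Function<String, String> lookup) {
        this.name = name;
        this.rowKey = rowKey;
        this.lookup = lookup;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public String stringValue() {
        // Hit the backing store only once, then reuse the cached value.
        if (!loaded) {
            cached = lookup.apply(rowKey);
            loaded = true;
        }
        return cached;
    }
}
```

Nothing is read from the backing store until stringValue() is first called, which is the property that would let indexing pull content straight from the DB without an intermediate form.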

However, one of my sticking points is how to recreate the Document on the search side and have it still be backed by the DB with a valid connection. I could do this by hooking more into the FieldsWriter/FieldsReader, but I wanted to avoid that so that this could live in contrib w/o any changes to core. Another way would be to have the field be Stored, in which case it would duplicate the content into the Lucene index, but that doesn't feel right. Yet another way is through flexible indexing, whereby you store document metadata on the document. Yet another way is through Solr, b/c you maintain metadata in the config files, so you could store info about the connection in the field declaration.

Also, why couldn't we add a
doc(int n, FieldSelector fieldSelector, Document doc) method to the IndexReader/FieldsReader? The FieldsReader currently just does a new Document() and then calls addField on it, so it could just as easily fill in a Document that the caller passes in.
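
Roughly what I have in mind, as a sketch only. The Sketch* types below are hypothetical stand-ins and not the real Lucene classes, DbBackedDocument and its connectionInfo are made up for illustration, and the generic return type is my assumption since I have not settled on what the method should return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Document: an ordered list of name/value pairs.
class SketchDocument {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    String get(String name) {
        for (String[] f : fields) {
            if (f[0].equals(name)) {
                return f[1];
            }
        }
        return null;
    }

    int size() {
        return fields.size();
    }
}

// Example subclass a caller might want back from the reader (illustrative only).
class DbBackedDocument extends SketchDocument {
    final String connectionInfo;

    DbBackedDocument(String connectionInfo) {
        this.connectionInfo = connectionInfo;
    }
}

// Hypothetical stand-in for FieldSelector.
interface SketchFieldSelector {
    boolean accept(String fieldName);
}

// Hypothetical stand-in for FieldsReader; stored fields are kept in memory.
class SketchFieldsReader {
    private final List<Map<String, String>> stored;

    SketchFieldsReader(List<Map<String, String>> stored) {
        this.stored = stored;
    }

    // Current behaviour: the reader decides which Document class gets created.
    SketchDocument doc(int n, SketchFieldSelector selector) {
        return doc(n, selector, new SketchDocument());
    }

    // Proposed overload: the caller supplies the instance to be filled.
    <D extends SketchDocument> D doc(int n, SketchFieldSelector selector, D target) {
        for (Map.Entry<String, String> e : stored.get(n).entrySet()) {
            if (selector == null || selector.accept(e.getKey())) {
                target.add(e.getKey(), e.getValue());
            }
        }
        return target;
    }
}
```

With the overload, the caller keeps whatever state the subclass carries (connection info, for example) and the reader only adds the loaded fields.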

Is this reasonable?  I'll see if I can work up a patch.

-Grant



On Jan 17, 2007, at 3:20 PM, Chris Hostetter wrote:


: A simple solution might be a 'classname' setup for the Document
: creation - like the default Directory implementation uses. As long as
: the subclass has a no-arg ctor it is trivial.

a different tack on the topic: there is really no good reason why the
"Document" class used for indexing data should be the same as the
"Document" class used for returning results ... using the same class in this way results in all sorts of confusion about which methods can be called in which context, and frequently leads people to assume they can do safe
"round trips" of their Documents ... doing a search, modifying a field
value, and then re-indexing it -- not considering what happens to
non-STOREd fields or field/document boosts.

any work done to change the Document API to make it easier to subclass
should probably start with a separation of these two completely different
concepts.

One approach off the top of my head: make an "IndexableDocument" interface for clients to pass to IndexWriter and a "ReturnableDocument" class for IndexReader/IndexSearcher to return ... the existing Document class can subclass ReturnableDocument and implement IndexableDocument, the existing methods with Document in their sig would be deprecated and replaced with
methods using one of these new class names
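
A rough sketch of how that split could look, in Java. Only the names IndexableDocument, ReturnableDocument and Document come from the suggestion above; every method shown, plus SketchField and SketchIndex, is a hypothetical placeholder and not existing Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal field: a name, a value, and whether it is STOREd.
final class SketchField {
    final String name;
    final String value;
    final boolean stored;

    SketchField(String name, String value, boolean stored) {
        this.name = name;
        this.value = value;
        this.stored = stored;
    }
}

// What a client hands to the writer (methods are assumptions).
interface IndexableDocument {
    List<SketchField> fields();
    float boost();
}

// What a reader/searcher hands back: stored values only, no boost.
class ReturnableDocument {
    private final Map<String, String> storedValues = new LinkedHashMap<>();

    void putStored(String name, String value) {
        storedValues.put(name, value);
    }

    public String get(String name) {
        return storedValues.get(name);
    }
}

// The existing class keeps working on both sides during the deprecation period.
class Document extends ReturnableDocument implements IndexableDocument {
    private final List<SketchField> fields = new ArrayList<>();
    private float boost = 1.0f;

    public void add(SketchField f) {
        fields.add(f);
        if (f.stored) {
            putStored(f.name, f.value);
        }
    }

    public void setBoost(float boost) {
        this.boost = boost;
    }

    @Override
    public List<SketchField> fields() {
        return fields;
    }

    @Override
    public float boost() {
        return boost;
    }
}

// Hypothetical stand-in for the IndexWriter/IndexReader pair.
class SketchIndex {
    private final List<ReturnableDocument> docs = new ArrayList<>();

    int addDocument(IndexableDocument doc) {
        ReturnableDocument out = new ReturnableDocument();
        for (SketchField f : doc.fields()) {
            if (f.stored) {
                out.putStored(f.name, f.value); // non-stored fields and boost are not kept
            }
        }
        docs.add(out);
        return docs.size() - 1;
    }

    ReturnableDocument doc(int n) {
        return docs.get(n);
    }
}
```

In this sketch index.addDocument(index.doc(0)) does not compile, because a ReturnableDocument is not an IndexableDocument, so the unsafe "round trip" is caught by the type system instead of silently dropping non-stored fields and boosts.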



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


