Re: [jira] Commented: (LUCENE-778) Allow overriding a Document

Grant Ingersoll Fri, 19 Jan 2007 07:11:03 -0800

I like this idea Hoss.  Pure speculation to follow:

One of the things I have been playing with is the idea of implementing the Fieldable interface as what I call DBFieldable. The idea is that the Field is backed by a database (with the appropriate select statements, etc.) behind the scenes. When you go to add Fieldables to your document, you would just construct the appropriate DBFieldable and pass it to add. I think this would solve a lot of people's issues with combining DBs and Lucene, or at least give the ability to index the contents of a DB w/o having to first extract the content to some intermediate form, and also to search and get back things tied to the DB. And I think it would give a nice, seamless connection between Lucene and a DB.
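
A rough sketch of what a DB-backed field could look like, just to make the idea concrete. Everything here is an assumption for illustration: `SimpleFieldable` is a cut-down stand-in for Lucene's much larger Fieldable interface, and the `Supplier` stands in for whatever select statement would actually fetch the value.

```java
import java.util.function.Supplier;

// Hypothetical, cut-down stand-in for Lucene's Fieldable interface;
// the real interface has many more methods.
interface SimpleFieldable {
    String name();
    String stringValue();
}

// Hypothetical DB-backed field: the value is fetched only when first asked for.
final class DBFieldable implements SimpleFieldable {
    private final String name;
    private final Supplier<String> loader; // e.g. would wrap a JDBC SELECT
    private String cached;
    private boolean loaded;

    DBFieldable(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public String stringValue() {
        if (!loaded) {
            cached = loader.get(); // hits the backing store once
            loaded = true;
        }
        return cached;
    }
}
```

With something like this, constructing the field costs nothing until the indexer asks for the value, e.g. `new DBFieldable("body", () -> fetchBodyFromDb(id))`, where `fetchBodyFromDb` is again a made-up helper.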

However, one of my sticking points is how to recreate the Document on the search side and have it still be backed by the DB with a valid connection. I could do this by hooking more into the FieldsWriter/Reader, but I wanted to avoid that so that this could live in contrib w/o any changes to core. Another way would be to have it be Stored, in which case it would duplicate the content into the Lucene index, but that doesn't feel right. Yet another way is through flexible indexing, whereby you store document metadata on the document. Yet another way is through Solr, b/c you maintain metadata in the config files, so you could store info about the connection in the field decoration.


Also, why couldn't we add a

doc(int n, FieldSelector fieldSelector, Document doc); to the IndexReader/FieldsReader? The FieldsReader currently just does a new Document() and then calls addField on it.
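
To illustrate the shape of that overload, here is a minimal sketch under stated assumptions. None of these types are Lucene's: `SketchDocument` and `SketchFieldsReader` are hypothetical stand-ins, and a `Predicate<String>` on the field name plays the role of FieldSelector.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for Lucene's Document: just named string fields.
class SketchDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Hypothetical stand-in for FieldsReader, keeping stored fields in memory.
class SketchFieldsReader {
    private final Map<Integer, Map<String, String>> stored = new LinkedHashMap<>();

    void store(int n, String name, String value) {
        stored.computeIfAbsent(n, k -> new LinkedHashMap<>()).put(name, value);
    }

    // Current behaviour: the reader creates the Document itself.
    SketchDocument doc(int n, Predicate<String> fieldSelector) {
        return doc(n, fieldSelector, new SketchDocument());
    }

    // Proposed overload: fill a Document (or subclass) supplied by the caller.
    <D extends SketchDocument> D doc(int n, Predicate<String> fieldSelector, D doc) {
        Map<String, String> fields = stored.get(n);
        if (fields == null) {
            throw new IllegalArgumentException("no such document: " + n);
        }
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (fieldSelector.test(e.getKey())) {
                doc.add(e.getKey(), e.getValue());
            }
        }
        return doc;
    }
}
```

The point of the three-argument form is that the caller decides which Document subclass gets populated, while the two-argument form keeps today's behaviour by delegating to it.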


Is this reasonable?  I'll see if I can work up a patch.

-Grant



On Jan 17, 2007, at 3:20 PM, Chris Hostetter wrote:

: A simple solution might be a 'classname' setup for the Document
: creation - like the default Directory implementation uses. As long as
: the subclass has a no-arg ctor it is trivial.
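
A minimal sketch of such a 'classname' setup, assuming the class name arrives as a plain string (how it would be configured is left open). `BaseDocument` and `DocumentFactory` are hypothetical names, not Lucene classes.

```java
// Hypothetical stand-in for Lucene's Document class.
class BaseDocument {
}

// Hypothetical factory: creates the configured Document subclass by name.
class DocumentFactory {
    // Instantiates the named class through its no-arg constructor.
    static BaseDocument create(String className) {
        try {
            Class<? extends BaseDocument> cls =
                Class.forName(className).asSubclass(BaseDocument.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }
}
```

The no-arg constructor requirement is what keeps this trivial: the factory needs to know nothing about the subclass beyond its name.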

a different tack on the topic: there is really no good reason why the
"Document" class used for indexing data should be the same as the
"Document" class used for returning results ... using the same class in
this way results in all sorts of confusion about which methods can be
called in which context, and frequently leads people to assume they can
do safe "round trips" of their Documents ... doing a search, modifying a
field value, and then re-indexing it -- not considering what happens to
non-STOREd fields or field/document boosts.

any work done to change the Document API to make it easier to subclass
should probably start with a separation of these two completely different
concepts.

One approach off the top of my head: make an IndexableDocument interface
for clients to pass to IndexWriter and a "ReturnableDocument" class for
IndexReader/IndexSearcher to return ... the existing Document class can
subclass ReturnableDocument and implement IndexableDocument, the existing
methods with Document in their sig would be deprecated and replaced with
methods using one of these new class names
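
The split described above could be sketched roughly as follows. The method names and the use of plain field-name strings are assumptions made for illustration; `LegacyDocument` stands in for the existing Document class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: what an IndexWriter would need from a document.
interface IndexableDocument {
    List<String> fieldsToIndex();
}

// Hypothetical: what IndexReader/IndexSearcher would hand back,
// exposing stored fields only.
class ReturnableDocument {
    private final List<String> storedFields = new ArrayList<>();

    void addStored(String field) {
        storedFields.add(field);
    }

    List<String> storedFields() {
        return storedFields;
    }
}

// Stand-in for the existing Document class: usable on both sides.
class LegacyDocument extends ReturnableDocument implements IndexableDocument {
    private final List<String> indexed = new ArrayList<>();

    void add(String field, boolean stored) {
        indexed.add(field);
        if (stored) {
            addStored(field);
        }
    }

    @Override
    public List<String> fieldsToIndex() {
        return indexed;
    }
}
```

Seen through the ReturnableDocument type, a non-stored field simply is not there, which makes the unsafe "round trip" visible in the types rather than a surprise at re-indexing time.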



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



