I like this idea, Hoss. Pure speculation to follow:
One of the things I have been playing with is the idea of implementing
the Fieldable class as what I call DBFieldable. The idea is that
the Field is backed by a database (with the appropriate select
statements, etc.) behind the scenes. When you add Fieldables to
your document, you would just construct the appropriate DBFieldable
and pass it to add. I think this would solve a lot of people's
issues with combining DBs and Lucene, or at least make it possible to
index the contents of a DB w/o first having to extract the content to
some intermediate form, and also to search and get back things tied to
the DB. I think it would give a nice, seamless connection
between Lucene and a DB.
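
To make the idea concrete, here is a minimal sketch of a lazily loaded, DB-backed field. It is not written against the real Lucene API: SketchFieldable and SketchDocument are hypothetical stand-ins for Fieldable and Document, and the Function passed to the constructor stands in for whatever select statement a real version would run over a live connection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for Lucene's Fieldable, which has many more methods.
interface SketchFieldable {
    String name();
    String stringValue();
}

// A field whose value is fetched on first access from a backing store.
// The Function is an assumption: it stands in for a SELECT by primary key.
final class DBFieldable implements SketchFieldable {
    private final String name;
    private final String rowKey;
    private final Function<String, String> select;
    private String cached;
    private boolean loaded;

    DBFieldable(String name, String rowKey, Function<String, String> select) {
        this.name = name;
        this.rowKey = rowKey;
        this.select = select;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public String stringValue() {
        if (!loaded) { // query the backing store only once
            cached = select.apply(rowKey);
            loaded = true;
        }
        return cached;
    }
}

// Hypothetical stand-in for Document: an ordered list of fields.
final class SketchDocument {
    private final List<SketchFieldable> fields = new ArrayList<>();

    void add(SketchFieldable field) {
        fields.add(field);
    }

    String get(String name) {
        for (SketchFieldable f : fields) {
            if (f.name().equals(name)) {
                return f.stringValue();
            }
        }
        return null;
    }
}

public class DBFieldableDemo {
    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>(); // pretend DB table: id -> body
        table.put("42", "text stored in row 42");

        SketchDocument doc = new SketchDocument();
        doc.add(new DBFieldable("body", "42", table::get));
        System.out.println(doc.get("body"));
    }
}
```

Nothing is read from the backing store until the value is actually requested, which is the property that would let indexing pull content straight from the DB without an intermediate export.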
However, one of my sticking points is how to recreate the Document
on the search side and have it still be backed by the DB with a valid
connection. I could do this by hooking further into the FieldsWriter/
FieldsReader, but I wanted to avoid that so that this could live in
contrib w/o any changes to core. Another way would be to make the field
Stored, in which case it would duplicate the content into the Lucene
index, but that doesn't feel right. Yet another way is through flexible
indexing, whereby you store document metadata on the document. A
fourth way is through Solr, b/c you maintain metadata in the config
files, so you could store info about the connection in the field
decoration.
Also, why couldn't we add a
doc(int n, FieldSelector fieldSelector, Document doc); to the
IndexReader/FieldsReader? The FieldsReader currently just does a new
Document() and then calls addField on it.
Is this reasonable? I'll see if I can work up a patch.
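
A rough sketch of how such an overload could behave, using stand-in types rather than the real Lucene classes: StoredDoc, DbBackedDoc and SketchFieldsReader are hypothetical, a Predicate stands in for FieldSelector, and returning the same instance through a generic parameter is my assumption (the proposal above does not specify a return type).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Stand-in for Document (hypothetical, not the Lucene class).
class StoredDoc {
    final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }
}

// A caller-defined subclass that carries extra state, e.g. connection info.
class DbBackedDoc extends StoredDoc {
    final String connectionInfo;

    DbBackedDoc(String connectionInfo) {
        this.connectionInfo = connectionInfo;
    }
}

// Stand-in for FieldsReader; each stored document is a name -> value map.
class SketchFieldsReader {
    private final List<Map<String, String>> stored = new ArrayList<>();

    int store(Map<String, String> fields) {
        stored.add(new LinkedHashMap<>(fields));
        return stored.size() - 1;
    }

    // Existing style: the reader decides which class to instantiate.
    StoredDoc doc(int n, Predicate<String> selector) {
        return doc(n, selector, new StoredDoc());
    }

    // Proposed style: the caller supplies the instance to populate.
    <D extends StoredDoc> D doc(int n, Predicate<String> selector, D target) {
        for (Map.Entry<String, String> e : stored.get(n).entrySet()) {
            if (selector.test(e.getKey())) {
                target.add(e.getKey(), e.getValue());
            }
        }
        return target;
    }
}

public class DocOverloadDemo {
    public static void main(String[] args) {
        SketchFieldsReader reader = new SketchFieldsReader();
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("title", "hello");
        int n = reader.store(row);

        // "jdbc:example" is a placeholder string, not a real connection URL.
        DbBackedDoc doc = reader.doc(n, name -> name.equals("id"), new DbBackedDoc("jdbc:example"));
        System.out.println(doc.fields + " via " + doc.connectionInfo);
    }
}
```

Because the caller hands in the instance, the reader never needs to know about the subclass, which is what would keep a DB-backed Document out of core.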
-Grant
On Jan 17, 2007, at 3:20 PM, Chris Hostetter wrote:
: A simple solution might be a 'classname' setup for the Document
: creation - like the default Directory implementation uses. As long as
: the subclass has a no-arg ctor it is trivial.
a different tack on the topic: there is really no good reason why the
"Document" class used for indexing data should be the same as the
"Document" class used for returning results ... using the same class in
this way results in all sorts of confusion about which methods can be
called in which context, and frequently leads people to assume they can
do safe "round trips" of their Documents ... doing a search, modifying a
field value, and then re-indexing it -- not considering what happens to
non-STOREd fields or field/document boosts.
any work done to change the Document API to make it easier to subclass
should probably start with a separation of these two completely
different concepts.
One approach off the top of my head: make an IndexableDocument interface
for clients to pass to IndexWriter and a "ReturnableDocument" class for
IndexReader/IndexSearcher to return ... the existing Document class can
subclass ReturnableDocument and implement IndexableDocument, the existing
methods with Document in their sig would be deprecated and replaced with
methods using one of these new class names
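
The split described above could look roughly like the following sketch. Only the names IndexableDocument and ReturnableDocument come from the proposal; every method signature, the LegacyDocument class (a stand-in for the existing Document) and SketchWriter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// What a writer needs: everything required to index, including boosts.
interface IndexableDocument {
    Set<String> fieldNames();
    String value(String fieldName);
    float boost();
}

// What a search returns: stored values only, with no boost accessors.
class ReturnableDocument {
    protected final Map<String, String> stored = new LinkedHashMap<>();

    public String get(String fieldName) {
        return stored.get(fieldName);
    }
}

// Stand-in for the existing Document class, playing both roles so that
// old code keeps compiling.
class LegacyDocument extends ReturnableDocument implements IndexableDocument {
    private float boost = 1.0f;

    public void add(String name, String value) { stored.put(name, value); }
    public void setBoost(float boost) { this.boost = boost; }

    @Override public Set<String> fieldNames() { return stored.keySet(); }
    @Override public String value(String fieldName) { return stored.get(fieldName); }
    @Override public float boost() { return boost; }
}

// Stand-in writer: accepts only the indexing-side type.
class SketchWriter {
    final List<IndexableDocument> added = new ArrayList<>();

    void addDocument(IndexableDocument doc) {
        added.add(doc);
    }
}

public class SplitDocumentDemo {
    public static void main(String[] args) {
        LegacyDocument doc = new LegacyDocument();
        doc.add("title", "hello");
        doc.setBoost(2.0f);

        SketchWriter writer = new SketchWriter();
        writer.addDocument(doc);          // fine: LegacyDocument is indexable

        ReturnableDocument hit = doc;     // the view a searcher would hand back
        System.out.println(hit.get("title"));
        // writer.addDocument(hit);       // would not compile: not an IndexableDocument
    }
}
```

With the two types separated, an unsafe "round trip" of a search result back into the writer becomes a compile-time error instead of a silent loss of boosts and unstored fields.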
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at
http://wiki.apache.org/jakarta-lucene/LuceneFAQ