Very interesting discussion. It matches some ideas I had about how Lucene works; I just wasn't sure how relevant they were, having only been hacking on Lucene for a few months.
I love the idea of decoupling the document being indexed from the document being extracted from the index. It also ties in with a comment in the IndexReader code:

    //When we convert to JDK 1.5 make this Set<String>
    public abstract Document document(int n, FieldSelector fieldSelector) throws IOException;

Here it shouldn't be Set<String> but rather something like Set<Fieldable>; still, the idea is there.

The idea behind LUCENE-778 was simply to allow custom document indexing, much like Grant's DBFieldable idea. Then there is the document extraction part, which I have done some work on with LUCENE-662.

Some messages in this thread suggested changing the default instantiation of Document through a system property. I don't like this idea because it doesn't let us move toward typing these classes with Java generics. It would work, but I think we can do better. The basic idea is to provide a parameterized document factory, something like DocumentFactory<ResultDocument>. This factory is then used by FieldReader<ResultDocument> to produce ResultDocument instances with their fields filled in. Finally, IndexReader<ResultDocument> returns ResultDocument instances. From a Lucene user's point of view, this would be fantastic: instantiate an IndexReader<MyAppDocument> and get MyAppDocument instances back without any cast.

I also tried to go even further in decoupling indexing/searching from storing/extracting. On one hand, you specify what to index and how, using the current Document design with Field. On the other hand, you specify what to store and how, which allows storing it in a DB. So adding a document to the index means creating a Document with only the indexed fields, plus some document data that is not necessarily organized into fields. The DocumentWriter then indexes the fields as it does today, and stores the document data through a provided implementation of a DocumentDataStorage.
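To make the typed-reader idea a bit more concrete, here is a minimal sketch that does not depend on Lucene at all. DocumentFactory, TypedReader and MyAppDocument are hypothetical names (TypedReader stands in for the proposed IndexReader<ResultDocument>), stored fields are modeled as a plain Map<String, String> instead of segment files, and it uses Java 8 syntax for brevity rather than plain Java 5:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical factory (not Lucene API): builds an application-specific
// document from the stored fields of one document.
interface DocumentFactory<D> {
    D newDocument(Map<String, String> storedFields);
}

// Hypothetical stand-in for a generic IndexReader<D>. Stored fields are
// kept in memory here instead of being read from the index.
class TypedReader<D> {
    private final List<Map<String, String>> storedDocs = new ArrayList<>();
    private final DocumentFactory<D> factory;

    TypedReader(DocumentFactory<D> factory) {
        this.factory = factory;
    }

    // Simulates storing a document; returns its document number.
    int add(Map<String, String> fields) {
        storedDocs.add(Collections.unmodifiableMap(new LinkedHashMap<>(fields)));
        return storedDocs.size() - 1;
    }

    // Typed equivalent of document(int n): the caller needs no cast.
    D document(int n) {
        return factory.newDocument(storedDocs.get(n));
    }
}

// Example application document, built from the stored fields in the same
// spirit as a wrapper constructor MyAppDocument(Document doc).
class MyAppDocument {
    private final String title;
    private final String author;

    MyAppDocument(Map<String, String> fields) {
        this.title = fields.get("title");
        this.author = fields.get("author");
    }

    String title() { return title; }
    String author() { return author; }
}
```

Usage would be along the lines of `TypedReader<MyAppDocument> reader = new TypedReader<>(MyAppDocument::new);` followed by `reader.document(n).title()`, with the type fixed once when the reader is created rather than by a system property at runtime.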
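The indexing/storage split described above can be sketched in the same way, again under assumptions: DocumentDataStorage is written here as a hypothetical interface, InMemoryDataStorage stands in for a DB-backed implementation, SplitIndex is a toy index rather than Lucene's DocumentWriter/IndexReader, and `_dataId` is a made-up field name. The only thing the index keeps per document is one stored field holding the data id, which is exactly the document id to document-data id mapping:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical interface: stores opaque document data outside the index
// (for example in a database) and hands back an id for it.
interface DocumentDataStorage<T> {
    String store(T data);
    T load(String dataId);
}

// In-memory stand-in for a DB-backed storage.
class InMemoryDataStorage<T> implements DocumentDataStorage<T> {
    private final Map<String, T> rows = new HashMap<>();
    private int nextId = 0;

    public String store(T data) {
        String id = "row-" + (nextId++);
        rows.put(id, data);
        return id;
    }

    public T load(String dataId) {
        return rows.get(dataId);
    }
}

// Toy index: per document, only the indexed terms plus ONE stored field
// holding the data id; that stored field is the docId -> dataId mapping.
class SplitIndex<T> {
    static final String DATA_ID_FIELD = "_dataId"; // hypothetical field name
    private final List<Map<String, String>> storedFields = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final DocumentDataStorage<T> storage;

    SplitIndex(DocumentDataStorage<T> storage) {
        this.storage = storage;
    }

    // Index the terms, store the data externally, keep only its id.
    int addDocument(List<String> terms, T data) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<>();
        stored.put(DATA_ID_FIELD, storage.store(data));
        storedFields.add(stored);
        for (String term : new HashSet<>(terms)) { // each term once per doc
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Document ids containing the term, in insertion order.
    List<Integer> search(String term) {
        return Collections.unmodifiableList(
                postings.getOrDefault(term, Collections.emptyList()));
    }

    // Extraction goes through the same storage implementation.
    T document(int docId) {
        return storage.load(storedFields.get(docId).get(DATA_ID_FIELD));
    }
}
```

Searching never touches the storage; only `document(docId)` does, which is why the current stored-field mechanism is already enough to express this split.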
Conversely, extracting a document from the index would in fact only retrieve the document data, through the same DocumentDataStorage implementation. Then I realized that Lucene already allows this. With such a design, Lucene would have to keep an internal mapping between the document id and a document-data id provided by the DocumentDataStorage. With the current Lucene, that mapping is simply a special stored field added to the document. The only advantage of such a design is that Lucene would provide very flexible tools for storing data. It would allow different merge policies for index segments and storage segments, which implies extracting the merge policy from IndexWriter, abstracting the notion of a segment, and so on. But I don't think that is Lucene's goal, which is indexing and searching (or maybe for Lucene 3 or 4? %) ).

BTW, being able to provide a custom Document implementation would be cool. In my application I have just written a wrapper that is instantiated through a special constructor: MyAppDocument(Document doc).

For LUCENE-662, I tried to make it aware of Java 5 generic types. I have not proposed a patch because Lucene doesn't support Java 5 yet. If people are interested in seeing how it would look, I can finish cleaning it up and publish it in Jira.

Nicolas

On Friday 19 January 2007 23:04, Grant Ingersoll wrote:
> Yes, duh. Was writing and not thinking!
>
> On Jan 19, 2007, at 3:49 PM, Chris Hostetter wrote:
>
> > : Yes, I was suggesting this in light of your suggestions :-) Document
> > : would have to be non-final for this to work.
> >
> > No ... Document as it is, with all of its methods for both being indexed
> > and for being returned from a search, could still be final -- it would just
> > need to implement these new interfaces. The key would be having new
> > methods in IndexReader/IndexWriter/IndexSearcher that used these new
> > methods.
> >
> > -Hoss
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ