You're right, more flexibility is needed, but in my mind it goes beyond just field loading. I think this is what Doug was getting at (at least partially) with #12 on http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard. Although that item focuses on indexing, I think it should be considered for searching as well. I am not sure we should just continue adding more and more methods onto IndexReader. I think the 2.x move gives us an opportunity to refactor some of the things we think we can make better.

I am not sure you need LUCENE-509 when you have lazy loading. In my mind, lazy loading gives you the best of both worlds: you can get all the meta-info about all the stored fields on the Document w/o the penalty of loading the actual data.

My use case is below (my guess is this is quite common). Run a search, get back your hits, and display summary information on the hits (i.e., the "small" fields). The user picks the Hit they want to see more info on, and you then display the full document, including, most likely, the info in the really large stored fields (i.e., the original document). To date, I have been storing this info elsewhere b/c of the loading penalty. With lazy loading, I don't need to do this. I can just defer loading until the second-level access is needed, and I never load it if the user doesn't ask for it. By contrast, if you only retrieve a few smaller fields, you have to go back and get the document again when you want to display the contents of the large field.
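
To make the deferred access concrete, here is a rough sketch of what I mean. It is only an illustration: LazyField and LazyDocument are made-up names, not Lucene classes, and the Supplier stands in for whatever would actually read the stored value from the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a stored field whose value is read on first access.
final class LazyField {
    private final String name;
    private final Supplier<String> loader; // assumption: wraps the real disk read
    private String value;
    private boolean loaded;

    LazyField(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    // Meta-info is available without touching the stored data.
    String name() { return name; }
    boolean isLoaded() { return loaded; }

    // The value is read once, on first request, and cached afterwards.
    String stringValue() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}

// Hypothetical document that hands out lazy fields by name.
final class LazyDocument {
    private final Map<String, LazyField> fields = new LinkedHashMap<>();

    void add(LazyField field) { fields.put(field.name(), field); }
    LazyField getField(String name) { return fields.get(name); }
}
```

The summary page would only call stringValue() on the small fields. The large field is read only if the user actually opens the hit, and the same document object can be reused for that second step.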

Of course, there are several other use cases where you may only want certain fields, but I don't think there is much cost associated with loading small fields, just the large ones, so you can just make them lazy.
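
For the cases where only certain fields are wanted, a callback along the lines of the FieldReader idea quoted below might look roughly like this. Again, just a sketch under my own assumptions: FieldVisitor, SelectiveVisitor and ToyStoredFields are hypothetical names, and the toy class only imitates how a reader might walk the stored fields of one document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical callback modelled on the quoted FieldReader sketch; not a Lucene class.
abstract class FieldVisitor {
    // Return true if this field's value should be read at all.
    abstract boolean readField(int fieldNum, String fieldName);

    // Receives the value; return true to keep reading the next field.
    abstract boolean stringField(int fieldNum, String value);
}

// Collects only the requested fields and stops once all of them have been seen.
final class SelectiveVisitor extends FieldVisitor {
    private final Set<String> wanted;
    private final Map<Integer, String> namesByNum = new HashMap<>();
    private final Map<String, String> loaded = new LinkedHashMap<>();

    SelectiveVisitor(Set<String> wanted) { this.wanted = wanted; }

    @Override
    boolean readField(int fieldNum, String fieldName) {
        if (!wanted.contains(fieldName)) {
            return false;
        }
        namesByNum.put(fieldNum, fieldName);
        return true;
    }

    @Override
    boolean stringField(int fieldNum, String value) {
        loaded.put(namesByNum.get(fieldNum), value);
        return loaded.size() < wanted.size();
    }

    Map<String, String> loaded() { return loaded; }
}

// Toy stand-in for the stored fields of a single document.
final class ToyStoredFields {
    private final List<String[]> fields = new ArrayList<>(); // {name, value} pairs
    int valuesRead = 0; // counts how many values were actually handed out

    void add(String name, String value) { fields.add(new String[] {name, value}); }

    void readFields(FieldVisitor visitor) {
        for (int i = 0; i < fields.size(); i++) {
            String[] field = fields.get(i);
            if (!visitor.readField(i, field[0])) {
                continue; // skipped fields never have their value read
            }
            valuesRead++;
            if (!visitor.stringField(i, field[1])) {
                break; // the visitor has everything it asked for
            }
        }
    }
}
```

Note that a visitor like this drops the fields it skipped, so showing the large field later means reading the document a second time, which is the trade-off I mentioned above.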


Yonik Seeley wrote:
On 3/31/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
        <https://issues.apache.org:443/jira/browse/LUCENE-509>
Yes, I'd personally find a way to retrieve just fields x,y, and z more
useful than lazy loading.

Thinking a little more, it would be nice if the field reading API was
opened up a little more so that multiple things could be done... even
construct different field/document objects (say a document
implementation that indexed the fields, etc).
That could be used to implement either lazy field loading, or loading
of specific fields.

Lazy loading alone doesn't really address LUCENE-509.

I was thinking something along the lines of

// an IndexReader would call FieldReader methods for each
abstract class FieldReader {
  // users return true if this field should be read.
  abstract boolean readField(int fieldnum, String fieldName);

  // returns true to keep reading next field
  abstract boolean stringField(int fieldnum, byte[] utf8);
  //   OR
  abstract boolean stringField(int fieldnum, String str);

  // returns true to keep reading next field
  abstract boolean binaryField(int fieldnum, byte[] data);
}

abstract class IndexReader {
  // expert level API
  abstract void readFields(int doc, FieldReader reader);
}

Just brainstorming so far...

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
