Of course, another option is to make all fields lazy all the time, so the user never even needs to think about it. We would need some strategy for when the IndexReader gets closed, but that problem exists in all cases.

Donovan Aaron wrote:
I've done a lot of work with Verity's search engine, and I like the way
they handle fields.  At query time you specify the fields you want
returned from matching documents.

Aaron

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 9:05 AM
To: java-dev@lucene.apache.org
Subject: Re: Lazy Field Loading

Hmmm, I guess I always thought of it as a property of the field that
users would want to explicitly control.  I assumed that most fields
would not be lazy and a few would be.
Now that you have backed me up a bit on it (in a good way), I think it
could just as easily be a parameter that any field that is over a
specified size would be lazily loaded.  With this approach, I could see:

IndexReader.document(int docNumber, long maxFieldSizeToLoad);

and IndexReader.document(int docNum) would just call this new method
passing in some default value, say 2K or something.

Or, we could pass in an array of field names to be lazily loaded too,
something like

IndexReader.document(int docNumber, String [] fieldNamesToLoadLazy);
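The two retrieval-time options above (a size threshold or an explicit list of field names) both come down to one decision per stored field: load it now or defer it. Here is a minimal sketch of that decision, assuming a hypothetical LazyPolicy class. None of these names exist in Lucene, and the 2048-byte figure is only the "2K or something" default mentioned above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: decides per field whether to defer loading.
final class LazyPolicy {
    private final long maxFieldSizeToLoad;      // fields larger than this are deferred
    private final Set<String> lazyFieldNames;   // fields always deferred, by name

    private LazyPolicy(long maxFieldSizeToLoad, Set<String> lazyFieldNames) {
        this.maxFieldSizeToLoad = maxFieldSizeToLoad;
        this.lazyFieldNames = lazyFieldNames;
    }

    // Mirrors the idea behind document(int docNumber, long maxFieldSizeToLoad).
    static LazyPolicy bySize(long maxFieldSizeToLoad) {
        return new LazyPolicy(maxFieldSizeToLoad, new HashSet<String>());
    }

    // Mirrors the idea behind document(int docNumber, String[] fieldNamesToLoadLazy).
    static LazyPolicy byName(String[] fieldNamesToLoadLazy) {
        return new LazyPolicy(Long.MAX_VALUE,
                new HashSet<String>(Arrays.asList(fieldNamesToLoadLazy)));
    }

    // A field is deferred if it is strictly over the size limit or explicitly named.
    boolean loadLazily(String fieldName, long storedSizeInBytes) {
        return storedSizeInBytes > maxFieldSizeToLoad
                || lazyFieldNames.contains(fieldName);
    }
}
```

With either factory method the indexing side stays unchanged, which is the point Erik raises: the choice moves to read time.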

The current way I have it looks something like (with a few other
variations):
public Field(String name, String value, Store store, Index index, LazyLoad lazy)
and
public Field(String name, byte[] value, Store store, LazyLoad lazy)

for field constructors.

I am happy to do either way since the underlying mechanics are pretty
similar.  What do others think?

-Grant

Erik Hatcher wrote:
Lazy loaded fields will be a nice addition to Lucene. I'm curious why the flag is set at indexing time rather than it being something that is controlled during retrieval somehow. I'm not sure what that API would look like, but it seems it's a decision to be addressed during searching and reading of an index rather than during indexing itself.

    Erik


On Mar 29, 2006, at 8:31 AM, Grant Ingersoll wrote:

I have a base implementation of lazy field loading that I am starting to test and wanted to run my approach by everyone to hear their thoughts.

I have, as per Doug's suggestion from a while ago, created an interface named Fieldable that is implemented by Field and by a new, private class owned by FieldsReader. I have introduced an "enumerated" type to the Field class named LazyLoad (which can be YES or NO, in the same spirit as Field.TermVector). Any place that used to take Field now takes Fieldable. This should be completely transparent and backward-compatible. The existing constructors of Field all assume lazy to be off.

On creation of a Field, a user can pass in LazyLoad.YES or NO to a constructor that takes either a String value or a byte array (it does not apply to the Reader constructors since they do not store their content). Indexing and writing of fields take place as normal, the only difference being there is an extra bit added to the field writing that marks the field as being lazy.
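To illustrate the "extra bit" idea, here is a small sketch of packing a lazy marker into a per-field flags byte. The constant names and bit positions are assumptions chosen for illustration; the thread does not give the actual on-disk layout.

```java
// Illustrative only: flag names and values are hypothetical, not Lucene's file format.
final class FieldBits {
    static final byte TOKENIZED = 0x1;
    static final byte BINARY = 0x2;
    static final byte LAZY = 0x8; // the hypothetical extra bit marking a lazy field

    // Writer side: combine the per-field properties into one byte.
    static byte encode(boolean tokenized, boolean binary, boolean lazy) {
        byte bits = 0;
        if (tokenized) bits |= TOKENIZED;
        if (binary) bits |= BINARY;
        if (lazy) bits |= LAZY;
        return bits;
    }

    // Reader side: test the lazy bit without disturbing the others.
    static boolean isLazy(byte bits) {
        return (bits & LAZY) != 0;
    }
}
```

Because the marker occupies a bit that was previously always zero, flag bytes written without it decode as non-lazy.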

On reading in of the field, if it is Lazy, instead of reading in the value for the field and constructing a Field, construct a LazyField instance which takes in the pointer of the fieldsStream and the amount of data to read. This instance, since it is a private class of FieldsReader, maintains access to the fieldsStream. Thus, when an application goes to access the value of the field, we check to see if it has been loaded or not. If it has not, we load it using the fieldsStream, the pointer and the length to read.
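The read path described above can be sketched roughly as follows. This is not the patch itself: FieldsStreamSketch is a hypothetical in-memory stand-in for the fieldsStream, LazyFieldSketch stands in for the private LazyField class, and UTF-8 decoding is assumed purely for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the stored-fields stream; not Lucene's real stream class.
final class FieldsStreamSketch {
    private final byte[] data;
    int reads = 0; // counts actual reads, to show that loading is deferred

    FieldsStreamSketch(byte[] data) {
        this.data = data;
    }

    // Copies 'length' bytes starting at 'pointer' out of the backing data.
    byte[] read(long pointer, int length) {
        reads++;
        byte[] out = new byte[length];
        System.arraycopy(data, (int) pointer, out, 0, length);
        return out;
    }
}

// Hypothetical lazy field: remembers where its value lives and loads it on first access.
final class LazyFieldSketch {
    private final String name;
    private final FieldsStreamSketch stream;
    private final long pointer;
    private final int length;
    private String value; // null until the first call to stringValue()

    LazyFieldSketch(String name, FieldsStreamSketch stream, long pointer, int length) {
        this.name = name;
        this.stream = stream;
        this.pointer = pointer;
        this.length = length;
    }

    String name() {
        return name;
    }

    // Check whether the value has been loaded; if not, read it using pointer and length.
    synchronized String stringValue() {
        if (value == null) {
            value = new String(stream.read(pointer, length), StandardCharsets.UTF_8);
        }
        return value;
    }
}
```

Constructing the field costs only the pointer and length; the stream is touched once, on the first stringValue() call, and later calls reuse the cached value.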

Does anyone see any issues with this? I think it will only really pay off on large stored fields, but have not quantified it yet. My main concern is the semantics of the fieldsStream and whether that would be closed behind the back of the LazyField implementation. My understanding is that as long as the IndexReader is open, this stream
should also be open.  Is that correct?   What am I forgetting about?
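One possible strategy for the closed-stream concern, sketched under assumptions: fail fast with a clear error if the value is first requested after the reader has been closed, while values loaded earlier stay readable. ReaderGuard and GuardedLazyValue are hypothetical names, and this says nothing about how IndexReader actually manages the stream.

```java
import java.util.function.Supplier;

// Hypothetical open/closed marker standing in for the reader's lifecycle.
final class ReaderGuard {
    private volatile boolean open = true;

    void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }
}

// Hypothetical lazy value that refuses to load once its reader is closed.
final class GuardedLazyValue {
    private final ReaderGuard reader;
    private final Supplier<String> loader;
    private String value; // cached after the first successful load

    GuardedLazyValue(ReaderGuard reader, Supplier<String> loader) {
        this.reader = reader;
        this.loader = loader;
    }

    synchronized String get() {
        if (value == null) {
            if (!reader.isOpen()) {
                throw new IllegalStateException(
                        "reader was closed before the lazy field was loaded");
            }
            value = loader.get();
        }
        return value;
    }
}
```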

If testing goes well, I should be able to button this up this week or next and submit the patch.

--
Grant Ingersoll Sr. Software Engineer Center for Natural Language Processing Syracuse University School of Information Studies 335 Hinds Hall Syracuse, NY 13244 http://www.cnlp.org Voice: 315-443-5484 Fax: 315-443-6886

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--

Grant Ingersoll Sr. Software Engineer Center for Natural Language Processing Syracuse University School of Information Studies 335 Hinds Hall Syracuse, NY 13244 http://www.cnlp.org Voice: 315-443-5484 Fax: 315-443-6886
