I really wish Doug would comment on all of these proposed changes...

It seems that after you account for all of the constraints (e.g. IndexReader must be current snapshot...) you are going to end up right back where you started.

I propose that this work be done in some sort of facade or "server facade"; this mucking with the core Lucene classes is making the API needlessly complex. It seems to me that we are making many operations that are currently deterministic (except in the case of critical disk failures or locking failures) non-deterministic (i.e. maybe my delete document call will fail).

I think we should all step back and decide what exactly it is we are trying to do. Maybe this has already been done and someone can point me to the appropriate documentation?

1. improve search performance?
2. improve indexing performance?
3. improve durability of index changes?
4. persistent search results?
5. improve concurrency and determinism of searching while indexing?

It seems that a lot of the currently proposed patches are attempting to solve one or more of these problems, but there does not seem to be a coherent overall approach. There also does not appear to be any list of constraints governing what will be considered a valid approach.

It is almost "well I need this little feature for something I am doing, so I propose ..."

It may be that to solve all of these "properly" requires Lucene 3.0 with a completely different API and infrastructure.

Just my thoughts.



On Jan 16, 2007, at 2:23 PM, Doron Cohen wrote:

Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007 12:13:47:

Ning Li wrote:
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Good catch Ning!  And, I agree, when a reader plans to make
modifications to the index, I think the best solution is to require
that the reader has opened most recent "segments*_N" (be that a
snapshot or a checkpoint). Really a reader is actually a "writer" in this context. This means we need a way to open a reader against the
most recent checkpoint as well (I will add that).

This is very much consistent with how a reader now checks if it is
still current when someone first tries to change a del/norm: if it's
not still current (ie, another writer has written a new segments_N
file) then an IOException is raised with "IndexReader out of date and no longer valid for delete, undelete, or setNorm operations". I think
with explicit commits that same requirement & check would apply.
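
To illustrate the currency check described above, here is a minimal sketch that is not Lucene code. ToyIndex and ToyReader are hypothetical stand-ins: the index is reduced to the N in segments_N, and the reader remembers the N it was opened against. Unlike the real check, which happens when someone first tries to change a del/norm, this toy version checks on every call, and it does not model the write lock at all.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class StaleReaderSketch {

    /** Hypothetical stand-in for an index: tracks only the N in segments_N. */
    static final class ToyIndex {
        private final AtomicLong generation = new AtomicLong(1);

        long currentGeneration() {
            return generation.get();
        }

        /** Simulates another writer committing a new segments_N file. */
        long commit() {
            return generation.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for a reader that may also modify the index. */
    static final class ToyReader {
        private final ToyIndex index;
        private final long openedGeneration;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.openedGeneration = index.currentGeneration();
        }

        boolean isCurrent() {
            return openedGeneration == index.currentGeneration();
        }

        /** Refuses the change if the index has moved on since this reader was opened. */
        void deleteDocument(int docId) throws IOException {
            if (!isCurrent()) {
                throw new IOException("IndexReader out of date and no longer valid "
                        + "for delete, undelete, or setNorm operations");
            }
            // A real reader would mark docId as deleted here; the toy does nothing.
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        System.out.println("current after open: " + reader.isCurrent());

        index.commit(); // another writer publishes a new segments_N
        System.out.println("current after commit: " + reader.isCurrent());

        try {
            reader.deleteDocument(7);
        } catch (IOException e) {
            System.out.println("delete rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that a delete issued through a reader opened before the latest commit fails instead of being applied to outdated state.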

This means a reader can open a checkpoint for search. But the purpose
of "explicit commits" is that only snapshots are opened for search,
not checkpoints. Can we just trust applications won't open a
checkpoint for search? Or should we explicitly guard against it?

Ahh good point.

I think I'll add "openForWriting(*)" static methods to IndexReader.
These will acquire the write lock, and will open the latest
segments*_N (commit or checkpoint).  This way you can't open a
checkpoint unless there are no other writers on the index.

We could go further and have IndexSearcher not accept an IndexReader
opened against a checkpoint, but I'm inclined not to check for
(prevent) this, for starters.  I'd rather not preclude possibly
interesting future use cases too early.

Is this blocking applications that first perform a search, in order to
decide which docs to delete by docid?

Two other options in
http://article.gmane.org/gmane.comp.jakarta.lucene.devel/16581 ...?


Mike



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


