I really wish Doug would comment on all of these proposed changes...
It seems that after you account for all of the constraints (e.g.
IndexReader must be the current snapshot...) you are going to end up
right back where you started.
I propose that this work should be done in some sort of facade or
"server facade", and that this mucking with the core Lucene classes is
making the API needlessly complex. It seems to me that we are taking
many operations that are currently deterministic (except in the case
of critical disk failure or locking failure) and making them
non-deterministic (i.e. maybe my delete document call will fail).
I think we should all step back and decide exactly what it is we are
trying to do. Maybe this has already been done and someone can point
me to the appropriate documentation?
1. improve search performance?
2. improve indexing performance?
3. improve durability of index changes?
4. persistent search results?
5. improve concurrency and determinism of searching while indexing?
It seems that a lot of the currently proposed patches are attempting
to solve one or more of these problems, but there does not seem to be
a coherent overall approach. There also does not appear to be any list
of constraints governing what will be considered a valid approach.
It is almost "well, I need this little feature for something I am
doing, so I propose ..."
It may be that to solve all of these "properly" requires Lucene 3.0
with a completely different API and infrastructure.
Just my thoughts.
On Jan 16, 2007, at 2:23 PM, Doron Cohen wrote:
Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007 12:13:47:
Ning Li wrote:
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Good catch Ning! And, I agree, when a reader plans to make
modifications to the index, I think the best solution is to require
that the reader has opened most recent "segments*_N" (be that a
snapshot or a checkpoint). Really a reader is actually a "writer" in
this context. This means we need a way to open a reader against the
most recent checkpoint as well (I will add that).
This is very much consistent with how a reader now checks if it is
still current when someone first tries to change a del/norm: if it's
not still current (ie, another writer has written a new segments_N
file) then an IOException is raised with "IndexReader out of date and
no longer valid for delete, undelete, or setNorm operations". I think
with explicit commits that same requirement & check would apply.
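For anyone following along, the "still current" check described above can be sketched roughly as follows. This is not Lucene code: the class names (StaleReaderSketch, Index, Reader) and the single generation counter standing in for the segments_N file are assumptions made up for illustration; only the exception message is taken from the text above.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Illustrative model only; none of these classes are part of Lucene.
final class StaleReaderSketch {

    /** Hypothetical stand-in for an index: tracks only the N in "segments_N". */
    static final class Index {
        private long generation = 1;

        synchronized long currentGeneration() {
            return generation;
        }

        /** Models another writer publishing a new segments_N file. */
        synchronized long commit() {
            return ++generation;
        }
    }

    /** Hypothetical reader that remembers which generation it opened. */
    static final class Reader {
        private final Index index;
        private final long openedGeneration;
        private final Set<Integer> deleted = new HashSet<>();

        Reader(Index index) {
            this.index = index;
            this.openedGeneration = index.currentGeneration();
        }

        boolean isCurrent() {
            return openedGeneration == index.currentGeneration();
        }

        /** Refuses the change if someone else has committed since this reader opened. */
        void deleteDocument(int docId) throws IOException {
            if (!isCurrent()) {
                throw new IOException("IndexReader out of date and no longer valid "
                        + "for delete, undelete, or setNorm operations");
            }
            deleted.add(docId);
        }

        boolean isDeleted(int docId) {
            return deleted.contains(docId);
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        Reader reader = new Reader(index);
        try {
            reader.deleteDocument(3);   // accepted: reader is still current
            index.commit();             // another writer publishes a new generation
            reader.deleteDocument(4);   // rejected: reader is now stale
        } catch (IOException e) {
            System.out.println("delete rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that a delete which succeeds before another commit fails after it, which is the behaviour the same check would preserve under explicit commits.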
This means a reader can open a checkpoint for search. But the purpose
of "explicit commits" is that only snapshots are opened for search,
not checkpoints. Can we just trust applications won't open a
checkpoint for search? Or should we explicitly guard against it?
Ahh good point.
I think I'll add "openForWriting(*)" static methods to IndexReader.
These will acquire the write lock, and will open the latest
segments*_N (commit or checkpoint). This way you can't open a
checkpoint unless there are no other writers on the index.
We could go further and have IndexSearcher not accept an IndexReader
opened against a checkpoint, but I'm inclined not to check for
(prevent) this, for starters. I'd rather not preclude possibly
interesting future use cases too early.
Is this blocking applications that first perform a search, in order to
decide which docs to delete by docid?
Two other options in
http://article.gmane.org/gmane.comp.jakarta.lucene.devel/16581 ...?
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]