Doron Cohen wrote:
Michael McCandless <[EMAIL PROTECTED]> wrote on 16/01/2007 12:13:47:
Ning Li wrote:
On 1/16/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Good catch Ning! And, I agree, when a reader plans to make
modifications to the index, I think the best solution is to require
that the reader has opened the most recent "segments*_N" (be that a
snapshot or a checkpoint). A reader is really a "writer" in
this context. This means we need a way to open a reader against the
most recent checkpoint as well (I will add that).
This is very much consistent with how a reader now checks if it is
still current when someone first tries to change a del/norm: if it's
not still current (ie, another writer has written a new segments_N
file) then an IOException is raised with "IndexReader out of date and
no longer valid for delete, undelete, or setNorm operations". I think
with explicit commits that same requirement & check would apply.
This means a reader can open a checkpoint for search. But the purpose
of "explicit commits" is that only snapshots are opened for search,
not checkpoints. Can we just trust applications won't open a
checkpoint for search? Or should we explicitly guard against it?
Ahh good point.
I think I'll add "openForWriting(*)" static methods to IndexReader.
These will acquire the write lock, and will open the latest
segments*_N (commit or checkpoint). This way you can't open a
checkpoint unless there are no other writers on the index.
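To make the intended behavior concrete, here is a rough Java sketch of
the idea. Everything in it is hypothetical (ToyIndex, ToyIndexReader
and their methods are made up for illustration, and nothing here is
existing Lucene API). It assumes commits and checkpoints share one
generation sequence, and it models the write lock as a simple flag.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy index state (hypothetical): newest commit and newest checkpoint.
class ToyIndex {
    private final AtomicBoolean writeLocked = new AtomicBoolean(false);
    private long latestCommit = 1;      // newest snapshot generation
    private long latestGeneration = 1;  // newest generation, commit or checkpoint

    boolean tryWriteLock() { return writeLocked.compareAndSet(false, true); }
    void releaseWriteLock() { writeLocked.set(false); }

    // A checkpoint advances the generation without moving the commit point.
    void checkpoint() { latestGeneration++; }

    // A commit advances the generation and becomes the new snapshot.
    void commit() { latestGeneration++; latestCommit = latestGeneration; }

    long latestCommit() { return latestCommit; }
    long latestGeneration() { return latestGeneration; }
}

class ToyIndexReader {
    private final ToyIndex index;
    final long generation;
    final boolean writable;
    private boolean closed = false;

    private ToyIndexReader(ToyIndex index, long generation, boolean writable) {
        this.index = index;
        this.generation = generation;
        this.writable = writable;
    }

    // Plain open: sees only the latest commit, never a checkpoint.
    static ToyIndexReader open(ToyIndex index) {
        return new ToyIndexReader(index, index.latestCommit(), false);
    }

    // Takes the write lock first, then opens the newest generation.
    static ToyIndexReader openForWriting(ToyIndex index) throws IOException {
        if (!index.tryWriteLock()) {
            throw new IOException("write lock is held by another writer");
        }
        return new ToyIndexReader(index, index.latestGeneration(), true);
    }

    // Releases the write lock if this reader holds it; safe to call twice.
    void close() {
        if (writable && !closed) {
            index.releaseWriteLock();
        }
        closed = true;
    }
}
```

In this sketch a second openForWriting fails until the first reader is
closed, so a checkpoint is only ever visible to the single lock holder.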
We could go further and have IndexSearcher not accept an IndexReader
opened against a checkpoint, but I'm inclined not to check for
(prevent) this, for starters. I'd rather not preclude possibly
interesting future use cases too early.
Is this blocking applications that first perform a search, in order to
decide which docs to delete by docid?
I don't think we're preventing this use case, even if we decide to
guard against "searching on a checkpoint" (which I think we shouldn't
do just yet).
If you do an explicit commit from your writer, close it, then open a
reader, you can run searches and delete the resulting docids. This is
in fact Solr's approach today (a commit is forced if you do a
deleteByQuery).
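The sequence above (commit, close the writer, open a reader, search,
delete by docid) can be walked through with a small in-memory model.
This is a toy sketch in Java under stated assumptions: MiniIndex,
MiniWriter and MiniReader are hypothetical names, not Lucene or Solr
classes, "search" is a plain substring match, and close() is a no-op.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical in-memory index: committed documents plus a deletion bitmap.
class MiniIndex {
    final List<String> committedDocs = new ArrayList<>();
    final BitSet deleted = new BitSet();
}

// Buffers additions until commit() publishes them to the index.
class MiniWriter {
    private final MiniIndex index;
    private final List<String> buffered = new ArrayList<>();

    MiniWriter(MiniIndex index) { this.index = index; }

    void addDocument(String text) { buffered.add(text); }

    // Explicit commit: only now do buffered documents become visible.
    void commit() {
        index.committedDocs.addAll(buffered);
        buffered.clear();
    }

    // No-op in this toy; a real writer would release its lock here.
    void close() { }
}

// Reader with a point-in-time view of the documents committed before it opened.
class MiniReader {
    private final MiniIndex index;
    private final int maxDoc;

    MiniReader(MiniIndex index) {
        this.index = index;
        this.maxDoc = index.committedDocs.size();
    }

    // Returns the docids of live documents containing the term.
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < maxDoc; id++) {
            if (!index.deleted.get(id) && index.committedDocs.get(id).contains(term)) {
                hits.add(id);
            }
        }
        return hits;
    }

    void deleteDocument(int docId) { index.deleted.set(docId); }

    // Counts live (non-deleted) documents visible to this reader.
    int numDocs() {
        int live = 0;
        for (int id = 0; id < maxDoc; id++) {
            if (!index.deleted.get(id)) live++;
        }
        return live;
    }
}
```

Usage follows the order in the text: add documents, call commit() and
close() on the writer, then open a MiniReader, call search(term), and
pass each returned docid to deleteDocument.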
Two other options in
http://article.gmane.org/gmane.comp.jakarta.lucene.devel/16581 ...?
Re those 2 ideas: I do agree that splitting index changes, with some
kinds going through a reader and others through a writer, is confusing
to our users. I think our ideal eventual solution is a single "grand
unified" Index class that efficiently does all things that IndexWriter
and IndexReader do today. (I think this is closest to your 2nd option
in that link).
I think "support deleteDocuments in IndexWriter" (LUCENE-565) is
an awesome first step. But I think these steps are separate from
enabling explicit commits. Explicit commits should allow LUCENE-565
to have a more efficient implementation, but we should still work
through them separately.
Mike