[ 
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778335#action_12778335
 ] 

Michael McCandless commented on LUCENE-2047:
--------------------------------------------

Thinking more on this... I'm actually no longer convinced that this
change is worthwhile.

Net/net, this will not improve dps/qps throughput on a given set of
fixed hardware, because this is a zero-sum game: the deletes must be
resolved one way or another.

Whether we do it in batch (as today), or incrementally/concurrently,
one at a time as they arrive, the same work must be done.  In fact,
batch should be less costly in practice since it clearly has temporal
locality in resolving terms -> postings, so on a machine whose IO
cache can't hold the entire index in RAM, bulk flushing should be
a win.

It's true that net latency of reopen will be reduced by being
incremental, but Lucene shouldn't aim to be able to reopen 100s of
times per second: I think that's a mis-feature (most apps don't need
it), and those that really do can and should use an approach like
Zoie.

Finally, one can always set the max buffered delete terms/docs to
something low, to achieve this same tradeoff.  It's true that won't
get you concurrent resolving of deleted Terms -> docIDs, but I bet in
practice that concurrency isn't necessary (ie the performance of a
single thread resolving all buffered deletes is plenty fast). 
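
To make the tradeoff concrete, here is a minimal toy model in Java. It is
only a sketch, not Lucene code: the class BufferedDeletesSketch, its methods,
and the in-memory term -> docIDs map are all hypothetical stand-ins, and
maxBufferedDeleteTerms merely echoes the writer setting mentioned above. It
counts term lookups, and how many buffered deletes are left for reopen to
resolve.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model; none of these names are Lucene APIs.
final class BufferedDeletesSketch {
    private final Map<String, List<Integer>> postings; // term -> docIDs
    private final int maxBufferedDeleteTerms;          // flush threshold
    private final List<String> buffered = new ArrayList<>();
    final Set<Integer> deletedDocs = new TreeSet<>();
    int lookups = 0; // total term -> postings resolutions performed

    BufferedDeletesSketch(Map<String, List<Integer>> postings, int maxBufferedDeleteTerms) {
        this.postings = postings;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    // Buffer the term; resolve the whole buffer once the threshold is reached.
    void deleteDocuments(String term) {
        buffered.add(term);
        if (buffered.size() >= maxBufferedDeleteTerms) {
            flush();
        }
    }

    // Returns how many buffered deletes had to be resolved on the reopen path.
    int reopen() {
        int pending = buffered.size();
        flush();
        return pending;
    }

    // Resolve all buffered terms in sorted order (the batch case).
    private void flush() {
        Collections.sort(buffered);
        for (String term : buffered) {
            lookups++;
            deletedDocs.addAll(postings.getOrDefault(term, Collections.<Integer>emptyList()));
        }
        buffered.clear();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("id:1", Arrays.asList(0));
        postings.put("id:2", Arrays.asList(1, 4));
        postings.put("id:3", Arrays.asList(2));

        for (int max : new int[] {1000, 1}) {
            BufferedDeletesSketch w = new BufferedDeletesSketch(postings, max);
            w.deleteDocuments("id:2");
            w.deleteDocuments("id:1");
            w.deleteDocuments("id:3");
            int onReopen = w.reopen();
            System.out.println("max=" + max + " resolvedOnReopen=" + onReopen
                + " totalLookups=" + w.lookups + " deleted=" + w.deletedDocs);
        }
    }
}
```

In this model both settings perform the same three lookups and delete the
same docs; the low threshold only moves that work off the reopen path (three
pending deletes at reopen versus none).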

If the reopen time today is plenty fast, especially if you configure
your writer to flush often, then I don't think we need incremental
resolving of the deletions?


> IndexWriter should immediately resolve deleted docs to docID in 
> near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.

