[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

Jason Rutherglen (JIRA) Fri, 13 Nov 2009 12:21:19 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777638#action_12777638
 ]


Jason Rutherglen commented on LUCENE-2047:
------------------------------------------

bq. It's strange that anything here is needed

I was obtaining the segment infos synced, had a small block of
unsynced code, then synced obtaining the sometimes defunct
readers. Fixed that part, then the errors went away!

bq. the sync(IW) is in fact necessary? 

I'm hoping we can do the deletes unsynced, which will make this
patch a net performance gain because we're allowing multiple
threads to delete concurrently (whereas today we're performing
them synced at flush time, i.e. the current patch is merely
shifting the term/query lookup cost from flush to
deleteDocument).

bq. buffer the deleted docIDs into DW's deletesInRAM.docIDs

I'll need to step through this, as it's a little strange to me
how DW knows the doc id to cache for a particular SR, i.e. how
are they mapped to an SR? Oh there's the DW.remapDeletes method?
Hmm...

Couldn't we save off a per SR BV for the update doc rollback
case, merging the special updated doc BV into the SR's deletes
on successful flush, throwing them away on failure?  Memory is
less of a concern with the paged BV from the pending LUCENE-1526
patch.  On a delete by query with many hits, I'm concerned about
storing too many doc id Integers in BufferedDeletes. 

Without syncing, new deletes could arrive, and we'd need to
queue them, and apply them to new segments, or newly merged
segments because we're not locking the segments. Otherwise some
deletes could be lost. 

A possible solution is, deleteDocument would synchronously add
the delete query/term to a queue per SR and return.
Asynchronously (i.e. in background threads) the deletes could be
applied. Merging would aggregate the incoming SR's queued
deletes (as they haven't been applied yet) into the merged
reader's delete queue. On flush we'd wait for these queued
deletes to be applied.  After flush, the queues would be clear
and we'd start over.  And because the delete queue is per reader,
it would be thrown away with the closed reader. 

> IndexWriter should immediately resolve deleted docs to docID in 
> near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that in can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

Reply via email to