[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

Michael McCandless (JIRA) Thu, 20 Jan 2011 09:11:14 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984285#action_12984285 ]


Michael McCandless commented on LUCENE-2324:
--------------------------------------------

OK, I think Michael's example can be solved with a small change to the delete
buffering.

When a delete arrives, we should buffer it in each DWPT (DocumentsWriterPerThread),
but also buffer it into the "global" deletes pool (held in DocumentsWriter).

Whenever any DWPT is flushed, that global pool is pushed.

Then, the buffered deletes against each DWPT are carried (as usual) along with
the segment that's flushed from that DWPT, but those buffered deletes *only*
apply to the docs in that one segment.

The pushed deletes from the global pool apply to all prior segments (i.e., they
"coalesce").

This way, the deletes that will be applied to the already flushed segments are 
aggressively pushed.
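To make the proposed scheme concrete, here is a minimal, single-threaded Java
sketch. All names (DeleteBufferingSketch, Dwpt, Segment, privateDeletes,
globalPool) are hypothetical, not Lucene's classes. Deletes are applied eagerly
for simplicity, and each DWPT's private delete records a docIdUpto (its doc
count when the delete arrived); that limit is an assumption added so a delete
never hits docs added after it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-threaded model of the proposed delete buffering.
// Names are illustrative; this is not Lucene's DocumentsWriter/DWPT code, and
// deletes are applied eagerly here rather than lazily.
public class DeleteBufferingSketch {

    // A flushed segment; each doc is identified by a single term.
    static final class Segment {
        final List<String> terms;
        final boolean[] deleted;

        Segment(List<String> terms) {
            this.terms = terms;
            this.deleted = new boolean[terms.size()];
        }

        // Delete docs matching term among the first docIdUpto docs of this segment.
        void delete(String term, int docIdUpto) {
            for (int i = 0; i < docIdUpto; i++) {
                if (terms.get(i).equals(term)) deleted[i] = true;
            }
        }

        int liveDocs() {
            int n = 0;
            for (boolean d : deleted) if (!d) n++;
            return n;
        }
    }

    // A per-thread writer: buffered docs plus private deletes (term -> docIdUpto).
    static final class Dwpt {
        final List<String> docs = new ArrayList<>();
        final Map<String, Integer> privateDeletes = new HashMap<>();
    }

    final List<Dwpt> dwpts = new ArrayList<>();
    final List<String> globalPool = new ArrayList<>(); // held by the DocumentsWriter
    final List<Segment> flushed = new ArrayList<>();

    DeleteBufferingSketch(int numDwpts) {
        for (int i = 0; i < numDwpts; i++) dwpts.add(new Dwpt());
    }

    void addDocument(int dwpt, String term) {
        dwpts.get(dwpt).docs.add(term);
    }

    // Buffer the delete in every DWPT (bounded by its current doc count) and globally.
    void deleteDocuments(String term) {
        for (Dwpt d : dwpts) d.privateDeletes.put(term, d.docs.size());
        globalPool.add(term);
    }

    void flush(int dwpt) {
        // 1. Push the global pool: it applies to all previously flushed segments.
        for (String term : globalPool) {
            for (Segment s : flushed) s.delete(term, s.terms.size());
        }
        globalPool.clear();
        // 2. The DWPT's private deletes apply only to the segment it flushes.
        Dwpt d = dwpts.get(dwpt);
        Segment seg = new Segment(new ArrayList<>(d.docs));
        for (Map.Entry<String, Integer> e : d.privateDeletes.entrySet()) {
            seg.delete(e.getKey(), e.getValue());
        }
        flushed.add(seg);
        dwpts.set(dwpt, new Dwpt()); // start a fresh DWPT in its place
    }

    public static void main(String[] args) {
        DeleteBufferingSketch w = new DeleteBufferingSketch(2);
        w.addDocument(0, "a");
        w.flush(0);             // segment 0 holds one "a"
        w.addDocument(1, "a");  // buffered in DWPT 1
        w.deleteDocuments("a"); // buffered in both DWPTs and in the global pool
        w.addDocument(1, "a");  // added after the delete, so it must survive
        w.flush(1);             // pushes "a" to segment 0; segment 1 keeps its second doc
        System.out.println(w.flushed.get(0).liveDocs() + " " + w.flushed.get(1).liveDocs());
        // prints "0 1"
    }
}
```

Because any flush pushes and clears the global pool, every segment that
receives a pushed delete in this model was flushed before that delete arrived,
so applying it to all of that segment's docs is safe.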

Separately, I think we should relax the error semantics for updateDocument: if
an aborting exception occurs (e.g., disk full while flushing a segment), then it's
possible that the "delete" from an updateDocument will have been applied but the
"add" will not.  Outside of error cases, of course, updateDocument will continue
to be atomic (i.e., a commit() can never split the delete & add).  Then the
updateDocument case is handled as just an add & delete that are atomic with
respect to flush.
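A rough sketch of the relaxed updateDocument semantics, under several
assumptions: a single DWPT, docs modeled as key -> version, a coarse
synchronized lock standing in for whatever actually makes the update atomic
with respect to flush, and the global pool being pushed before the new segment
is written. The class and method names are hypothetical; the point is only
that a simulated disk-full during flush leaves the delete applied and the add
lost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the relaxed updateDocument semantics. A single DWPT is
// modeled; docs are key -> version. Not Lucene code.
public class UpdateSemanticsSketch {
    final Map<String, String> flushed = new LinkedHashMap<>(); // docs in flushed segments
    final List<String> globalDeletes = new ArrayList<>();      // global delete pool
    final Map<String, String> ramDocs = new LinkedHashMap<>(); // the DWPT's buffered docs

    // synchronized: flush() can never run between the delete and the add
    synchronized void updateDocument(String key, String version) {
        globalDeletes.add(key);
        ramDocs.put(key, version); // replaces any older buffered version of this key
    }

    // Assumption: the global pool is pushed before the segment is written, so an
    // aborting exception during the write loses the add but not the delete.
    synchronized void flush(boolean simulateDiskFull) {
        for (String key : globalDeletes) flushed.remove(key);
        globalDeletes.clear();
        if (simulateDiskFull) {
            ramDocs.clear(); // aborting exception: the DWPT's buffered docs are discarded
            throw new IllegalStateException("simulated disk full while flushing");
        }
        flushed.putAll(ramDocs);
        ramDocs.clear();
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch w = new UpdateSemanticsSketch();
        w.updateDocument("1", "v1");
        w.flush(false);
        System.out.println(w.flushed); // {1=v1}
        w.updateDocument("1", "v2");
        try {
            w.flush(true);
        } catch (IllegalStateException e) {
            System.out.println("flush aborted: " + e.getMessage());
        }
        System.out.println(w.flushed); // {} : delete applied, add lost
    }
}
```

Outside the simulated failure, flush(false) after the second update would
leave {1=v2}, so a flush or commit never observes the delete without the add.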

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
> LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, 
> lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
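One way to picture "N fully independent RAM segments" is a fixed pool of
in-RAM buffers that an indexing thread checks out exclusively, adds a doc to,
and returns; each buffer flushes to its own segment when it fills. This is a
hypothetical sketch (PerThreadWritersSketch, RamSegment, and the doc-count
flush trigger are all assumptions; the trigger stands in for whatever flush
policy is used), not how the realtime branch actually assigns DWPTs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of N independent in-RAM segments; not Lucene code.
public class PerThreadWritersSketch {
    // One in-RAM buffer; only the thread that has checked it out touches it.
    static final class RamSegment {
        final List<String> docs = new ArrayList<>();
    }

    private final BlockingQueue<RamSegment> pool;
    private final int maxDocsPerSegment; // stand-in for a real flush policy
    final List<Integer> flushedSegmentSizes = Collections.synchronizedList(new ArrayList<>());

    PerThreadWritersSketch(int numSegments, int maxDocsPerSegment) {
        this.pool = new ArrayBlockingQueue<>(numSegments);
        for (int i = 0; i < numSegments; i++) pool.add(new RamSegment());
        this.maxDocsPerSegment = maxDocsPerSegment;
    }

    void addDocument(String doc) {
        RamSegment seg;
        try {
            seg = pool.take(); // blocks until some RAM segment is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a RAM segment", e);
        }
        try {
            seg.docs.add(doc);
            // Each segment flushes on its own; other threads can keep indexing
            // into the other RAM segments meanwhile.
            if (seg.docs.size() >= maxDocsPerSegment) flush(seg);
        } finally {
            pool.add(seg); // never full: we removed this segment ourselves
        }
    }

    // "Flush" = record a private segment of this buffer's docs, then reset the buffer.
    private void flush(RamSegment seg) {
        flushedSegmentSizes.add(seg.docs.size());
        seg.docs.clear();
    }

    // Flush whatever is left; call only after all indexing threads have finished.
    void flushAll() {
        for (RamSegment seg : pool) {
            if (!seg.docs.isEmpty()) flush(seg);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadWritersSketch w = new PerThreadWritersSketch(4, 100);
        ExecutorService exec = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int id = t;
            exec.execute(() -> {
                for (int i = 0; i < 250; i++) w.addDocument("doc-" + id + "-" + i);
            });
        }
        exec.shutdown();
        exec.awaitTermination(1, TimeUnit.MINUTES);
        w.flushAll();
        int total = 0;
        for (int size : w.flushedSegmentSizes) total += size;
        System.out.println(w.flushedSegmentSizes.size() + " segments, " + total + " docs");
    }
}
```

Handing a buffer through the BlockingQueue gives the next owner a consistent
view of it, so the per-buffer doc list needs no lock of its own; the number of
segments produced depends on thread scheduling, but the total doc count does
not.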
