[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

Michael Busch (JIRA) Sun, 28 Mar 2010 21:50:51 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850792#action_12850792 ]


Michael Busch commented on LUCENE-2324:
---------------------------------------

{quote}
However, in the apply deletes
method how would we know which doc to stop deleting at? How
would the seq id map to a DW's doc id?
{quote}

We could have a global deletes-map that stores seqID -> DeleteAction.  
A DeleteAction contains either a Term or a Query, plus an int "flushCount" 
(I'll explain further down what flushCount is used for).

Each DocumentsWriterPerThread (DWPT) would have a growing array that contains 
each seqID that "affected" that DWPT, i.e. the seqIDs of *all* deletes, plus 
the seqIDs of the adds/updates performed by that particular DWPT.  One bit of 
each seqID entry in that array can indicate whether it's a delete or an 
add/update.
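
A rough sketch of such a per-DWPT array, just to make the idea concrete.  The 
class and method names here (SeqIdLog, record, encode) are made up for 
illustration and are not existing Lucene classes.  It assumes seqIDs are 
non-negative and smaller than 2^62, and uses the lowest bit as the delete flag 
so that sorting the raw longs still sorts by seqID:

```java
import java.util.Arrays;

/** Hypothetical per-DWPT log of sequence IDs; illustrative only, not Lucene API. */
final class SeqIdLog {
    private long[] entries = new long[8];
    private int size = 0;

    /** Packs the seqID into the upper bits and the delete flag into the lowest bit. */
    static long encode(long seqId, boolean isDelete) {
        return (seqId << 1) | (isDelete ? 1L : 0L);
    }

    /** Recovers the seqID from a packed entry. */
    static long seqId(long entry) {
        return entry >>> 1;
    }

    /** True if the packed entry is a delete, false if it is an add/update. */
    static boolean isDelete(long entry) {
        return (entry & 1L) != 0;
    }

    /** Appends one entry, doubling the backing array when it is full. */
    void record(long seqId, boolean isDelete) {
        if (size == entries.length) {
            entries = Arrays.copyOf(entries, size * 2);
        }
        entries[size++] = encode(seqId, isDelete);
    }

    int size() {
        return size;
    }

    long get(int i) {
        return entries[i];
    }
}
```

Every DWPT would call record(seqId, true) for each delete, while only the DWPT 
that indexed a doc would call record(seqId, false) for that add/update.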

When it's time to flush we sort the array by increasing seqID and then loop a 
single time through it to find the seqIDs of all DeleteActions.  During the 
loop we keep a running count of the adds/updates seen so far; when we reach a 
DeleteAction, that count is the number of this DWPT's docs it affects, i.e. the 
doc id to stop deleting at.  After applying the deletes the DWPT makes a 
synchronized call to the global deletes-map and increments the flushCount int 
for each applied DeleteAction.  If flushCount==numThreadStates (== number of 
DWPT instances) the corresponding DeleteAction entry can be removed, because it 
has been applied to all DWPTs.
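
The flush steps could look roughly like this.  It is only a sketch under some 
assumptions: all names (FlushSketch, GlobalDeletes, PendingDelete, markApplied) 
are hypothetical, a String stands in for the Term or Query, the lowest bit of 
each array entry marks a delete, and actually applying the deletes to the 
segment is left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FlushSketch {
    /** Stand-in for the proposed DeleteAction; a String replaces the Term or Query. */
    static final class DeleteAction {
        final String termOrQuery;
        int flushCount; // number of DWPTs that have applied this delete so far

        DeleteAction(String termOrQuery) {
            this.termOrQuery = termOrQuery;
        }
    }

    /** Global seqID -> DeleteAction map shared by all DWPTs. */
    static final class GlobalDeletes {
        private final Map<Long, DeleteAction> bySeqId = new HashMap<>();
        private final int numThreadStates;

        GlobalDeletes(int numThreadStates) {
            this.numThreadStates = numThreadStates;
        }

        synchronized void add(long seqId, DeleteAction action) {
            bySeqId.put(seqId, action);
        }

        synchronized int size() {
            return bySeqId.size();
        }

        /** Bumps flushCount and drops the entry once every DWPT has applied it. */
        synchronized void markApplied(long seqId) {
            DeleteAction action = bySeqId.get(seqId);
            if (action != null && ++action.flushCount == numThreadStates) {
                bySeqId.remove(seqId);
            }
        }
    }

    /** A delete to apply at flush: it covers this DWPT's docs [0, docLimit). */
    static final class PendingDelete {
        final long seqId;
        final int docLimit;

        PendingDelete(long seqId, int docLimit) {
            this.seqId = seqId;
            this.docLimit = docLimit;
        }
    }

    /** Sorts the DWPT's entries by seqID and walks them once, counting adds/updates. */
    static List<PendingDelete> flush(long[] entries, int size, GlobalDeletes deletes) {
        long[] sorted = Arrays.copyOf(entries, size);
        Arrays.sort(sorted); // assumes non-negative entries with the delete flag in bit 0
        List<PendingDelete> pending = new ArrayList<>();
        int docCount = 0;
        for (long entry : sorted) {
            if ((entry & 1L) != 0) {
                pending.add(new PendingDelete(entry >>> 1, docCount));
            } else {
                docCount++;
            }
        }
        // Real code would apply each pending delete to the segment's docs here.
        for (PendingDelete p : pending) {
            deletes.markApplied(p.seqId);
        }
        return pending;
    }
}
```

The sketch assumes each DWPT clears its array after a flush, so that a given 
DeleteAction is counted at most once per DWPT.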

I think this should work?  Or is there a simpler solution?


> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
