[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850792#action_12850792 ]
Michael Busch commented on LUCENE-2324:
---------------------------------------

{quote}
However, in the apply deletes method how would we know which doc to stop deleting at? How would the seq id map to a DW's doc id?
{quote}

We could have a global deletes-map that stores seqID -> DeleteAction. A DeleteAction contains either a Term or a Query, and in addition an int "flushCount" (I'll explain in a bit what flushCount is used for).

Each DocumentsWriterPerThread (DWPT) would have a growing array that contains each seqID that "affected" that DWPT, i.e. the seqIDs of *all* deletes, plus the seqIDs of the adds/updates performed by that particular DWPT. One bit of a seqID in that array can indicate whether it's a delete or an add/update.

When it's time to flush, we sort the array by increasing seqID and then loop through it a single time to find the seqIDs of all DeleteActions. During the loop we count the number of adds/updates to determine the number of docs each DeleteAction affects.

After applying the deletes, the DWPT makes a synchronized call to the global deletes-map and increments the flushCount int for each applied DeleteAction. If flushCount==numThreadStates (== number of DWPT instances), the corresponding DeleteAction entry can be removed, because it has been applied to all DWPTs.

I think this should work? Or is there a simpler solution?

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
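
The bookkeeping proposed in the comment above (a global seqID -> DeleteAction map, a per-DWPT array of seqIDs, and a flushCount per delete) could be sketched roughly as follows. This is only an illustration of the idea, not code from the attached patch: the class and method names (SeqIdDeletesSketch, GlobalDeletes, Dwpt, recordAdd, recordDelete, flush) are hypothetical, a String stands in for the Term or Query, the low bit of each array entry is assumed to be the delete flag, and the number of DWPTs is assumed to be fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Lucene or the patch.
class SeqIdDeletesSketch {

    /** A buffered delete. `target` stands in for the Term or Query. */
    static final class DeleteAction {
        final String target;
        int flushCount; // how many DWPTs have applied this delete so far
        DeleteAction(String target) { this.target = target; }
    }

    /** Global seqID -> DeleteAction map shared by all DWPTs. */
    static final class GlobalDeletes {
        private final Map<Long, DeleteAction> map = new HashMap<>();
        private final int numThreadStates; // assumed fixed number of DWPTs
        GlobalDeletes(int numThreadStates) { this.numThreadStates = numThreadStates; }

        synchronized void add(long seqId, String target) {
            map.put(seqId, new DeleteAction(target));
        }
        synchronized DeleteAction get(long seqId) { return map.get(seqId); }

        /** Bumps flushCount; drops the entry once every DWPT has applied it. */
        synchronized void markApplied(long seqId) {
            DeleteAction a = map.get(seqId);
            if (a != null && ++a.flushCount == numThreadStates) {
                map.remove(seqId);
            }
        }
        synchronized int size() { return map.size(); }
    }

    /** Per-DWPT growing array of (seqId << 1 | isDelete) entries. */
    static final class Dwpt {
        private long[] log = new long[8];
        private int len;

        void recordAdd(long seqId)    { append(seqId << 1); }
        void recordDelete(long seqId) { append((seqId << 1) | 1L); }

        private void append(long entry) {
            if (len == log.length) log = Arrays.copyOf(log, len * 2);
            log[len++] = entry;
        }

        /**
         * Sorts the log by seqID and walks it once. For each delete, returns
         * {seqId, docUpto}: the delete affects the first docUpto docs of this
         * DWPT, i.e. the adds/updates with a smaller seqID.
         */
        List<long[]> flush(GlobalDeletes global) {
            Arrays.sort(log, 0, len); // seqIDs are unique, so this orders by seqID
            List<long[]> applied = new ArrayList<>();
            int docsSoFar = 0;
            for (int i = 0; i < len; i++) {
                long seqId = log[i] >>> 1;
                if ((log[i] & 1L) == 0) {
                    docsSoFar++;      // add/update: one more buffered doc
                } else {
                    // A real DWPT would look up global.get(seqId) here and
                    // delete matching docs among the first docsSoFar docs.
                    applied.add(new long[] {seqId, docsSoFar});
                }
            }
            for (long[] d : applied) global.markApplied(d[0]);
            len = 0;
            return applied;
        }
    }
}
```

Each delete must be recorded in every DWPT (recordDelete) as well as in the global map, so that flushCount can reach numThreadStates. The sort is what lets entries be appended out of seqID order under concurrency while still yielding the right doc count for each delete.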