[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch updated LUCENE-2324:
----------------------------------

    Attachment: lucene-2324.patch

Finally a new version of the patch! (Sorry for keeping you guys waiting...)

It's not done yet, but it compiles (against the realtime branch!) and >95% of
the core test cases pass.

Work done in addition to last patch:

- Added DocumentsWriterPerThread
- Reimplemented big parts of DocumentsWriter
- Added DocumentsWriterThreadPool, which is an extension point for different
  pool implementations.  The default impl is the
  ThreadAffinityDocumentsWriterThreadPool, which does what the old code did
  (try to always assign a DWPT to the same thread).  It should be easy now to
  add Document#getSourceID() and another pool that can assign threads based
  on the sourceID.
- Initial implementation of sequenceIDs.  Currently they're only used to keep
  track of deletes and not yet for e.g. NRT readers.
- Lots of other changes here and there.
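
To illustrate the thread-affinity idea described above, here is a minimal
sketch. It is not the code from the patch: the class names
(ThreadAffinitySketch, PerThreadWriter, AffinityPool) and the round-robin
assignment on first use are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ThreadAffinitySketch {

    /** Hypothetical stand-in for a per-thread writer; not the DWPT class from the patch. */
    static final class PerThreadWriter {
        final int id;
        private int docCount;

        PerThreadWriter(int id) { this.id = id; }

        // Several threads may share one writer when there are more threads
        // than writers, so the writer guards its own state.
        synchronized void addDocument(String doc) { docCount++; }

        synchronized int docCount() { return docCount; }
    }

    /** Hypothetical pool: the first call from a thread picks a writer, later calls reuse it. */
    static final class AffinityPool {
        private final List<PerThreadWriter> writers = new ArrayList<>();
        // Note: keying on Thread keeps finished threads reachable; a real
        // pool would need some cleanup strategy.
        private final Map<Thread, PerThreadWriter> bindings = new ConcurrentHashMap<>();
        private int next;

        AffinityPool(int maxWriters) {
            for (int i = 0; i < maxWriters; i++) {
                writers.add(new PerThreadWriter(i));
            }
        }

        /** Returns the writer bound to the given thread, binding one if needed. */
        PerThreadWriter writerFor(Thread t) {
            return bindings.computeIfAbsent(t, k -> assign());
        }

        // Round-robin assignment (an assumption; other policies are possible).
        private synchronized PerThreadWriter assign() {
            PerThreadWriter w = writers.get(next);
            next = (next + 1) % writers.size();
            return w;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AffinityPool pool = new AffinityPool(2);
        Runnable indexer = () -> {
            for (int i = 0; i < 100; i++) {
                pool.writerFor(Thread.currentThread()).addDocument("doc" + i);
            }
        };
        Thread a = new Thread(indexer);
        Thread b = new Thread(indexer);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("writer 0 docs: " + pool.writerFor(a).docCount());
    }
}
```

A pool keyed on a source ID instead of the calling thread would only need a
different key in the bindings map; the lookup-then-assign shape stays the same.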

TODOs:

- Implement flush-by-ram logic
- Implement logic to discard deletes from the deletes buffer
- Finish sequenceID handling: IW#commit() and IW#close() should return the
  last flushed sequenceID
- Maybe change delete logic:  currently deletes are applied when a segment is
  flushed.  Maybe we can keep it this way in the realtime-branch though,
  because that's most likely what we want to do once the RAM buffer is
  searchable and deletes are cheaper, as they can then be done in-memory
  before flush
- Fix unit tests (mostly exception handling and thread safety)
- New test cases, e.g. for sequenceID testing
- Simplify code:  in some places I copied code around, which can probably be
  further simplified
- I started removing some of the old setters/getters in IW which are not in
  IndexWriterConfig - need to finish that, or revert those changes and use a
  different patch
- Fix nocommits
- Performance testing
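
As a rough illustration of how sequenceIDs can order deletes against adds
when deletes are applied at flush time, consider the sketch below. It is not
taken from the patch: the class SequenceIdDeletesSketch, its methods, and the
single shared buffer are all hypothetical, and the assumed rule is that a
delete only affects documents added with a smaller sequenceID.

```java
import java.util.ArrayList;
import java.util.List;

final class SequenceIdDeletesSketch {

    /** One buffered operation: a sequence ID plus the document key it refers to. */
    static final class Op {
        final long seqId;
        final String key;

        Op(long seqId, String key) {
            this.seqId = seqId;
            this.key = key;
        }
    }

    private long nextSeqId;                               // monotonically increasing
    private final List<Op> bufferedDocs = new ArrayList<>();
    private final List<Op> bufferedDeletes = new ArrayList<>();

    /** Buffers a document and returns the sequence ID assigned to the add. */
    synchronized long addDocument(String key) {
        long seqId = nextSeqId++;
        bufferedDocs.add(new Op(seqId, key));
        return seqId;
    }

    /** Buffers a delete; nothing is removed until flush. */
    synchronized long deleteDocuments(String key) {
        long seqId = nextSeqId++;
        bufferedDeletes.add(new Op(seqId, key));
        return seqId;
    }

    /** Applies buffered deletes and returns the keys of the surviving documents. */
    synchronized List<String> flush() {
        List<String> live = new ArrayList<>();
        for (Op doc : bufferedDocs) {
            boolean deleted = false;
            for (Op del : bufferedDeletes) {
                // A delete only hits documents that were added before it.
                if (del.key.equals(doc.key) && del.seqId > doc.seqId) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) {
                live.add(doc.key);
            }
        }
        // With a single buffer both lists can be dropped here; with several
        // independent buffers the deletes would have to be kept longer.
        bufferedDocs.clear();
        bufferedDeletes.clear();
        return live;
    }

    public static void main(String[] args) {
        SequenceIdDeletesSketch writer = new SequenceIdDeletesSketch();
        writer.addDocument("a");
        writer.addDocument("b");
        writer.deleteDocuments("a");
        writer.addDocument("a");   // re-added after the delete, so it survives
        System.out.println(writer.flush());
    }
}
```

The comparison of sequence IDs is what lets a document re-added after a
delete survive the flush, even though both operations sit in the same buffer.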

I'm planning to commit this soon to the realtime branch, even though it's
obviously not done yet.  But it's a big patch and changes will be easier to
track with an svn history.

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.


