[
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Busch updated LUCENE-2324:
----------------------------------
Attachment: lucene-2324.patch
Finally a new version of the patch! (Sorry for keeping you guys waiting...)
It's not done yet, but it compiles (against the realtime branch!) and more
than 95% of the core test cases pass.
Work done since the last patch:
- Added DocumentsWriterPerThread
- Reimplemented big parts of DocumentsWriter
- Added DocumentsWriterThreadPool, which is an extension point for different
pool implementations. The default impl is the
ThreadAffinityDocumentsWriterThreadPool, which does what the old code did
(it tries to always assign a DWPT to the same thread). It should now be easy
to add Document#getSourceID() and another pool that assigns threads based on
the sourceID.
- Initial implementation of sequenceIDs. Currently they're only used to keep
track of deletes, and not yet for e.g. NRT readers.
- Lots of other changes here and there.
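To illustrate the thread-affinity idea behind the pool extension point, here
is a minimal sketch. It is not the code from the patch: PerThreadWriter,
WriterPool and ThreadAffinityPool are hypothetical stand-ins for
DocumentsWriterPerThread, DocumentsWriterThreadPool and
ThreadAffinityDocumentsWriterThreadPool, and locking, flushing and cleanup of
bindings for dead threads are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for DocumentsWriterPerThread (not the patch's class).
final class PerThreadWriter {
    final int id;
    PerThreadWriter(int id) { this.id = id; }
}

// Hypothetical extension point: decides which writer serves a given thread.
interface WriterPool {
    PerThreadWriter acquire(Thread requester);
}

// Thread-affinity policy: a thread keeps getting the writer it was first given.
final class ThreadAffinityPool implements WriterPool {
    private final PerThreadWriter[] writers;
    private final Map<Thread, PerThreadWriter> bindings = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    ThreadAffinityPool(int maxWriters) {
        writers = new PerThreadWriter[maxWriters];
        for (int i = 0; i < maxWriters; i++) {
            writers[i] = new PerThreadWriter(i);
        }
    }

    @Override
    public PerThreadWriter acquire(Thread requester) {
        // Unknown threads are bound round-robin; known threads reuse their binding.
        return bindings.computeIfAbsent(requester,
            t -> writers[Math.floorMod(next.getAndIncrement(), writers.length)]);
    }
}
```

A pool keyed on a source ID would be another implementation of the same
interface that looks up the binding by that ID instead of by the calling
thread.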
TODOs:
- Implement flush-by-ram logic
- Implement logic to discard deletes from the deletes buffer
- Finish sequenceID handling: IW#commit() and IW#close() should return the
last flushed sequenceID
- Maybe change the delete logic: currently deletes are applied when a segment
is flushed. We can probably keep it this way in the realtime branch though,
because that's most likely what we want once the RAM buffer is searchable and
deletes are cheaper, as they can then be done in memory before flush
- Fix unit tests (mostly exception handling and thread safety)
- New test cases, e.g. for sequenceID testing
- Simplify code: in some places I copied code around, and that duplication
can probably be simplified further
- I started removing some of the old setters/getters in IW that are not in
IndexWriterConfig. I need to finish that, or revert those changes and do them
in a separate patch
- Fix nocommits
- Performance testing
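For the sequenceID items above, the following is only a sketch of the general
idea, under the assumption that every add and delete gets an increasing ID
and a delete affects only documents added before it. SequencedBuffer and all
of its methods are hypothetical and do not reflect the patch's actual classes
or the IndexWriter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer: orders adds and deletes by a shared sequence ID.
final class SequencedBuffer {
    private static final class Op {
        final long seqId;
        final String key;
        Op(long seqId, String key) { this.seqId = seqId; this.key = key; }
    }

    private final List<Op> adds = new ArrayList<>();
    private final List<Op> deletes = new ArrayList<>();
    private long nextSeqId = 1;
    private long lastFlushedSeqId = 0;

    // Buffers a document (identified by a key here) and returns its sequence ID.
    synchronized long add(String key) {
        adds.add(new Op(nextSeqId, key));
        return nextSeqId++;
    }

    // Buffers a delete-by-key and returns its sequence ID.
    synchronized long delete(String key) {
        deletes.add(new Op(nextSeqId, key));
        return nextSeqId++;
    }

    // Applies buffered deletes at flush time: a delete removes only documents
    // with the same key and a smaller sequence ID. Returns the surviving keys.
    synchronized List<String> flush() {
        List<String> live = new ArrayList<>();
        for (Op add : adds) {
            boolean deleted = false;
            for (Op del : deletes) {
                if (del.key.equals(add.key) && del.seqId > add.seqId) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) {
                live.add(add.key);
            }
        }
        lastFlushedSeqId = nextSeqId - 1;
        adds.clear();
        deletes.clear();
        return live;
    }

    // The value a commit() or close() could report in this sketch.
    synchronized long lastFlushedSeqId() {
        return lastFlushedSeqId;
    }
}
```

In this sketch a document re-added after a delete survives the flush, because
its sequence ID is higher than the delete's.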
I'm planning to commit this to the realtime branch soon, even though it's
obviously not done yet. It's a big patch, and changes will be easier to track
with an svn history.
> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
> Key: LUCENE-2324
> URL: https://issues.apache.org/jira/browse/LUCENE-2324
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 4.0
>
> Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.