Hey folks,

LUCENE-3023 aims to land the considerably large
DocumentsWriterPerThread (DWPT) refactoring on trunk.
During the last weeks we have put lots of efforts into cleaning the
code up, fixing javadocs and run test locally
as well as on Jenkins. We reached the point where we are able to
create a final patch for review and land this
exciting refactoring on trunk very soon. I committed the CHANGES.TXT
entry (also appended below) a couple of minutes ago so from now on
we freeze the branch for final review (Robert can you create a new
"final" patch and upload to LUCENE-3023).
Any comments should go to [1] or as a reply to this email. If there is
no blocker coming up we plan to reintegrate the
branch and commit it to trunk early next week. For those who want some
background what DWPT does read: [2]

Note: this change will not change the index file format so there is no
need to reindex for trunk users. Yet, I will send a heads up next week
with an
overview of that has changed.

Simon

[1] https://issues.apache.org/jira/browse/LUCENE-3023
[2] http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/


* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
  DocumentsWriterPerThread:

  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
    Each DocumentsWriterPerThread indexes documents in its own private segment,
    and the in memory segments are no longer merged on flush.  Instead, each
    segment is separately flushed to disk and subsequently merged with normal
    segment merging.

  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
    indexing may continue concurrently with flushing.  The selected
    DWPT flushes all its RAM resident documents do disk.  Note: Segment flushes
    don't flush all RAM resident documents but only the documents private to
    the DWPT selected for flushing.

  - Flushing is now controlled by FlushPolicy that is called for every add,
    update or delete on IndexWriter. By default DWPTs are flushed either on
    maxBufferedDocs per DWPT or the global active used memory. Once the active
    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
    flushing and the memory used by this DWPT is substracted from the active
    memory and added to a flushing memory pool, which can lead to temporarily
    higher memory usage due to ongoing indexing.

  - IndexWriter now can utilize ramBufferSize > 2048 MB. Each DWPT can address
    up to 2048 MB memory such that the ramBufferSize is now bounded by the max
    number of DWPT avaliable in the used DocumentsWriterPerThreadPool.
    IndexWriters net memory consumption can grow far beyond the 2048 MB limit if
    the applicatoin can use all available DWPTs. To prevent a DWPT from
    exhausting its address space IndexWriter will forcefully flush a DWPT if its
    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
    via IndexWriterConfig and defaults to 1945 MB.
    Since IndexWriter flushes DWPT concurrently not all memory is released
    immediately. Applications should still use a ramBufferSize significantly
    lower than the JVMs avaliable heap memory since under high load multiple
    flushing DWPT can consume substantial transient memory when IO performance
    is slow relative to indexing rate.

  - IndexWriter#commit now doesn't block concurrent indexing while flushing all
    'currently' RAM resident documents to disk. Yet, flushes that occur while a
    a full flush is running are queued and will happen after all DWPT involved
    in the full flush are done flushing. Applications using multiple threads
    during indexing and trigger a full flush (eg call commmit() or open a new
    NRT reader) can use significantly more transient memory.

  - IndexWriter#addDocument and IndexWriter.updateDocument can block indexing
    threads if the number of active + number of flushing DWPT exceed a
    safety limit. By default this happens if 2 * max number available thread
    states (DWPTPool) is exceeded. This safety limit prevents applications from
    exhausting their available memory if flushing can't keep up with
    concurrently indexing threads.

  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
    limit is reached during indexing. No segment flushes will be triggered
    due to this setting.

  - IndexWriter#flush(boolean, boolean) doesn't synchronized on IndexWriter
    anymore. A dedicated flushLock has been introduced to prevent multiple full-
    flushes happening concurrently.

  - DocumentsWriter doesn't write shared doc stores anymore.

  (Mike McCandless, Michael Busch, Simon Willnauer)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to