[
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Willnauer updated LUCENE-2573:
------------------------------------
Attachment: LUCENE-2573.patch
Next iteration, containing a large number of refactorings.
* I moved all responsibilities related to flushing, including synchronization,
into DocsWriterSession and renamed it to DocumentsWriterFlushControl (see the
sketch after this list).
* DWFC now only tracks active and flush bytes; the relic from my initial
patch that tracked pending memory is not needed anymore.
* DWFC took over all synchronization, so there is no synchronized
(flushControl) {...} block left in DocumentsWriter. Seems way cleaner too.
* Healthiness now blocks once we reach 2x maxMemory. SingleTierFlushPolicy
uses 0.9x maxRam as its low water mark and 2x the low water mark as its high
water mark to flush all threads. The multi-tier policy is still unchanged and
flushes in linear steps from 0.9x to 1.10x maxRam (see the second sketch
below). We should actually test whether this does better or worse than the
single tier FP.
* FlushPolicy now has only a visit method and uses IW.message to write to the
info stream.
* ThreadState now holds a boolean flag that indicates whether a flush is
pending; the flag is synced and written by DWFC. States[] is gone in DWFC.
* FlushSpecification is gone and DWFC returns the DWPT upon checkoutForFlush.
Yet I still track the memory for the flushing DWPT separately, since
DWPT#bytesUsed() changes during the flush and I don't want to rely on it
staying constant. As a nice side effect I can check that a DWPT passed to
doAfterFlush was actually checked out, and assert on that.
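To make the new accounting concrete, here is a rough sketch of the resulting
class. This is my own illustration of the description above, not code from the
patch; only the names (DocumentsWriterFlushControl, ThreadState,
checkoutForFlush, doAfterFlush) follow the text, everything else is assumed:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch only: all flush-related synchronization lives in this class, so
// DocumentsWriter needs no synchronized(flushControl) blocks of its own.
class DocumentsWriterFlushControl {
  private final long maxBytes;    // configured RAM budget
  private final long stallLimit;  // 2x maxBytes: "healthiness" blocks here
  private long activeBytes;       // bytes held by active DWPTs
  private long flushBytes;        // bytes held by DWPTs checked out for flush
  // Separately tracked sizes of checked-out DWPTs, since DWPT#bytesUsed()
  // changes during the flush itself.
  private final Map<DocumentsWriterPerThread, Long> flushing = new HashMap<>();

  DocumentsWriterFlushControl(long maxBytes) {
    this.maxBytes = maxBytes;
    this.stallLimit = 2 * maxBytes;
  }

  // Called after a thread indexed a document; stalls ("healthiness") once
  // the total tracked memory reaches 2x the budget.
  synchronized void updateAfterDocument(ThreadState state, long deltaBytes)
      throws InterruptedException {
    activeBytes += deltaBytes;
    state.bytesUsed += deltaBytes;
    while (activeBytes + flushBytes >= stallLimit) {
      wait();
    }
  }

  // The flush-pending flag on ThreadState is only written under this lock.
  synchronized void setFlushPending(ThreadState state) {
    state.flushPending = true;
  }

  // Checks a DWPT out for flushing: its bytes move from the active to the
  // flush accounting and its size is remembered separately.
  synchronized DocumentsWriterPerThread checkoutForFlush(ThreadState state) {
    assert state.flushPending;
    activeBytes -= state.bytesUsed;
    flushBytes += state.bytesUsed;
    flushing.put(state.dwpt, state.bytesUsed);
    state.bytesUsed = 0; // a fresh DWPT replaces the flushed one
    state.flushPending = false;
    return state.dwpt;
  }

  // Releases the separately tracked bytes; asserts the DWPT was actually
  // checked out before.
  synchronized void doAfterFlush(DocumentsWriterPerThread dwpt) {
    Long bytes = flushing.remove(dwpt);
    assert bytes != null : "DWPT was never checked out for flush";
    flushBytes -= bytes;
    notifyAll(); // wake stalled indexing threads
  }

  static class ThreadState {
    boolean flushPending;
    long bytesUsed;
    DocumentsWriterPerThread dwpt;
  }

  static class DocumentsWriterPerThread {}
}
{code}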
Next steps here are benchmarking and finding good defaults for the flush
policies. I think we are close though.
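For the defaults discussion, this is the watermark arithmetic the two policies
imply as I described them above; again just an illustration, not patch code:

{code:java}
import java.util.Arrays;

// Sketch of the two watermark schemes (illustration only).
class FlushPolicyTiers {

  // Single tier: low water mark at 0.9 x maxRam, high water mark at 2x the
  // low water mark; at the high water mark all threads are flushed.
  static double[] singleTier(double maxRamMB) {
    double low = 0.9 * maxRamMB;
    return new double[] { low, 2 * low };
  }

  // Multi tier: linear steps between the low and high water marks, one
  // threshold per DWPT.
  static double[] linearSteps(double lowMB, double highMB, int numDWPTs) {
    if (numDWPTs == 1) {
      return new double[] { highMB };
    }
    double[] tiers = new double[numDWPTs];
    for (int i = 0; i < numDWPTs; i++) {
      tiers[i] = lowMB + i * (highMB - lowMB) / (numDWPTs - 1);
    }
    return tiers;
  }

  public static void main(String[] args) {
    // The example from the issue description: maxRam = 100MB, 5 DWPTs,
    // water marks at 0.9x and 1.10x of maxRam.
    System.out.println(Arrays.toString(linearSteps(90, 110, 5)));
    // prints [90.0, 95.0, 100.0, 105.0, 110.0]
  }
}
{code}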
> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps between the low and high water marks: e.g. when 5 DWPTs
> are in use, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly as absolute values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we, for simplicity, keep the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?