[
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008776#comment-13008776
]
Michael McCandless commented on LUCENE-2573:
--------------------------------------------
* I think once we sync up to trunk again, the FP should hold the
IW's config instance, and pull settings "live" from it? Ie this
way we keep our live changes to flush-by-RAM. Same for Healthiness
(it won't see updates to the RAM buffer now).
* Should we rename *ByRAMFP --> *ByRAMOrDocCountFP? Since it "ors"
docCount and RAM usage trigger right? Oh, I see, not quite -- it
requires RAM buffer be set. I think we should relax that? Ie a
single flush policy (the default) flushes by either/or?
* Shouldn't these flush policies also trigger by
maxBufferedDelCount?
* Maybe FP.init should throw IllegalStateExc not IllegalArgExc?
(Because, no arg is allowed once the "state" of FP has already
been init'ed).
* Probably FP.writer should be a SetOnce?
* Hmm we still have a FlushPolicy.message? Can't we just make IW
protected and then FlushPolicy impl can call IW.message? (And
also remove FP.setInfoStream).
* Is IW.FlushControl not really used anymore? We should remove it?
* I still think LW should be 1.0 of your RAM buffer. Ie, IW will
start flushing once that much RAM is in use.
* I still see "synchronized (docWriter.flushControl) {" in
IndexWriter
* We should jdoc that IWC.setFlushPolicy takes effect only on init
of IW?
* Add "for testing only" comment to IW.getDocsWriter?
* I wonder whether we should convey "what changed" to the FP? EG,
we can 1) buffer a new del term, 2) add a new doc, 3) both
(updateDocument). It could be we have onUpdate, onAdd, onDelete?
Or maybe we keep single method but rename to onChange? Ie, it's
called because *something* about the incoming DWPT has changed.
* The flush policy shouldn't have to compute "delta" RAM like it
does now? Actually why can't it just call
flushControl.activeBytes(), and we ensure the delta was already
folded into that? Ie we'd call commitPerThreadBytes before
FP.visit. (Then commitPerThreadBytes wouldn't ever add to
flushBytes, which is sort of spooky -- like flushBytes should get
incr'd only when we pull a DWPT out for flushing).
* I don't think we should ever markAllWritersPending, ie, that's
not the right "reaction" when flushing is too slow (eg you're on a
slow hard drive) since over time this will result in flushing lots
of tiny segments unnecessarily. A better reaction is to stall the
incoming threads; this way the flusher threads catch up, and once
you resume, then the small DWPTs have a chance to get big before
they are flushed.
* Misspelled: markLargesWriterPending -> markLargestWriterPending
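
To make a few of the suggestions above concrete (init throwing
IllegalStateException, a write-once writer reference, and separate
onAdd/onDelete/onUpdate hooks), here is a rough sketch. Everything in it is
hypothetical: SketchFlushPolicy, its generic writer type W, and the
activeBytes parameter are illustration only and are not the classes in the
patch. An AtomicReference stands in for SetOnce.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical stand-in for a flush policy. Names and signatures are
 * illustrative assumptions, not the API in the LUCENE-2573 patch.
 */
abstract class SketchFlushPolicy<W> {
    // Write-once holder for the owning writer (plays the role of a SetOnce).
    private final AtomicReference<W> writer = new AtomicReference<>();

    /** Binds the policy to its writer; a second call is a state error. */
    final void init(W w) {
        if (w == null) {
            throw new IllegalArgumentException("writer must not be null");
        }
        if (!writer.compareAndSet(null, w)) {
            // The argument may be valid; the policy's state is what forbids it.
            throw new IllegalStateException("flush policy already initialized");
        }
    }

    /** Returns the bound writer, or fails if init was never called. */
    final W writer() {
        W w = writer.get();
        if (w == null) {
            throw new IllegalStateException("flush policy not initialized");
        }
        return w;
    }

    /** Called after a document was added. */
    abstract void onAdd(long activeBytes);

    /** Called after a delete term was buffered. */
    abstract void onDelete(long activeBytes);

    /** An update is a delete plus an add, so by default it triggers both. */
    void onUpdate(long activeBytes) {
        onAdd(activeBytes);
        onDelete(activeBytes);
    }
}
```

A single onChange method would collapse the three hooks into one; the split
shown here just makes it visible which kind of change the policy is reacting
to.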
> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly using total values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?
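
For reference, the linear steps in the description can be worked out as
below. This is only a sketch of the arithmetic: TieredWaterMarks and
thresholdsMB are made-up names, and the 90%/110% ratios are the example
values from the description, not settled defaults.

```java
/**
 * Hypothetical helper, not part of Lucene: computes the RAM level (in MB)
 * at which each successive DWPT would be flushed under the tiered scheme.
 */
final class TieredWaterMarks {
    private TieredWaterMarks() {}

    /**
     * Returns numWriters thresholds spaced linearly from
     * ramBufferMB * lowRatio up to ramBufferMB * highRatio.
     */
    static double[] thresholdsMB(double ramBufferMB, double lowRatio,
                                 double highRatio, int numWriters) {
        if (numWriters < 1) {
            throw new IllegalArgumentException("numWriters must be >= 1");
        }
        if (lowRatio > highRatio) {
            throw new IllegalArgumentException("lowRatio must be <= highRatio");
        }
        double[] thresholds = new double[numWriters];
        if (numWriters == 1) {
            // With a single writer there is only one step: the low water mark.
            thresholds[0] = ramBufferMB * lowRatio;
            return thresholds;
        }
        double step = (highRatio - lowRatio) / (numWriters - 1);
        for (int i = 0; i < numWriters; i++) {
            thresholds[i] = ramBufferMB * (lowRatio + i * step);
        }
        return thresholds;
    }
}
```

Calling thresholdsMB(100.0, 0.9, 1.1, 5) gives values at roughly 90, 95,
100, 105 and 110 MB (up to floating-point rounding), matching the 5-DWPT
example in the description.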