[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003909#comment-13003909
 ] 

Michael McCandless commented on LUCENE-2573:
--------------------------------------------

Not done reviewing the patch but here's some initial feedback:


Very cool (and super advanced) that this adds a FlushPolicy!  But for
"normal" usage we go and make either DocCountFP or TieredFP, depending
on whether IWC is flushing by docCount, RAM, or both, right?  I.e., one
normally need not make their own FlushPolicy.

Maybe rename TieredFP -> ByRAMFP?  Also, I'm not sure we need the N
tiers?  I suspect that may flush too heavily?  Can we instead simplify
it and have only the low and high water marks?  So we flush when
active RAM is over low water mark?  (And we stall if active + flushing
RAM exceeds high water mark).
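
For illustration, the simplified two-mark scheme could be sketched as
below.  This is only a sketch under assumptions: the class and method
names (WaterMarkFlushSketch, shouldFlush, shouldStall) are hypothetical
and are not taken from the patch.

```java
/** Hypothetical sketch of a two-water-mark policy; not the API from the patch. */
final class WaterMarkFlushSketch {
    private final long lowWaterMarkBytes;
    private final long highWaterMarkBytes;

    WaterMarkFlushSketch(long lowWaterMarkBytes, long highWaterMarkBytes) {
        if (lowWaterMarkBytes <= 0 || highWaterMarkBytes < lowWaterMarkBytes) {
            throw new IllegalArgumentException("need 0 < low <= high");
        }
        this.lowWaterMarkBytes = lowWaterMarkBytes;
        this.highWaterMarkBytes = highWaterMarkBytes;
    }

    /** Flush once RAM held by active (not yet flushing) DWPTs is over the low water mark. */
    boolean shouldFlush(long activeBytes) {
        return activeBytes > lowWaterMarkBytes;
    }

    /** Stall indexing threads once active plus flushing RAM exceeds the high water mark. */
    boolean shouldStall(long activeBytes, long flushingBytes) {
        return activeBytes + flushingBytes > highWaterMarkBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example values: 90 MB low mark, 110 MB high mark.
        WaterMarkFlushSketch policy = new WaterMarkFlushSketch(90 * mb, 110 * mb);
        System.out.println(policy.shouldFlush(95 * mb));          // true: 95 MB > 90 MB
        System.out.println(policy.shouldStall(95 * mb, 20 * mb)); // true: 115 MB > 110 MB
    }
}
```

Keeping the two checks separate mirrors the distinction above: the
low mark only looks at active RAM, while the stall check also counts
RAM that is already being flushed.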

Can we rename isHealthy to isStalled (ie, invert it)?

I'm still unsure we should even include any healthy check APIs.  This
is an exceptional situation and I don't think we need API exposure for
it?  If apps really want to, they can turn on infoStream (we should
make sure "stalling" is logged, just like it is for merging) and
debug from there?

Maybe rename pendingBytes to flushingBytes?  Or maybe
flushPendingBytes?  (Just to make it clear what we are pending on...).

Maybe rename FP.printInfo(String msg) --> FP.message?  (Consistent w/
our other classes).

I wonder if FP.findFlushes should be renamed to something like
FP.visit, and return void?  I.e., it's called for its side effect of
marking DWPTs for flushing, right?  Separately, whether or not this
thread will go and flush a DWPT is for IW to decide?  (E.g., this
thread may not have marked any new flush as required, but it should
still go off and pull a DWPT previously marked by another thread.)  So
then IW would have a private volatile boolean recording whether any
active DWPTs have flushPending.
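
A rough sketch of that split of responsibilities follows.  Everything
here is assumed for illustration: Dwpt, Writer, Policy and their
methods are hypothetical stand-ins, not the real DocumentsWriter or
IndexWriter classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the policy only marks DWPTs; the writer decides who flushes. */
final class FlushVisitSketch {
    /** Stand-in for a DocumentsWriterPerThread; tracks only what the sketch needs. */
    static final class Dwpt {
        final long bytesUsed;
        volatile boolean flushPending;
        Dwpt(long bytesUsed) { this.bytesUsed = bytesUsed; }
    }

    /** Stand-in for the flush policy: visit returns void and works by side effect. */
    interface Policy {
        void visit(Writer writer, Dwpt current);
    }

    /** Stand-in for IW's bookkeeping. */
    static final class Writer {
        private final List<Dwpt> active = new ArrayList<>();
        // Records whether any active DWPT has flushPending set.
        private volatile boolean anyFlushPending;

        synchronized void add(Dwpt dwpt) { active.add(dwpt); }

        synchronized void markFlushPending(Dwpt dwpt) {
            dwpt.flushPending = true;
            anyFlushPending = true;
        }

        /** Returns a DWPT marked by any thread, or null if none is pending. */
        Dwpt nextPendingFlush() {
            if (!anyFlushPending) return null; // cheap volatile read on the common path
            synchronized (this) {
                Dwpt found = null;
                boolean more = false;
                for (Dwpt d : active) {
                    if (!d.flushPending) continue;
                    if (found == null) found = d; else more = true;
                }
                if (found != null) active.remove(found);
                anyFlushPending = more;
                return found;
            }
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        Dwpt small = new Dwpt(10), big = new Dwpt(200);
        writer.add(small);
        writer.add(big);
        // Assumed toy policy: mark a DWPT once it holds more than 100 bytes.
        Policy policy = (w, d) -> { if (d.bytesUsed > 100) w.markFlushPending(d); };
        policy.visit(writer, big);   // one thread marks "big"
        policy.visit(writer, small); // another thread marks nothing itself...
        System.out.println(writer.nextPendingFlush() == big); // ...but still pulls "big": true
    }
}
```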


> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
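
The linear steps in the quoted description can be worked through with
a small sketch.  The class and method names are hypothetical, and the
single-DWPT case (flush at the low mark) is an assumption that the
description does not address.

```java
import java.util.Arrays;

/** Hypothetical sketch of the linear tiers between the low and high water marks. */
final class TieredWaterMarksSketch {
    /** Returns one flush threshold per DWPT, as a percentage of allowed RAM. */
    static double[] flushThresholds(double lowPercent, double highPercent, int numDwpts) {
        if (numDwpts < 1 || highPercent < lowPercent) {
            throw new IllegalArgumentException("need numDwpts >= 1 and low <= high");
        }
        double[] thresholds = new double[numDwpts];
        if (numDwpts == 1) {
            thresholds[0] = lowPercent; // assumption: a lone DWPT flushes at the low mark
            return thresholds;
        }
        // Equal steps so the first tier is the low mark and the last is the high mark.
        double step = (highPercent - lowPercent) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            thresholds[i] = lowPercent + i * step;
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // The example from the description: 5 DWPTs between 90% and 110%.
        System.out.println(Arrays.toString(flushThresholds(90, 110, 5)));
        // prints [90.0, 95.0, 100.0, 105.0, 110.0]
    }
}
```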
