[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012069#comment-13012069 ]

Simon Willnauer commented on LUCENE-2573:
-----------------------------------------

bq. How come we lost 'assert !bufferedDeletesStream.any();' in IndexWriter.java?
So this is tricky. Since we are flushing concurrently, this assertion could fail spuriously. The same assertion is inside bufferedDeletesStream.prune(segmentInfos), which is synced. But another thread could sneak in between the prune and the any() check and update / delete a document, making the assertion fail even though nothing is wrong. Or am I missing something here?
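
To make the race concrete, here is a rough sketch of what I mean (hypothetical code, not the actual IndexWriter source):

{code:java}
// Hypothetical sketch, not the real IndexWriter code.
// prune() is synced and can safely assert !any() internally, but a
// standalone check afterwards is racy with concurrent flushes:
synchronized (bufferedDeletesStream) {
  bufferedDeletesStream.prune(segmentInfos); // asserts !any() while holding the lock
}
// <-- another thread can buffer a new delete / update right here
assert !bufferedDeletesStream.any(); // can now fail although nothing is wrong
{code}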

bq. Maybe, for stalling, instead of triggering by max RAM, we can take
this simple approach: if the number of flushing DWPTs ever exceeds one
plus number of active DWPTs, then we stall (and resume once it's below
again).

Awesome idea, Mike! I will do that!
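
Roughly what I have in mind, as a sketch (placeholder names and counters, not the actual DocumentsWriter code):

{code:java}
// Hypothetical sketch of the proposed stall heuristic, not actual code.
// Stall indexing threads once the number of flushing DWPTs exceeds the
// number of active DWPTs by more than one; resume once it drops below again.
private final Object stallLock = new Object();
private int numFlushing; // flushing DWPTs, updated under stallLock
private int numActive;   // active DWPTs, updated under stallLock

void maybeStall() throws InterruptedException {
  synchronized (stallLock) {
    while (numFlushing > numActive + 1) {
      stallLock.wait(); // indexing thread stalls here
    }
  }
}

void flushFinished() {
  synchronized (stallLock) {
    numFlushing--;
    stallLock.notifyAll(); // wake up any stalled indexing threads
  }
}
{code}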

{quote}
We should fix DefaultFlushPolicy to first pull the relevant config
from IWC (eg maxBufferedDocs), then check if that config is -1 or
not, etc., because IWC's config can be changed at any time (live)
so we may read eg 10000 at first and then -1 the second time.
{quote}

Do you mean that we should first check whether we flush by RAM, doc count, etc. at all, and only if so read the live values for RAM, doc count, etc.?
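
Something like this, as a rough sketch (DWPTState, markForFlush, etc. are made-up names; only the IndexWriterConfig getters and DISABLE_AUTO_FLUSH are real API):

{code:java}
// Hypothetical sketch of what I understand you mean, not the actual FlushPolicy code.
void onChange(DWPTState state) {
  int maxBufferedDocs = iwc.getMaxBufferedDocs(); // live value, read each time
  if (maxBufferedDocs != IndexWriterConfig.DISABLE_AUTO_FLUSH
      && state.numDocsInRAM >= maxBufferedDocs) {
    markForFlush(state);
  }

  double ramBufferMB = iwc.getRAMBufferSizeMB(); // live value, may have changed
  if (ramBufferMB != IndexWriterConfig.DISABLE_AUTO_FLUSH
      && totalBytesUsed() >= ramBufferMB * 1024 * 1024) {
    markLargestForFlush();
  }
}
{code}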

{quote}
Hmm.... deletes are actually tricky, because somehow the FlushPolicy
needs access to the "global" deletes count (and also to the per-DWPT
deletes count). If a given DWPT has 0 buffered docs, then indeed the
buffered deletes in its pool don't matter. But we do need to respect
the buffered deletes in the global pool...
{quote}
I think it does not make sense to check both the global count and the per-DWPT 
count against the same value. If a single DWPT exceeds the limit, we also exceed 
it globally, or could it actually happen that a DWPT has more deletes than the 
global pool? Further, if we observe the global pool and exceed the limit, do we 
then flush all DWPTs, as the IWC documentation says?
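
Just to make sure we are talking about the same thing, here is how I currently picture the delete check (the pool and helper names are made up; only the IndexWriterConfig getter and constant are real API):

{code:java}
// Hypothetical sketch, not actual code.
int maxBufferedDeleteTerms = iwc.getMaxBufferedDeleteTerms(); // live value
if (maxBufferedDeleteTerms != IndexWriterConfig.DISABLE_AUTO_FLUSH
    && globalDeletePool.numBufferedTerms() >= maxBufferedDeleteTerms) {
  // per the IWC javadocs this should apply/flush the buffered deletes globally,
  // not only for the DWPT that happened to cross the limit
  flushAllDeletes();
}
{code}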

Once we sort this out I will upload a new patch with javadocs etc. for the flush 
policy. We seem to be close here, man!


> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
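
A small made-up sketch of the linear steps described in the issue above (purely illustrative, not part of the issue):

{code:java}
// Hypothetical illustration of the tiered water marks, not actual code.
// With a low water mark at 90% and a high water mark at 110% of the RAM buffer,
// the i-th of n DWPTs (i = 0..n-1) is flushed once total RAM crosses:
double flushTriggerMB(double ramBufferMB, int n, int i) {
  double low = 0.90 * ramBufferMB;
  double high = 1.10 * ramBufferMB;
  return n == 1 ? high : low + i * (high - low) / (n - 1);
}
// e.g. ramBufferMB = 100, n = 5  ->  90, 95, 100, 105, 110 MB
{code}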
