[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906798#action_12906798
 ] 

Jason Rutherglen commented on LUCENE-2573:
------------------------------------------

bq. shouldn't tiered flushing take care of this

That was a few minutes of faulty thinking on my part.

{quote}but this won't be most efficient, in general? Ie we could end up 
creating tiny segments depending on luck-of-the-thread-scheduling?{quote}

True.  Instead, we may want to simply skip flushing the current DWPT if it is 
not in fact the highest RAM user.  When addDoc is next called on the thread 
with the highest RAM usage, we can flush it then.
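
To make the idea concrete, here is a rough sketch of the check, not code from the patch.  Dwpt, FlushByLargest and shouldFlush are hypothetical names for illustration, and the sketch ignores synchronization between indexing threads.

```java
import java.util.List;

// Hypothetical stand-in for a DocumentsWriterPerThread; it only tracks RAM.
final class Dwpt {
    final String name;
    long bytesUsed;

    Dwpt(String name, long bytesUsed) {
        this.name = name;
        this.bytesUsed = bytesUsed;
    }
}

final class FlushByLargest {
    // Returns true only if total RAM is at or over the limit AND 'current'
    // is the largest RAM consumer (a tie counts as largest). Any other DWPT
    // keeps indexing and leaves the flush to the biggest one.
    static boolean shouldFlush(Dwpt current, List<Dwpt> all, long ramLimitBytes) {
        long total = 0;
        long max = 0;
        for (Dwpt d : all) {
            total += d.bytesUsed;
            max = Math.max(max, d.bytesUsed);
        }
        return total >= ramLimitBytes && current.bytesUsed >= max;
    }
}
```

The intent is that a small DWPT which happens to cross the limit does not produce a tiny segment; the flush waits until the largest DWPT's thread next adds a document.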

bq. there's no longer a need to track per-doc pending RAM

I'll remove it from the code.

{quote}If a buffer is not in the pool (ie not free), then it's in use and we 
count that as RAM used{quote}

Ok, I'll make the change.  

{quote}we have to track net allocated, in order to trim the buffers (drop them, 
so GC can reclaim) when we are over the .setRAMBufferSizeMB{quote}

I haven't seen this in the realtime branch.  Reclamation of extra allocated 
free blocks may need to be reimplemented.  

I'll increment the number of bytes used whenever a block is handed out for use.
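
A minimal sketch of the accounting discussed above, assuming a simple pool of fixed-size byte blocks.  CountingBlockPool and its methods are hypothetical names, not the allocator in the realtime branch.

```java
import java.util.ArrayDeque;

// Hypothetical byte-block pool that tracks both in-use and net allocated RAM.
final class CountingBlockPool {
    private final int blockSize;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private long bytesUsed;       // blocks handed out and not yet recycled
    private long bytesAllocated;  // in-use plus free blocks (net allocated)

    CountingBlockPool(int blockSize) {
        this.blockSize = blockSize;
    }

    // Hand out a block; a block that is not in the free pool counts as used.
    synchronized byte[] getBlock() {
        byte[] b = free.poll();
        if (b == null) {
            b = new byte[blockSize];
            bytesAllocated += blockSize;
        }
        bytesUsed += blockSize;
        return b;
    }

    // Return a block to the free pool; it no longer counts as used.
    synchronized void recycle(byte[] b) {
        bytesUsed -= blockSize;
        free.push(b);
    }

    // Drop free blocks (so GC can reclaim them) until net allocation is at
    // or under maxBytes, or no free blocks remain. Returns the number dropped.
    synchronized int trimTo(long maxBytes) {
        int dropped = 0;
        while (bytesAllocated > maxBytes && !free.isEmpty()) {
            free.poll();
            bytesAllocated -= blockSize;
            dropped++;
        }
        return dropped;
    }

    synchronized long bytesUsed() { return bytesUsed; }

    synchronized long bytesAllocated() { return bytesAllocated; }
}
```

Here trimTo would be called with the configured RAM buffer size converted to bytes.  Blocks that are in use are never dropped, so allocation can stay above the limit until a flush recycles them.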

On this topic, do you have any thoughts yet about how to make the block pools 
concurrent?  I'm still leaning towards a random access file (seek style) 
interface because it is easy to make concurrent and hides the underlying 
block management mechanism, rather than exposing it directly as we do today, 
which could invite problematic usage in the future.
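
To illustrate what I mean by a seek style interface, here is a rough sketch.  SeekableBlockStore is a hypothetical name, and the coarse synchronized methods are only a placeholder for whatever concurrency scheme we settle on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seek-style facade: callers address bytes by absolute position
// and never see the underlying blocks.
final class SeekableBlockStore {
    private final int blockSize;
    private final List<byte[]> blocks = new ArrayList<>();

    SeekableBlockStore(int blockSize) {
        this.blockSize = blockSize;
    }

    // Write len bytes from src starting at absolute position pos,
    // allocating blocks on demand.
    synchronized void write(long pos, byte[] src, int off, int len) {
        for (int i = 0; i < len; i++) {
            long p = pos + i;
            int blockIndex = (int) (p / blockSize);
            while (blocks.size() <= blockIndex) {
                blocks.add(new byte[blockSize]);
            }
            blocks.get(blockIndex)[(int) (p % blockSize)] = src[off + i];
        }
    }

    // Read up to len bytes starting at pos; returns how many bytes were
    // copied, which is fewer than len past the end of the allocated blocks.
    synchronized int read(long pos, byte[] dst, int off, int len) {
        long limit = (long) blocks.size() * blockSize;
        int n = 0;
        while (n < len && pos + n < limit) {
            long p = pos + n;
            dst[off + n] = blocks.get((int) (p / blockSize))[(int) (p % blockSize)];
            n++;
        }
        return n;
    }
}
```

Because callers only ever pass positions, the block layout and the locking strategy can change behind the interface without touching them.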

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
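
The linear steps in the description above can be sketched as follows.  FlushTiers and thresholdsMB are hypothetical names, and using the low water mark when there is only one DWPT is an assumption, since the description does not cover that case.

```java
final class FlushTiers {
    // Returns the total-RAM thresholds (in MB) at which the 1st..nth DWPT
    // would be flushed, spaced linearly between the low and high water marks.
    static double[] thresholdsMB(double ramBufferMB, int numDwpts,
                                 double lowFraction, double highFraction) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        double[] result = new double[numDwpts];
        if (numDwpts == 1) {
            // Assumption: a single DWPT flushes at the low water mark.
            result[0] = ramBufferMB * lowFraction;
            return result;
        }
        double step = (highFraction - lowFraction) / (numDwpts - 1);
        for (int i = 0; i < numDwpts; i++) {
            result[i] = ramBufferMB * (lowFraction + i * step);
        }
        return result;
    }
}
```

With a 100MB buffer, 5 DWPTs and water marks of 0.90 and 1.10, this gives thresholds of approximately 90, 95, 100, 105 and 110 MB, up to floating-point rounding.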

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


