[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006871#comment-13006871
 ] 

Simon Willnauer commented on LUCENE-2573:
-----------------------------------------

bq. I still see a healtiness (mis-spelled) in DW.
Ugh, I will fix that.
{quote}
I'd rather not have the stalling/healthiness be baked into the API, at
all. Can we put the hijack logic entirely private in the flush-by-ram
policies? (Ie remove isStalled()/hijackThreadsForFlush()).
{quote}

I agree on the hijack part, but isStalled() is something I might want to 
control. We can still open it up eventually, so let's make it private 
for now but keep a note on it. 

{quote}
Can we move FlushSpecification out of FlushPolicy? Ie, it's a private
impl detail of DW right? (Not part of FlushPolicy's API). Actually
why do we need it? Can't we just return the DWPT?
{quote}

It currently holds the RAM usage of that DWPT at the time it was checked out, so that 
I can reduce flushBytes accordingly. We can maybe get rid of it entirely, 
but I don't want to rely on the DWPT's bytesUsed() for that.
We can certainly move it out; this inner class is a relic, though.
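
A minimal sketch of why the snapshot matters, using hypothetical names (FlushAccounting, Writer, and FlushTicket are illustrative stand-ins, not the classes in the patch). The ticket remembers the bytes held at checkout, so the global counter is reduced by exactly that amount even if the writer's own counter has changed in the meantime:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical RAM accounting; not the actual DocumentsWriter code. */
final class FlushAccounting {

    /** Stand-in for a DWPT; only its RAM counter matters here. */
    static final class Writer {
        final AtomicLong bytesUsed = new AtomicLong();
    }

    /** Pairs a writer with the RAM it held when it was checked out. */
    static final class FlushTicket {
        final Writer writer;
        final long bytesAtCheckout;

        FlushTicket(Writer writer, long bytesAtCheckout) {
            this.writer = writer;
            this.bytesAtCheckout = bytesAtCheckout;
        }
    }

    private long activeBytes; // RAM held by writers that are still indexing
    private long flushBytes;  // RAM held by writers checked out for flush

    synchronized void addBytes(Writer writer, long delta) {
        writer.bytesUsed.addAndGet(delta);
        activeBytes += delta;
    }

    /** Moves the writer's current RAM from the active to the flushing pool. */
    synchronized FlushTicket checkout(Writer writer) {
        long snapshot = writer.bytesUsed.get();
        activeBytes -= snapshot;
        flushBytes += snapshot;
        return new FlushTicket(writer, snapshot);
    }

    /** Uses the snapshot, not the writer's live counter, to settle the books. */
    synchronized void doneFlushing(FlushTicket ticket) {
        flushBytes -= ticket.bytesAtCheckout;
    }

    synchronized long activeBytes() { return activeBytes; }

    synchronized long flushBytes() { return flushBytes; }
}
```

If doneFlushing() read the writer's live counter instead, a writer that had already released its buffers would leave flushBytes permanently too high.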

bq. Why do we have a separate DocWriterSession? Can't this be absorbed
into DocWriter?

I generally don't like cluttering DocWriter and letting it grow like IW. 
DocWriterSession might not be the ideal name for this class, but it's really a 
RAM tracker for this DW. Still, we can move out the parts that do not directly 
relate to memory tracking. Maybe DocWriterBytes?

bq. Be careful defaulting TermsHash.trackAllocations to true – eg term
vectors wants this to be false.

I need to go through the IndexingChain and check carefully where to track 
memory anyway. I haven't gotten to that yet, but it is good that you mention it; 
that one could easily get lost.

bq. Instead of FlushPolicy.message, can't the policy call DW.message?
I don't want to couple that API to DW. What would be the benefit besides 
saving a single method?
{quote}
On the by-RAM flush policies... when you hit the high water mark, we
should 1) flush all DWPTs and 2) stall any other threads.
{quote}
Well, I am not sure we should do that. I don't really see why we should 
forcefully stop the world here. Incoming threads will pick up a flush 
immediately, and if we have enough resources to keep indexing, why should we 
wait until all DWPTs are flushed? If we stall, I fear we could queue up threads 
that could otherwise help with flushing; stalling would simply stop them from 
doing anything, right? You can still control this with the healthiness, though. 
By the way, we currently do flush all DWPTs once we hit the high water mark (HW). 

{quote}
Why do we dereference the DWPTs with their ord? EG, can't we just
store their "state" (active or flushPending) on the DWPT instead of in
a separate states[]?
{quote}
That is definitely an option. I will give that a go.
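
A rough sketch of what that could look like, with hypothetical names (PerWriterFlushState and Writer are illustrative only, not the real DWPT class). Each writer carries its own flushPending flag, so no parallel states[] array indexed by ord is needed:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the actual DocumentsWriterPerThread. */
final class PerWriterFlushState {

    /** Stand-in for a DWPT that owns its own flush state. */
    static final class Writer {
        final int ord;
        private boolean flushPending;

        Writer(int ord) {
            this.ord = ord;
        }

        synchronized void setFlushPending() {
            flushPending = true;
        }

        synchronized boolean isFlushPending() {
            return flushPending;
        }

        synchronized void doneFlushing() {
            flushPending = false;
        }
    }

    /** Collects the writers marked for flush by asking each one directly. */
    static List<Writer> pending(List<Writer> writers) {
        List<Writer> result = new ArrayList<>();
        for (Writer writer : writers) {
            if (writer.isFlushPending()) {
                result.add(writer);
            }
        }
        return result;
    }
}
```

With only the two states, active and flush pending, a single boolean per writer is enough.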
{quote}
Do we really need FlushState.Aborted? And if not... do we really need
FlushState (since it just becomes 2 states, ie, Active or Flushing,
which I think is then redundant w/ flushPending boolean?).
{quote}
This needs some more refactoring; I will attach another iteration.
{quote}
I think the default low water should be 1X of your RAM buffer? And
high water maybe 2X? (For both flush-by-RAM policies).
{quote}
Hmm, I think we need to revise the maxRAMBufferMB Javadoc anyway, so we have all 
the freedom to do whatever we want. Still, I think we should try to keep the RAM 
consumption similar to what a previous release would have used. If we say HW is 
2x, some apps might suddenly run out of memory. I am not sure whether we should 
do that or rather stick to 90% to 110% for now. We need to find good defaults 
for this anyway.


> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
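
The linear steps in the description above can be sketched as follows. This is only an illustration: TieredWaterMarks and thresholds() are hypothetical names, not classes in the patch, and letting a single writer flush at the low water mark is an assumption.

```java
/** Hypothetical helper; computes one flush threshold per DWPT. */
final class TieredWaterMarks {

    /**
     * Returns numWriters thresholds in bytes, spaced linearly from
     * lowFactor * ramBufferBytes up to highFactor * ramBufferBytes.
     */
    static long[] thresholds(long ramBufferBytes, double lowFactor,
                             double highFactor, int numWriters) {
        if (numWriters < 1 || lowFactor > highFactor) {
            throw new IllegalArgumentException("need numWriters >= 1 and lowFactor <= highFactor");
        }
        long[] result = new long[numWriters];
        if (numWriters == 1) {
            // Assumption: a lone writer flushes at the low water mark.
            result[0] = Math.round(lowFactor * ramBufferBytes);
            return result;
        }
        double step = (highFactor - lowFactor) / (numWriters - 1);
        for (int i = 0; i < numWriters; i++) {
            result[i] = Math.round((lowFactor + i * step) * ramBufferBytes);
        }
        return result;
    }
}
```

Called with factors 0.90 and 1.10 and five writers, this yields thresholds at 90%, 95%, 100%, 105% and 110% of the configured RAM buffer, matching the example in the description.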

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
