[ 
https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103723#comment-13103723
 ] 

Michael McCandless commented on LUCENE-3425:
--------------------------------------------

Actually, NRTCachingDir does explicitly control the RAM usage in that
if its cache is using too much RAM then the next createOutput will go
straight to disk.

The one thing it does not do is evict the created files after they
are closed.  So, if you flush a big segment in IW, then NRTCachingDir
will keep those files in RAM even though it's now over budget.  (But
the next segment to flush will go straight to disk).
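
To make that concrete, here is a toy model of the behavior (not the
actual NRTCachingDir code; the class name, method names and the
size-in-bytes bookkeeping are all made up for illustration).  The
budget is only checked when an output is created, and nothing is
evicted afterwards, so one big file can push the cache over budget:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; names and structure are hypothetical, not Lucene's API. */
final class CacheBudgetSketch {
    private final long maxCachedBytes;                        // the RAM budget
    private final Map<String, Long> cached = new HashMap<>(); // file name -> bytes held in RAM
    private long cachedBytes;

    CacheBudgetSketch(long maxCachedBytes) {
        this.maxCachedBytes = maxCachedBytes;
    }

    /**
     * The RAM-vs-disk decision is made once, at creation time, before the
     * final size of the file is known. Returns true if the file is cached.
     */
    boolean createOutput(String name, long finalSizeBytes) {
        boolean toRam = cachedBytes < maxCachedBytes;
        if (toRam) {
            // No eviction after the file is written, so this may overshoot the budget.
            cached.put(name, finalSizeBytes);
            cachedBytes += finalSizeBytes;
        }
        return toRam;
    }

    long cachedBytes() {
        return cachedBytes;
    }

    boolean overBudget() {
        return cachedBytes > maxCachedBytes;
    }
}
```

With a budget of 100 bytes, a first file of 250 bytes is still cached
(and the cache is then over budget), while the next file goes straight
to disk.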

I think this isn't that big a problem in practice; ie, as long as you
set your IW RAM buffer to something not too large, or you ensure you
are opening a new NRT reader often enough that the accumulated docs
won't create a very large segment, then the excess RAM used by
NRTCachingDir will be bounded.

Still it would be nice to fix it so it evicts the files that put it
over, such that it's always below the budget once the output is
closed.  And I agree we should make it possible to have a single pool
for accounting purposes, so you can share this pool across multiple
NRTCachingDirs (and other things that use RAM).
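
A rough sketch of how those two ideas could fit together, assuming a
hypothetical SharedRamPool that several caches report into, and a
hypothetical onClose hook that evicts the file that just put the pool
over its limit (neither exists in Lucene; "eviction" here just records
the file as being on disk):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical shared accounting pool; only counts bytes, holds no data. */
final class SharedRamPool {
    private final long limitBytes;
    private final AtomicLong used = new AtomicLong();

    SharedRamPool(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void add(long bytes) { used.addAndGet(bytes); }
    void release(long bytes) { used.addAndGet(-bytes); }
    boolean overLimit() { return used.get() > limitBytes; }
    long used() { return used.get(); }
}

/** Hypothetical cache that accounts against a pool it may share with others. */
final class PooledCacheSketch {
    private final SharedRamPool pool;
    private final Map<String, Long> cached = new HashMap<>();
    private final Set<String> onDisk = new HashSet<>();

    PooledCacheSketch(SharedRamPool pool) {
        this.pool = pool;
    }

    /** Called when an output is closed and its final size is known. */
    void onClose(String name, long sizeBytes) {
        cached.put(name, sizeBytes);
        pool.add(sizeBytes);
        if (pool.overLimit()) {
            // Evict the file that put the pool over its limit.
            // A real implementation would copy it to the delegate directory here.
            cached.remove(name);
            onDisk.add(name);
            pool.release(sizeBytes);
        }
    }

    boolean isCached(String name) {
        return cached.containsKey(name);
    }
}
```

Two caches sharing a 100 byte pool: the first 60 byte file stays in
RAM, the second 60 byte file (from the other cache) is evicted on
close, and the pool ends up at 60 bytes, below the limit.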


> NRT Caching Dir to allow for exact memory usage, better buffer allocation and 
> "global" cross indices control
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3425
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3425
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shay Banon
>
> A discussion on IRC raised several improvements that can be made to NRT 
> caching dir. Some of the problems it currently has are:
> 1. Not explicitly controlling the memory usage, which can result in overusing 
> memory (for example, large new segments being committed because refreshing is 
> too far behind).
> 2. Heap fragmentation because of constant allocation of (probably promoted to 
> old gen) byte buffers.
> 3. Not being able to control the memory usage across indices for multi index 
> usage within a single JVM.
> A suggested solution (which still needs to be ironed out) is to have a 
> BufferAllocator that controls allocation of byte[], and allow to return 
> unused byte[] to it. It will have a cap on the size of memory it allows to be 
> allocated.
> The NRT caching dir will use the allocator, which can either be provided (for 
> usage across several indices) or created internally. The caching dir will 
> also create a wrapped IndexOutput, that will flush to the main dir if the 
> allocator can no longer provide byte[] (exhausted).
> When a file is "flushed" from the cache to the main directory, it will return 
> all the currently allocated byte[] to the BufferAllocator to be reused by 
> other "files".

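For reference, a minimal sketch of what the BufferAllocator from the
issue description might look like, assuming fixed-size buffers and a
null return to signal "exhausted" (the issue leaves both open; the
class and method names here are placeholders):

```java
import java.util.ArrayDeque;

/** Hypothetical capped allocator of fixed-size byte[] buffers. */
final class BufferAllocatorSketch {
    private final int bufferSize;
    private final int maxBuffers;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>(); // returned buffers, ready for reuse
    private int outstanding;                                    // buffers currently handed out

    BufferAllocatorSketch(int bufferSize, long maxBytes) {
        this.bufferSize = bufferSize;
        this.maxBuffers = (int) (maxBytes / bufferSize);
    }

    /**
     * Returns a buffer, or null when the cap is reached. A caller seeing
     * null would flush its file to the main directory and stop caching it.
     */
    synchronized byte[] allocate() {
        if (outstanding >= maxBuffers) {
            return null;
        }
        outstanding++;
        byte[] reused = free.poll();
        // Only allocate fresh memory when no returned buffer is available,
        // so total allocated memory never exceeds the cap.
        return reused != null ? reused : new byte[bufferSize];
    }

    /** Returns a buffer to the allocator so other files can reuse it. */
    synchronized void release(byte[] buffer) {
        if (buffer.length != bufferSize || outstanding == 0) {
            throw new IllegalArgumentException("buffer was not handed out by this allocator");
        }
        outstanding--;
        free.push(buffer);
    }
}
```

Reusing returned buffers instead of allocating new ones is also what
would address the heap fragmentation point (2) in the description.
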