Hi Denis,

I assume you were using Lucene 3.6.0 before the upgrade, because Lucene 3.6.1 
also tracks buffers using weak references (although, unfortunately, you cannot 
switch it off there).

I can confirm what Mike says: it's all weak references, and the overhead may 
be large, but it gets freed when memory gets low. In most cases it is better 
not to allocate too much heap space for Lucene, as this makes those maps 
larger and stresses the GC. Use only as much heap as you need to avoid OOM, 
and leave all remaining memory to the file system cache (so there is less 
paging). In that case, GC will clean up the concurrent maps faster.
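
To illustrate the tracking idea, here is a simplified sketch using only JDK 
classes. It is not Lucene's actual code: TrackedInput, cloneInput() and 
readByte() are hypothetical names, and a plain WeakHashMap stands in for 
WeakIdentityMap. The master keeps its clones in a weak set, so closing the 
master can invalidate every clone that is still alive, while entries for 
clones that are no longer referenced can be dropped by GC.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical stand-in for a memory-mapped input; NOT Lucene's MMapIndexInput.
class TrackedInput {
    // Clones are only weakly referenced here, so GC may drop entries for
    // clones that nobody uses any more. TrackedInput does not override
    // equals/hashCode, so the keys are effectively compared by identity.
    private final Set<TrackedInput> clones;
    private final boolean isClone;
    private volatile boolean open = true;

    TrackedInput() {
        this.clones = Collections.synchronizedSet(
                Collections.newSetFromMap(new WeakHashMap<TrackedInput, Boolean>()));
        this.isClone = false;
    }

    private TrackedInput(Set<TrackedInput> sharedClones) {
        this.clones = sharedClones;
        this.isClone = true;
    }

    // Every clone (also a clone of a clone) is registered with the master's set.
    TrackedInput cloneInput() {
        ensureOpen();
        TrackedInput c = new TrackedInput(clones);
        clones.add(c);
        return c;
    }

    int readByte() {
        ensureOpen();
        return 42; // placeholder for a read from the mapped buffer
    }

    // Closing the master invalidates all clones that are still reachable.
    // A real implementation would unmap the buffer afterwards.
    void close() {
        open = false;
        if (!isClone) {
            synchronized (clones) {
                for (TrackedInput c : clones) {
                    c.open = false;
                }
            }
            clones.clear();
        }
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("input already closed");
        }
    }
}
```

With this sketch, a read on a clone after the master was closed fails with an 
exception instead of touching memory that is no longer mapped. The price is 
one weak reference per live clone, which is the overhead seen in the profiler.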

In general: if you have a large index that changes seldom, but your query rate 
is very high (like 200 queries per second), switch unmapping off (works since 
Lucene 4.2, see the changelog for LUCENE-4740 - unfortunately the issue itself 
was closed for 4.4, 4.2 would be correct). In that case there is no need to 
take care of unmapping, and as the index reopen rate is low, this does not 
waste resources.

But if your index changes often, there is no way around unmapping - or use 
NIOFSDirectory with NRTCachingDirectory to optimize near-real-time search on 
frequently changing indexes!

Finally: the only way to fix this would be to make all codec structures like 
TermsEnum or DocsEnum, but also Scorer/DocIdSet/..., implement Closeable. When 
you are done with a Scorer you would have to close it, and the underlying 
cloned IndexInput would be closed, too. In that case, the cloned IndexInput 
would be refcounted and unmapped when the last clone is closed. This is a 
larger change and might be an idea for Lucene 5.0 as an "optimization". It 
would be a backwards break because all codecs and all queries would need to 
close correctly, but with our test framework and MockDirectoryWrapper (and 
other MockFooBarWrappers) we could track this so all resources are closed.
We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0 because it 
was never working in 3.x (nobody ever called close() on TermEnum or TermDocs 
instances.... :( ). With our new test framework this could be tracked now... So 
maybe worth a try?
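
A rough sketch of what I mean by refcounting, using only JDK classes. 
Everything here is hypothetical (RefCountedInput, SharedBuffer, cloneInput() 
and isUnmapped() do not exist in Lucene), and it ignores the race between 
cloning and closing that a real implementation would have to handle:

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names exist in Lucene.
class RefCountedInput implements Closeable {

    // State shared by the original input and all of its clones.
    static final class SharedBuffer {
        final AtomicInteger refCount = new AtomicInteger(0);
        volatile boolean unmapped = false;

        void release() {
            unmapped = true; // a real implementation would unmap the buffer here
        }
    }

    private final SharedBuffer shared;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    RefCountedInput() {
        this(new SharedBuffer());
    }

    private RefCountedInput(SharedBuffer shared) {
        this.shared = shared;
        shared.refCount.incrementAndGet(); // each instance holds one reference
    }

    RefCountedInput cloneInput() {
        if (closed.get()) {
            throw new IllegalStateException("already closed");
        }
        return new RefCountedInput(shared);
    }

    boolean isUnmapped() {
        return shared.unmapped;
    }

    // Idempotent: only the first close() of an instance gives up its reference.
    // The buffer is released when the last open instance is closed.
    @Override
    public void close() {
        if (closed.compareAndSet(false, true)
                && shared.refCount.decrementAndGet() == 0) {
            shared.release();
        }
    }
}
```

Because the sketch implements Closeable, callers could use try-with-resources 
around every clone. No weak map is needed then, but every consumer of a clone 
has to call close() reliably, which is exactly the backwards break described 
above.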

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Wednesday, August 07, 2013 3:45 PM
> To: Lucene Users
> Subject: Re: WeakIdentityMap high memory usage
> 
> This map is used to track all cloned open files, which can be a very large
> number over time (each search will create maybe 3 of them).
> 
> This is done as a "best effort" to prevent SEGV (JVM dies) if you accidentally
> try to use an IndexReader after it was closed, while using MMapDirectory.
> 
> However, it's a weak map, which means when HEAP is tight GC should drop
> it.
> 
> So, this should not cause a real problem in "real life", even though it looks
> scary when you look at its RAM usage under a profiler.
> 
> If somehow it's causing "real life" problems, please report back!  But a
> simple workaround is to call MMapDirectory.setUseUnmap(false) to turn off
> this tracking; this means you rely on GC to (eventually) unmap.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov <bazhe...@farpost.com>
> wrote:
> > We have upgraded from Lucene 3.6 to 4.4. In production we faced high
> > minor GC time. A heap dump showed that one of the biggest objects by size
> > is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference. About 11
> > million instances with about 377 megabytes of memory in total (this is not
> > even retained size). Here is a screenshot of the JProfiler output:
> > https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png.
> >
> > The keys of the map are MMapIndexInput. What is this map for, and how
> > can I reduce its memory usage?
> > ---
> > Denis Bazhenov <bazhe...@farpost.com>
> > FarPost.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> 