Moving this to java-dev; I think it belongs here.
I've been looking at this problem some more today and reading up on
ThreadLocals. It's apparently easy to misuse them and end up with
memory leaks, and I think that's exactly what is happening here.
The problem is that ThreadLocals are tied to Threads, and I think the
assumption in TermInfosReader and SegmentReader is that (search)
Threads are short-lived: they come in, scan the index, do the search,
return and die. In that scenario, the values they stored in
ThreadLocals go to heaven with them, and memory is freed up.
But when Threads are long-lived, as they are in thread pools (e.g.
those in servlet containers), those ThreadLocal values stay alive even
after a single search request is done. Moreover, the Thread is reused,
and a new TermInfosReader and SegmentReader put new values into the
Thread's ThreadLocal map on top of the old values (I think) from the
previous search request. Because the Thread still holds references to
those ThreadLocals and the values in them, the values never get GCed.
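Here is a minimal, self-contained sketch (toy code, not Lucene) of the
scenario I mean: a pooled worker thread picks up a ThreadLocal value
during one task and still holds it, via its internal ThreadLocalMap,
when the next task runs on it.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy illustration of the pool scenario described above (not Lucene code).
public class PooledThreadLocalDemo {

    // stand-in for the per-thread cloned value a reader would keep
    private static final ThreadLocal<byte[]> perThread = new ThreadLocal<byte[]>();

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1); // one long-lived worker

        // "request 1": leaves 10 MB parked in the worker thread's ThreadLocalMap
        pool.execute(new Runnable() {
            public void run() {
                perThread.set(new byte[10 * 1024 * 1024]);
            }
        });

        // "request 2", same thread: the old value is still strongly reachable
        pool.execute(new Runnable() {
            public void run() {
                System.out.println("old value still there: " + (perThread.get() != null));
            }
        });

        pool.shutdown();
    }
}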
I tried making the ThreadLocals in TIR and SR static, I tried wrapping
the values saved in the TLs in WeakReferences, and I tried using a
WeakHashMap like in Robert Engel's FixedThreadLocal class from
LUCENE-436, but nothing helped. I also thought about adding a public
static method to TIR and SR that one could call at the end of a search
request (think servlet filter) to clear the TL for the current thread,
but that would require making TIR and SR public, I'm not 100% sure it
would work, and it exposes the implementation details too much.
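For what it's worth, that filter idea would look roughly like the
sketch below. The clearCurrentThread() methods are purely hypothetical
and don't exist in Lucene; the filter is only here to show where such a
hook would be called from.

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Sketch only: assumes hypothetical TermInfosReader.clearCurrentThread() /
// SegmentReader.clearCurrentThread() hooks that would remove the calling
// thread's ThreadLocal entries. No such methods exist in Lucene today.
public class LuceneThreadLocalCleanupFilter implements Filter {

    public void init(FilterConfig config) throws ServletException {
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        try {
            chain.doFilter(req, res);
        } finally {
            // hypothetical hooks, commented out so this compiles:
            // TermInfosReader.clearCurrentThread();
            // SegmentReader.clearCurrentThread();
        }
    }

    public void destroy() {
    }
}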
I don't have a solution yet.
But do we *really* need a ThreadLocal in TIR and SR? The only thing
the TL is doing there is acting as per-thread storage for a cloned
value (in TIR we clone the SegmentTermEnum and in SR we clone the
TermVectorsReader). Why can't we just store those cloned values in
instance variables? Isn't whoever is calling TIR and SR going to be
calling the same instance of TIR and SR anyway, and thus get access to
those cloned values?
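To make the pattern concrete, here is a stripped-down sketch of roughly
what TIR does (the names end in "Sketch" because this is a
simplification, not the actual source): the per-thread clone lives in
the calling Thread's ThreadLocalMap, not in the reader itself.

// Simplified sketch (not the actual Lucene source) of the pattern being
// discussed: one cloned SegmentTermEnum per calling thread.
class TermInfosReaderSketch {
    private final SegmentTermEnumSketch origEnum = new SegmentTermEnumSketch();

    // one clone per thread; the entry holding the clone lives in the
    // calling Thread's ThreadLocalMap, not in this reader
    private final ThreadLocal enumerators = new ThreadLocal();

    SegmentTermEnumSketch getEnum() {
        SegmentTermEnumSketch termEnum = (SegmentTermEnumSketch) enumerators.get();
        if (termEnum == null) {
            termEnum = (SegmentTermEnumSketch) origEnum.clone();
            enumerators.set(termEnum);
        }
        return termEnum;
    }
}

class SegmentTermEnumSketch implements Cloneable {
    public Object clone() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new RuntimeException(e);
        }
    }
}

(Presumably the per-thread clone is there so that two threads searching
via the same reader instance don't share one enum's state, which is
what a plain instance variable would give us.)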
I'm really amazed that we haven't heard any reports about this before.
I am not sure why my application started showing this leak only about
3 weeks ago; it is getting pounded on harder than before, so maybe
that made the leak more obvious. My guess is that the more common
Lucene usage is a single index or a small number of them, with
short-lived threads, where this problem isn't easily visible. In my
case I deal with a few tens of thousands of indices and several
parallel search threads that live forever in the thread pool.
Any thoughts about this or possible suggestions for a fix?
Thanks,
Otis
----- Original Message ----
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, December 15, 2006 12:28:29 PM
Subject: Leaking org.apache.lucene.index.* objects
Hi,
About 2-3 weeks ago I emailed about a memory leak in my application. I
then found some problems in my code (I wasn't closing IndexSearchers
explicitly) and took care of those. Now I see my app is still leaking
memory - jconsole clearly shows the "Tenured Gen" memory pool filling
up until I hit an OOM, but I can't seem to pinpoint the source.
I found that a bunch of o.a.l.index.* objects are not getting GCed,
even though they should be. For example:
$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr
 num   #instances     #bytes  class name
--------------------------------------
   4:     1764840   98831040  org.apache.lucene.index.CompoundFileReader$CSIndexInput
   5:     2119215   67814880  org.apache.lucene.index.TermInfo
   7:     1112459   35598688  org.apache.lucene.index.SegmentReader$Norm
   9:     2132311   34116976  org.apache.lucene.index.Term
  12:     1117897   26829528  org.apache.lucene.index.FieldInfo
  13:      225340   18027200  org.apache.lucene.index.SegmentTermEnum
  15:      589727   14153448  org.apache.lucene.index.TermBuffer
  21:       86033    8718504  [Lorg.apache.lucene.index.TermInfo;
  20:       86033    8718504  [Lorg.apache.lucene.index.Term;
  23:       86120    7578560  org.apache.lucene.index.SegmentReader
  26:       90501    5068056  org.apache.lucene.store.FSIndexInput
  27:       86120    4822720  org.apache.lucene.index.TermInfosReader
  33:       86130    3445200  org.apache.lucene.index.SegmentInfo
  36:       87355    2795360  org.apache.lucene.store.FSIndexInput$Descriptor
  38:       86120    2755840  org.apache.lucene.index.FieldsReader
  39:       86050    2753600  org.apache.lucene.index.CompoundFileReader
  42:       46903    2251344  org.apache.lucene.index.SegmentInfos
  43:       93778    2250672  org.apache.lucene.search.FieldCacheImpl$Entry
  45:       93778    1500448  org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
  47:       86510    1384160  org.apache.lucene.index.FieldInfos
I'm running my app in search-only mode - no adds or deletes.
The counts of these objects just keep going up, even though I am
explicitly closing the IndexSearcher. I can see that file descriptors
_are_ freed up after searcher.close(), because lsof no longer shows
them, but the above objects just linger and accumulate, even when I
force GC via jconsole or via the profiler.
I thought maybe the various *Readers are not getting close()d, but
I've double-checked all the *Readers above, and they all seem to close
their IndexInput references. The static nested class
CompoundFileReader.CSIndexInput has a close() with an empty
implementation; at first I thought that was an omission, but closing
the inner IndexInput there resulted in a search-time error. I've added
the lovely print debugging to various close() methods and see those
methods being called. I've also added finalize() with some print
debugging to SegmentReader, TermInfosReader, SegmentTermEnum,
FieldsReader, and CompoundFileReader. All but CFReader get finalized
after a while.
My application is running as a webapp and has thousands of separate
indices. This means it's very multi-threaded: the servlet container
has a pool of threads that handle requests, and each request may be
for a different index. I cache IndexSearchers for a while, and
purge/close them every N minutes if they have been idle for more than
M minutes.
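Simplified, the caching looks something like this (a stripped-down,
made-up sketch rather than the real code, but it shows the per-index
caching and the idle-based close I'm describing):

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.lucene.search.IndexSearcher;

// Sketch of a per-index searcher cache with idle-based eviction.
class SearcherCacheSketch {
    private static class Entry {
        IndexSearcher searcher;
        long lastUsed;
    }

    private final Map cache = new HashMap();   // index path -> Entry
    private final long maxIdleMs;

    SearcherCacheSketch(long maxIdleMs) {
        this.maxIdleMs = maxIdleMs;
    }

    synchronized IndexSearcher get(String indexPath) throws IOException {
        Entry e = (Entry) cache.get(indexPath);
        if (e == null) {
            e = new Entry();
            e.searcher = new IndexSearcher(indexPath);  // constructor taking a directory path
            cache.put(indexPath, e);
        }
        e.lastUsed = System.currentTimeMillis();
        return e.searcher;
    }

    // called every N minutes from a background thread
    synchronized void purgeIdle() throws IOException {
        long now = System.currentTimeMillis();
        for (Iterator it = cache.values().iterator(); it.hasNext();) {
            Entry e = (Entry) it.next();
            if (now - e.lastUsed > maxIdleMs) {
                e.searcher.close();  // frees file descriptors, but see the leak described here
                it.remove();
            }
        }
    }
}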
It occurred to me last night that things like TermInfosReader and
SegmentReader are using ThreadLocal, and since threads come from a
thread pool, and thus are shared across requests handling searches
against different indices, it's not clear to me what happens with the
object instances that get put in those ThreadLocals in such a
scenario. Aren't things going to step on each other's toes?
TIR has close() and SR has doClose(), so I put <TL inst>.set(null)
there. This immediately got rid of those instances of
CompoundFileReader.CSIndexInput in my dev environment!!!! Yeeees!
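(The change was essentially this, shown on a stripped-down stand-in
class rather than the real TIR/SR:)

// Stand-in class, not the actual patch to TermInfosReader/SegmentReader.
class ReaderWithThreadLocal {
    private final ThreadLocal perThreadClone = new ThreadLocal();

    void close() {
        // ... release other resources ...
        perThreadClone.set(null); // clears only the entry of the thread that calls close()
    }
}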
But in my dev environment I tested my additions by slamming my app
against a *single* index. I took my modified Lucene to
production, and quickly saw all those o.a.l.index.* objects
accumulate again. I also see a lot of ThreadLocal's kids:
  16:      419387   13420384  java.lang.ThreadLocal$ThreadLocalMap$Entry
I *think* that points to some issue with how that ThreadLocal is used
there in a multi-threaded, multi-index environment.
I'm running JDK 6, and while this problem sounds a bit like
LUCENE-436, I'm not yet sure if it's the same thing. Because my
IndexSearchers (and thus all those o.a.l.index.* objects) are
long-lived, and threads are shared and reused for searching other
indices, those close() and doClose() methods are not called at the end
of the request life-cycle, so at the end of the request those TL
instances will *still* have something in them. When their thread is
later reused for searching another index, new data will be put in
them, but the old data will never be cleaned out! No?
It seems a bit odd, but with these ThreadLocals, doesn't a
multi-threaded, multi-index webapp really have to "clean" those
ThreadLocal instances either before or at the end of each request?
I'm running out of ideas, and was wondering if anyone has any
thoughts about what could still be holding references to the above
classes. I have some 20-30MB memory snapshots (via YourKit) and
heap dumps (via jmap), if anyone is interested.
Thanks,
Otis
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]