Re: ThreadLocal causing memory leak with J2EE applications

Chris Lu Wed, 10 Sep 2008 10:38:59 -0700

I'm not holding a reference to the searcher/reader. I checked that via a memory snapshot.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!


On Wed, Sep 10, 2008 at 8:58 AM, Michael McCandless <
[EMAIL PROTECTED]> wrote:

>
> Chris,
>
> After you close your IndexSearcher/Reader, is it possible you're still
> holding a reference to it?
>
> Mike
>
>
> Chris Lu wrote:
>
>> Frankly, I don't know why TermInfosReader.ThreadResources is not showing up
>> in the memory snapshot.
>>
>> Yes, it's been there for a long time. But let's see what's changed: an LRU
>> cache, termInfoCache, was added.
>> Previously the SegmentTermEnum would be released, since it's a relatively
>> simple object.
>> But with a cache added to the same class, ThreadResources, which holds many
>> objects, and with the threads still hanging around, the cache cannot be
>> released, so in turn the SegmentTermEnum cannot be released, so the
>> RAMDirectory cannot be released.
>>
>> My test is too coupled with the software I am working on and not easy to
>> post here. But here is a similar case from another user:
>>
>>
>> -----------------------------------------------------------------------------------
>> I found a forum post from you here [1] where you mention that you
>> have a memory leak using the Lucene RAM directory. I'd like to ask
>> whether you have already resolved the problem and how you did it, or whether
>> you know where I can read about the solution. We are using
>> RAMDirectory too and figured out that, over time, the memory
>> consumption rises and rises until the system breaks down, but only
>> when we perform many index updates. If we only create the index and
>> do nothing except search it, it works fine.
>>
>> -----------------------------------------------------------------------------------
>>
>> --
>> Chris Lu
>>
>> On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless <
>> [EMAIL PROTECTED]> wrote:
>>
>> I still don't quite understand what's causing your memory growth.
>>
>> SegmentTermEnum instances have been held in a ThreadLocal cache in
>> TermInfosReader for a very long time (at least since Lucene 1.4).
>>
>> If indeed it's the RAMDir's contents being kept "alive" due to this, then,
>> you should have already been seeing this problem before rev 659602.  And I
>> still don't get why your reference tree is missing the
>> TermInfosReader.ThreadResources class.
>>
>> I'd like to understand the root cause before we hash out possible
>> solutions.
>>
>> Can you post the sources for your load test?
>>
>> Mike
>>
>>
>> Chris Lu wrote:
>>
>> Actually, even if I only use one IndexReader, some resources are cached via
>> the ThreadLocal cache, and cannot be released unless all threads do the
>> close action.
>>
>> SegmentTermEnum itself is small, but it holds RAMDirectory along the path,
>> which is big.
>>
>> --
>> Chris Lu
>>
>> On Tue, Sep 9, 2008 at 10:43 PM, robert engels <[EMAIL PROTECTED]>
>> wrote:
>> You do not need a pool of IndexReaders...
>>
>> It does not matter what class it is, what matters is the class that
>> ultimately holds the reference.
>>
>> If the IndexReader is never closed, the SegmentReader(s) is never closed,
>> so the thread local in TermInfosReader is not cleared (because the thread
>> never dies). So you will get one SegmentTermEnum per thread, per segment.
>>
>> The SegmentTermEnum is not a large object, so even if you had 100 threads
>> and 100 segments, for 10k instances, it seems hard to believe that is the
>> source of your memory issue.
>>
>> The SegmentTermEnum is cached per thread since it needs to enumerate the
>> terms; not having a per-thread cache would lead to lots of random access
>> when multiple threads read the index, which is very slow.
>>
>> You need to keep in mind, what if every thread was executing a search
>> simultaneously - you would still have 100x100 SegmentTermEnum instances
>> anyway !  The only way to prevent that would be to create and destroy the
>> SegmentTermEnum on each call (opening and seeking to the proper spot) -
>> which would be SLOW SLOW SLOW.
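
To make the per-thread, per-segment arithmetic above concrete, here is a minimal sketch in plain Java. It does not use Lucene: SegmentStandIn and TermEnumStandIn are hypothetical stand-ins for a segment's TermInfosReader and its SegmentTermEnum, and the only assumption is that each segment keeps one enumerator per calling thread in a ThreadLocal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Counts per-thread enumerators; all names here are hypothetical, not Lucene classes. */
final class PerThreadEnumDemo {
    /** Total number of enumerator objects created so far. */
    static final AtomicInteger CREATED = new AtomicInteger();

    /** Stand-in for a small, stateful enumerator that threads must not share. */
    static final class TermEnumStandIn {
        TermEnumStandIn() { CREATED.incrementAndGet(); }
    }

    /** Stand-in for one segment: lazily creates one enumerator per calling thread. */
    static final class SegmentStandIn {
        private final ThreadLocal<TermEnumStandIn> perThread =
                ThreadLocal.withInitial(TermEnumStandIn::new);

        TermEnumStandIn termEnum() { return perThread.get(); }
    }

    /** Lets each of `threads` threads touch every segment twice; returns enumerators created. */
    static int countEnums(int threads, int segments) {
        CREATED.set(0);
        List<SegmentStandIn> segs = new ArrayList<>();
        for (int i = 0; i < segments; i++) segs.add(new SegmentStandIn());

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                // The second round hits the cached instance and creates nothing new.
                for (int round = 0; round < 2; round++) {
                    for (SegmentStandIn s : segs) s.termEnum();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println(countEnums(100, 100)); // prints 10000
    }
}
```

The count is threads times segments regardless of how often each thread searches, which is the 100x100 figure mentioned above. The sketch says nothing about how much memory each instance keeps reachable.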
>>
>> On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
>>
>> I have tried to create an IndexReader pool and create searchers
>> dynamically, but the memory leak is the same. It's not related to the Searcher
>> class specifically, but to the SegmentTermEnum in TermInfosReader.
>>
>> --
>> Chris Lu
>>
>> On Tue, Sep 9, 2008 at 10:14 PM, robert engels <[EMAIL PROTECTED]>
>> wrote:
>> A searcher uses an IndexReader - the IndexReader is slow to open, not a
>> Searcher. And searchers can share an IndexReader.
>>
>> You want to create a single shared (across all threads/users) IndexReader
>> (usually), and create a Searcher as needed and dispose of it.  It is VERY CHEAP
>> to create the Searcher.
>>
>> I am fairly certain the javadoc on Searcher is incorrect.  The warning
>> "For performance reasons it is recommended to open only one IndexSearcher
>> and use it for all of your searches" is not true in the case where an
>> IndexReader is passed to the ctor.
>>
>> Any caching should USUALLY be performed at the IndexReader level.
>>
>> You are most likely using the "path" ctor, and that is the source of your
>> problems, as multiple IndexReader instances are being created, and thus the
>> memory use.
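
The difference between the two construction styles described above can be sketched without Lucene. ReaderStandIn and SearcherStandIn below are hypothetical stand-ins, not the real IndexReader and IndexSearcher; the assumption, taken from the message above, is that a searcher built from a path opens its own reader while a searcher built from an existing reader only wraps it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Compares reader counts for shared-reader vs. path-based searchers (hypothetical classes). */
final class SharedReaderDemo {
    /** Number of expensive reader objects opened so far. */
    static final AtomicInteger READERS_OPENED = new AtomicInteger();

    /** Stand-in for a reader that is slow to open and holds the index data. */
    static final class ReaderStandIn {
        final String path;
        ReaderStandIn(String path) {
            this.path = path;
            READERS_OPENED.incrementAndGet();
        }
    }

    /** Stand-in for a searcher: a thin wrapper around a reader. */
    static final class SearcherStandIn {
        final ReaderStandIn reader;

        /** Cheap: reuses a reader that is already open. */
        SearcherStandIn(ReaderStandIn shared) { this.reader = shared; }

        /** Expensive: opens a private reader for this searcher. */
        SearcherStandIn(String path) { this(new ReaderStandIn(path)); }
    }

    /** One shared reader, one short-lived searcher per request. */
    static int openedWithSharedReader(int requests) {
        READERS_OPENED.set(0);
        ReaderStandIn shared = new ReaderStandIn("/hypothetical/index");
        for (int i = 0; i < requests; i++) {
            SearcherStandIn searcher = new SearcherStandIn(shared);
            // ... run the query and render the page, then let the searcher go
        }
        return READERS_OPENED.get();
    }

    /** Path-based construction: every request ends up with its own reader. */
    static int openedWithPathConstructor(int requests) {
        READERS_OPENED.set(0);
        for (int i = 0; i < requests; i++) {
            SearcherStandIn searcher = new SearcherStandIn("/hypothetical/index");
        }
        return READERS_OPENED.get();
    }

    public static void main(String[] args) {
        System.out.println(openedWithSharedReader(5));    // prints 1
        System.out.println(openedWithPathConstructor(5)); // prints 5
    }
}
```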
>>
>>
>> On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
>>
>> In a J2EE environment, there is usually a searcher pool with several
>> searchers open.
>> Opening a large index for every user is unacceptably slow.
>>
>> --
>> Chris Lu
>>
>> On Tue, Sep 9, 2008 at 9:03 PM, robert engels <[EMAIL PROTECTED]>
>> wrote:
>> You need to close the searcher within the thread that is using it, in
>> order to have it cleaned up quickly... usually right after you display the
>> page of results.
>>
>> If you are keeping multiple searcher refs across multiple threads for
>> paging/whatever, you have not coded it correctly.
>>
>> Imagine 10,000 users - storing a searcher for each one is not going to
>> work...
>>
>> On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
>>
>> Right, in a sense I can not release it from another thread. But that's the
>> problem.
>>
>> It's a J2EE environment, all threads are kind of equal. It's simply not
>> possible to iterate through all threads to close the searcher, thus
>> releasing the ThreadLocal cache.
>> Unless Lucene is not recommended for J2EE environments, this has to be
>> fixed.
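
One possible direction for a fix, sketched here purely as an assumption and not as what Lucene does: keep the per-thread values in a map owned by the object being closed, so that close() called from any thread drops all of them. ClosablePerThreadCache is a hypothetical name invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical per-thread cache whose entries any thread can release via close(). */
final class ClosablePerThreadCache<T> {
    private final Map<Thread, T> byThread = new ConcurrentHashMap<>();
    private final Supplier<T> factory;

    ClosablePerThreadCache(Supplier<T> factory) { this.factory = factory; }

    /** Returns the calling thread's value, creating it on first use. */
    T get() {
        return byThread.computeIfAbsent(Thread.currentThread(), t -> factory.get());
    }

    /** Drops every thread's value; unlike ThreadLocal.remove(), any thread may call this. */
    void close() { byThread.clear(); }

    int size() { return byThread.size(); }

    /** Fills a cache from `workers` threads, then closes it from the calling thread. */
    static int[] sizesBeforeAndAfterClose(int workers) {
        ClosablePerThreadCache<byte[]> cache =
                new ClosablePerThreadCache<>(() -> new byte[1024]);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> cache.get());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        int before = cache.size();
        cache.close(); // called from a thread that never populated the cache
        return new int[] { before, cache.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterClose(3);
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints 3 -> 0
    }
}
```

The trade-off in this sketch is that the map holds strong references to Thread objects and their values until close() is called, so entries for threads that have already died also linger until then.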
>>
>> --
>> Chris Lu
>>
>>
>> On Tue, Sep 9, 2008 at 8:14 PM, robert engels <[EMAIL PROTECTED]>
>> wrote:
>> Your code is not correct. You cannot release it on another thread - the
>> first thread may create hundreds/thousands of instances before the other
>> thread ever runs...
>>
>> On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
>>
>> If I release it on the thread that's creating the searcher, by setting
>> searcher=null, everything is fine, the memory is released very cleanly.
>> My load test was to repeatedly create a searcher on a RAMDirectory and
>> release it on another thread. The test will quickly go to OOM after several
>> runs. I set the heap size to be 1024M, and the RAMDirectory is of size 250M.
>> Using a profiling tool, I could see the used size visibly step up
>> by 250M.
>>
>> I think we should not rely on something that's a "maybe" behavior,
>> especially for a general purpose library.
>>
>> Since it's a multi-threaded env, the thread that's creating the entries in
>> the LRU cache may not go away quickly (actually most, if not all, application
>> servers will try to reuse threads), so the LRU cache, which uses the thread as
>> the key, cannot be released, so the SegmentTermEnum, which is in the same
>> class, cannot be released.
>>
>> And yes, I close the RAMDirectory, and the fileMap is released. I verified
>> that through the profiler by directly checking the values in the snapshot.
>>
>> I'm pretty sure the reference tree wasn't like this with the code before this
>> commit, because after closing the searcher in another thread, the RAMDirectory
>> totally disappeared from the memory snapshot.
>>
>> --
>> Chris Lu
>>
>> On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless <
>> [EMAIL PROTECTED]> wrote:
>>
>> Chris Lu wrote:
>>
>> The problem should be similar to what's discussed in this thread:
>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>
>> The "rough" conclusion of that thread is that, technically, this isn't a
>> memory leak but rather a "delayed freeing" problem.  Ie, it may take longer,
>> possibly much longer, than you want for the memory to be freed.
>>
>>
>> There is a memory leak in Lucene search since LUCENE-1195 (svn r659602,
>> May 23, 2008).
>>
>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>
>> One thing that confuses me: TermInfosReader was already using a
>> ThreadLocal to cache the SegmentTermEnum instance.  What was added in this
>> commit (for LUCENE-1195) was an LRU cache storing Term -> TermInfo
>> instances.  But it seems like it's the SegmentTermEnum instance that you're
>> tracing below.
>>
>>
>> It's usually recommended to keep the reader open, and reuse it when
>> possible. In a common J2EE application, the HTTP requests are usually
>> handled by different threads. But since the cache is ThreadLocal, the
>> cache is not really usable by other threads. What's worse, the cache cannot
>> be cleared by another thread!
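
The ThreadLocal behaviour described above can be reproduced in a few lines of plain Java. This is only an illustration with a hypothetical CACHE variable and a single-thread pool standing in for an application server's reused worker thread; it is not Lucene code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Shows that a ThreadLocal value set by a pooled thread cannot be seen or cleared elsewhere. */
final class ThreadLocalVisibilityDemo {
    /** Hypothetical per-thread cache slot. */
    static final ThreadLocal<String> CACHE = new ThreadLocal<>();

    /** Returns { value seen by the calling thread, value still seen by the worker }. */
    static String[] run() {
        // A single-thread pool reuses the same worker for every task.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable fill = () -> CACHE.set("cached by worker");
            pool.submit(fill).get();              // worker populates its own slot

            String seenByCaller = CACHE.get();    // the caller has a separate, empty slot
            CACHE.remove();                       // clears only the caller's slot

            Callable<String> read = CACHE::get;
            String seenByWorker = pool.submit(read).get(); // worker's value is untouched
            return new String[] { seenByCaller, seenByWorker };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println(seen[0]); // prints null
        System.out.println(seen[1]); // prints cached by worker
    }
}
```

As long as the worker thread stays alive and nothing runs on that thread to clear the slot, the value remains strongly reachable from it.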
>>
>> This leak is usually not so obvious. But my case uses a RAMDirectory
>> holding several hundred megabytes, so one unreleased resource is obvious
>> to me.
>>
>> Here is the reference tree:
>> org.apache.lucene.store.RAMDirectory
>>  |- directory of org.apache.lucene.store.RAMFile
>>      |- file of org.apache.lucene.store.RAMInputStream
>>          |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>              |- input of org.apache.lucene.index.SegmentTermEnum
>>                  |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>
>> So you have a RAMDir that has several hundred MB stored in it, that you're
>> done with, yet through this path Lucene is keeping it alive?
>>
>> Did you close the RAMDir?  (which will null its fileMap and should also
>> free your memory).
>>
>> Also, that reference tree doesn't show the ThreadResources class that was
>> added in that commit -- are you sure this reference tree wasn't before the
>> commit?
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>>
>>
>> --
>> Chris Lu
