I'm not holding a reference to the searcher/reader. I checked that via a memory snapshot.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 8:58 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:

> Chris,
>
> After you close your IndexSearcher/Reader, is it possible you're still
> holding a reference to it?
>
> Mike
>
> Chris Lu wrote:
>
>> Frankly, I don't know why TermInfosReader.ThreadResources is not showing up
>> in the memory snapshot.
>>
>> Yes, it's been there for a long time. But let's see what has changed: an LRU
>> cache, termInfoCache, was added.
>> Previously the SegmentTermEnum would be released, since it's a relatively
>> simple object.
>> But with a cache added to the same class, ThreadResources, which holds many
>> objects, and with the threads still hanging around, the cache cannot be
>> released, so in turn the SegmentTermEnum cannot be released, so the
>> RAMDirectory cannot be released.
>>
>> My test is too coupled with the software I am working on and not easy to
>> post here. But here is a similar case from another user:
>>
>> -----------------------------------------------------------------------------------
>> I found a forum post from you here [1] where you mention that you
>> have a memory leak using the Lucene RAM directory. I'd like to ask
>> whether you have already resolved the problem and how you did it, or maybe
>> you know where I can read about the solution. We are using
>> RAMDirectory too and figured out that over time the memory
>> consumption rises and rises until the system breaks down, but only
>> when we perform many index updates. If we only create the index and
>> don't do anything except search it, it works fine.
>> -----------------------------------------------------------------------------------
>>
>> On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>>
>> I still don't quite understand what's causing your memory growth.
>>
>> SegmentTermEnum instances have been held in a ThreadLocal cache in
>> TermInfosReader for a very long time (at least since Lucene 1.4).
>>
>> If indeed it's the RAMDir's contents being kept "alive" due to this, then
>> you should have already been seeing this problem before rev 659602. And I
>> still don't get why your reference tree is missing the
>> TermInfosReader.ThreadResources class.
>>
>> I'd like to understand the root cause before we hash out possible
>> solutions.
>>
>> Can you post the sources for your load test?
>>
>> Mike
>>
>> Chris Lu wrote:
>>
>> Actually, even if I only use one IndexReader, some resources are cached via
>> the ThreadLocal cache, and cannot be released unless all threads do the
>> close action.
>>
>> SegmentTermEnum itself is small, but it holds the RAMDirectory along the path,
>> which is big.
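
A minimal sketch of the point made in the message above: a value stored in a ThreadLocal can only be cleared by the thread that owns it, so a pooled thread that never dies keeps its entry. This uses only the JDK. The class name and `SLOT` are hypothetical stand-ins for a per-thread cache and are not Lucene code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-JDK illustration; SLOT is a hypothetical stand-in for a per-thread cache.
class ThreadLocalOwnership {
    static final ThreadLocal<byte[]> SLOT = new ThreadLocal<>();

    // Size of the value held in the calling thread's slot, or 0 if the slot is empty.
    static int heldBytes() {
        byte[] value = SLOT.get();
        return value == null ? 0 : value.length;
    }

    // Returns {bytes held after a foreign remove(), bytes held after the owner's remove()}.
    static int[] run() {
        // One reused worker thread, similar to a pooled application-server thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Callable<Integer> probe = ThreadLocalOwnership::heldBytes;
        try {
            // The worker thread fills its own thread-local slot.
            pool.submit(() -> SLOT.set(new byte[1024])).get();

            // The calling thread calls remove(); this only touches the caller's own slot.
            SLOT.remove();
            int afterForeignRemove = pool.submit(probe).get();

            // Only the owning thread can clear its entry.
            pool.submit(() -> SLOT.remove()).get();
            int afterOwnerRemove = pool.submit(probe).get();

            return new int[] {afterForeignRemove, afterOwnerRemove};
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("after foreign remove: " + r[0] + ", after owner remove: " + r[1]);
    }
}
```

In this sketch the worker still holds its 1024-byte value after the caller's `remove()`, and the slot only becomes empty once the worker itself calls `remove()`.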
>>
>> On Tue, Sep 9, 2008 at 10:43 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>
>> You do not need a pool of IndexReaders...
>>
>> It does not matter what class it is; what matters is the class that
>> ultimately holds the reference.
>>
>> If the IndexReader is never closed, the SegmentReader(s) is never closed,
>> so the thread local in TermInfosReader is not cleared (because the thread
>> never dies). So you will get one SegmentTermEnum per thread, per segment.
>>
>> The SegmentTermEnum is not a large object, so even if you had 100 threads
>> and 100 segments, for 10k instances, it seems hard to believe that is the
>> source of your memory issue.
>>
>> The SegmentTermEnum is cached per thread since it needs to enumerate the
>> terms; not having a per-thread cache would lead to lots of random access
>> when multiple threads read the index - very slow.
>>
>> You need to keep in mind: what if every thread was executing a search
>> simultaneously? You would still have 100x100 SegmentTermEnum instances
>> anyway! The only way to prevent that would be to create and destroy the
>> SegmentTermEnum on each call (opening and seeking to the proper spot),
>> which would be SLOW SLOW SLOW.
>>
>> On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
>>
>> I have tried to create an IndexReader pool and dynamically create
>> searchers. But the memory leak is the same. It's not related to the Searcher
>> class specifically, but to the SegmentTermEnum in TermInfosReader.
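
To illustrate the "one SegmentTermEnum per thread, per segment" arithmetic described above, here is a plain-JDK sketch. Each hypothetical "segment" is modelled as a ThreadLocal whose initializer counts how many per-thread objects get created. Nothing here is Lucene code; the class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK model: one ThreadLocal per "segment", one cached object per thread per segment.
class PerThreadPerSegment {

    // Returns {objects created after the first search round, after the second round}.
    static int[] run(int threads, int segments) {
        AtomicInteger created = new AtomicInteger();
        List<ThreadLocal<Object>> segmentCaches = new ArrayList<>();
        for (int s = 0; s < segments; s++) {
            // The initializer runs once per thread that touches this "segment".
            segmentCaches.add(ThreadLocal.withInitial(() -> {
                created.incrementAndGet();
                return new Object();
            }));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            searchOnEveryThread(pool, threads, segmentCaches);
            int afterFirstRound = created.get();
            searchOnEveryThread(pool, threads, segmentCaches);
            int afterSecondRound = created.get();
            return new int[] {afterFirstRound, afterSecondRound};
        } finally {
            pool.shutdown();
        }
    }

    // Runs one task per pool thread; the barrier forces all threads to take part.
    static void searchOnEveryThread(ExecutorService pool, int threads,
                                    List<ThreadLocal<Object>> caches) {
        CyclicBarrier barrier = new CyclicBarrier(threads);
        List<Future<Object>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            futures.add(pool.submit(() -> {
                barrier.await();                 // wait until every pool thread has a task
                for (ThreadLocal<Object> cache : caches) {
                    cache.get();                 // creates this thread's entry on first use
                }
                return null;
            }));
        }
        try {
            for (Future<Object> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = run(4, 3);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

With 4 threads and 3 segments the count reaches 12 after the first round and stays at 12 after the second, since the pooled threads reuse their cached entries. With 100 threads and 100 segments the same model gives the 10k instances mentioned above.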
>>
>> On Tue, Sep 9, 2008 at 10:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>
>> A searcher uses an IndexReader - the IndexReader is slow to open, not a
>> Searcher. And searchers can share an IndexReader.
>>
>> You want to create a single shared (across all threads/users) IndexReader
>> (usually), and create a Searcher as needed and dispose of it. It is VERY CHEAP
>> to create the Searcher.
>>
>> I am fairly certain the javadoc on Searcher is incorrect. The warning
>> "For performance reasons it is recommended to open only one IndexSearcher
>> and use it for all of your searches" is not true in the case where an
>> IndexReader is passed to the ctor.
>>
>> Any caching should USUALLY be performed at the IndexReader level.
>>
>> You are most likely using the "path" ctor, and that is the source of your
>> problems, as multiple IndexReader instances are being created, and thus the
>> memory use.
>>
>> On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
>>
>> In a J2EE environment, there is usually a searcher pool with several
>> searchers open.
>> The time it takes to open a large index for every user is not acceptable.
>>
>> On Tue, Sep 9, 2008 at 9:03 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>
>> You need to close the searcher within the thread that is using it, in
>> order to have it cleaned up quickly... usually right after you display the
>> page of results.
>>
>> If you are keeping multiple searcher refs across multiple threads for
>> paging/whatever, you have not coded it correctly.
>>
>> Imagine 10,000 users - storing a searcher for each one is not going to
>> work...
>>
>> On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
>>
>> Right, in a sense I cannot release it from another thread. But that's the
>> problem.
>>
>> It's a J2EE environment; all threads are kind of equal. It's simply not
>> possible to iterate through all threads to close the searcher, thus
>> releasing the ThreadLocal cache.
>> Unless Lucene is not recommended for J2EE environments, this has to be
>> fixed.
>>
>> On Tue, Sep 9, 2008 at 8:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>
>> Your code is not correct. You cannot release it on another thread - the
>> first thread may create hundreds/thousands of instances before the other
>> thread ever runs...
>>
>> On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
>>
>> If I release it on the thread that's creating the searcher, by setting
>> searcher=null, everything is fine; the memory is released very cleanly.
>> My load test was to repeatedly create a searcher on a RAMDirectory and
>> release it on another thread. The test quickly goes to OOM after several
>> runs. I set the heap size to 1024M, and the RAMDirectory is 250M in size.
>> Using a profiling tool, the used size clearly stepped up by 250M.
>>
>> I think we should not rely on something that's a "maybe" behavior,
>> especially for a general purpose library.
>>
>> Since it's a multi-threaded environment, the thread that creates the entries in
>> the LRU cache may not go away quickly (actually most, if not all, application
>> servers will try to reuse threads), so the LRU cache, which uses the thread as
>> the key, cannot be released, so the SegmentTermEnum, which is in the same
>> class, cannot be released.
>>
>> And yes, I close the RAMDirectory, and the fileMap is released. I verified
>> that through the profiler by directly checking the values in the snapshot.
>>
>> I'm pretty sure the reference tree wasn't like this with the code before this
>> commit, because after closing the searcher in another thread, the RAMDirectory
>> totally disappeared from the memory snapshot.
>>
>> On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>>
>> Chris Lu wrote:
>>
>> The problem should be similar to what's talked about in this discussion:
>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>
>> The "rough" conclusion of that thread is that, technically, this isn't a
>> memory leak but rather a "delayed freeing" problem. I.e., it may take longer,
>> possibly much longer, than you want for the memory to be freed.
>>
>> There is a memory leak for Lucene search from LUCENE-1195 (svn r659602,
>> May 23, 2008).
>>
>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>
>> One thing that confuses me: TermInfosReader was already using a
>> ThreadLocal to cache the SegmentTermEnum instance.
>> What was added in this
>> commit (for LUCENE-1195) was an LRU cache storing Term -> TermInfo
>> instances. But it seems like it's the SegmentTermEnum instance that you're
>> tracing below.
>>
>> It's usually recommended to keep the reader open, and reuse it when
>> possible. In a common J2EE application, the HTTP requests are usually
>> handled by different threads. But since the cache is ThreadLocal, the
>> cache is not really usable by other threads. What's worse, the cache cannot be
>> cleared by another thread!
>>
>> This leak is usually not so obvious. But my case uses a RAMDirectory
>> of several hundred megabytes, so one unreleased resource is obvious
>> to me.
>>
>> Here is the reference tree:
>> org.apache.lucene.store.RAMDirectory
>> |- directory of org.apache.lucene.store.RAMFile
>> |- file of org.apache.lucene.store.RAMInputStream
>> |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>> |- input of org.apache.lucene.index.SegmentTermEnum
>> |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>
>> So you have a RAMDir that has several hundred MB stored in it, that you're
>> done with, yet through this path Lucene is keeping it alive?
>>
>> Did you close the RAMDir? (which will null its fileMap and should also
>> free your memory).
>>
>> Also, that reference tree doesn't show the ThreadResources class that was
>> added in that commit -- are you sure this reference tree wasn't before the
>> commit?
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]