Yeah, the timing is different. But it's an unknown, undetermined, and uncontrollable time... We cannot ask the user,

    while (memory is low) { sleep(1000); }
    do_the_real_thing_an_hour_later

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 10:39 AM, robert engels <[EMAIL PROTECTED]> wrote:

> Close() does work - it is just that the memory may not be freed until much later...
> When working with VERY LARGE objects, this can be a problem.
>
> On Sep 10, 2008, at 12:36 PM, Chris Lu wrote:
>
> Thanks for the analysis, really appreciate it, and I agree with it. But...
> This is really a normal J2EE use case. The threads seldom die.
> Doesn't that mean closing the RAMDirectory doesn't work for J2EE applications?
> And only reopen() works?
> And close() doesn't release the resources? duh...
>
> I can only say this is a problem to be cleaned up.
>
> On Wed, Sep 10, 2008 at 9:10 AM, robert engels <[EMAIL PROTECTED]> wrote:
>
>> You do not need to create a new RAMDirectory - just write to the existing one, and then reopen() the IndexReader using it.
>> This will prevent lots of big objects being created. This may be the source of your problem.
>>
>> Even if the Segment is closed, the ThreadLocal will no longer be referenced, but there will still be a reference to the SegmentTermEnum (which will be cleared when the thread dies, or "most likely" when new thread locals on that thread are created), so here is a potential problem.
>>
>> Thread 1 does a search, creates a thread local that references the RAMDir (A).
>> Thread 2 does a search, creates a thread local that references the RAMDir (A).
>>
>> All readers are closed on RAMDir (A).
>>
>> A new RAMDir (B) is opened.
>>
>> There may still be references in the thread local maps to RAMDir A (since no new thread locals have been created yet).
>>
>> So you may get OOM depending on the size of the RAMDir (since you would need room for more than 1). If you extend this out with lots of threads that don't run very often, you can see how you could easily run out of memory. "I think" that ThreadLocal should use a ReferenceQueue so stale object slots can be reclaimed as soon as the key is dereferenced - but that is an issue for SUN.
>>
>> This is why you don't want to create new RAMDirs.
>>
>> A good rule of thumb - don't keep references to large objects in ThreadLocal (especially indirectly). If needed, use a "key", and then read the cache using the "key".
>> This would be something for the Lucene folks to change.
>>
>> On Sep 10, 2008, at 10:44 AM, Chris Lu wrote:
>>
>> I really want to find out what I am doing wrong, if that's the case.
>>
>> Yes. I have made certain that I closed all Readers/Searchers, and verified that through memory profiler.
>> Yes. I am creating new RAMDirectory. But that's the problem. I need to update the content. Sure, if no content update and everything the same, of course no OOM.
>>
>> Yes. No guarantee of the thread schedule. But that's the problem. If Lucene is using ThreadLocal to cache lots of things by the Thread as the key, and no idea when it'll be released.
>> Of course ThreadLocal is not Lucene's problem...
>>
>> Chris
>>
>> On Wed, Sep 10, 2008 at 8:34 AM, robert engels <[EMAIL PROTECTED]> wrote:
>>
>>> It is basic Java. Threads are not guaranteed to run on any sort of schedule. If you create lots of large objects in one thread, releasing them in another, there is a good chance you will get an OOM (since the releasing thread may not run before the OOM occurs)... This is not Lucene specific by any means.
>>> It is a misunderstanding on your part about how GC works.
>>>
>>> I assume you must at some point be creating new RAMDirectories - otherwise the memory would never really increase, since the IndexReader/enums/etc are not very large...
>>>
>>> When you create a new RAMDirectory, you need to BE CERTAIN !!! that the other IndexReaders/Searchers using the old RAMDirectory are ALL CLOSED, otherwise their memory will still be in use, which leads to your OOM...
>>>
>>> On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:
>>>
>>> I do not believe I am making any mistake. Actually I just got an email from another user, complaining about the same thing. And I am having the same usage pattern.
>>> After the reader is opened, the RAMDirectory is shared by several objects.
>>> There is one instance of RAMDirectory in the memory, and it is holding lots of memory, which is expected.
>>>
>>> If I close the reader in the same thread that has opened it, the RAMDirectory is gone from the memory.
>>> If I close the reader in other threads, the RAMDirectory is left in the memory, referenced along the tree I drew in the first email.
>>>
>>> I do not think the usage is wrong. Period.
>>>
>>> -------------------------------------
>>>
>>> Hi,
>>>
>>> i found a forum post from you here [1] where you mention that you have a memory leak using the lucene ram directory.
>>> I'd like to ask you if you already have resolved the problem and how you did it or maybe you know where i can read about the solution. We are using RAMDirectory too and figured out that over time the memory consumption rises and rises until the system breaks down, but only when we perform many index updates. if we only create the index and don't do anything except searching it, it works fine.
>>>
>>> maybe you can give me a hint or a link,
>>> greetz,
>>>
>>> -------------------------------------
>>>
>>> On Wed, Sep 10, 2008 at 7:12 AM, robert engels <[EMAIL PROTECTED]> wrote:
>>>
>>>> Sorry, but I am fairly certain you are mistaken.
>>>> If you only have a single IndexReader, the RAMDirectory will be shared in all cases.
>>>>
>>>> The only memory growth is any buffer space allocated by an IndexInput (used in many places and cached).
>>>>
>>>> Normally the IndexInputs created by a RAMDirectory do not have any buffer allocated, since the underlying store is already in memory.
>>>>
>>>> You have some other problem in your code...
>>>>
>>>> On Sep 10, 2008, at 1:10 AM, Chris Lu wrote:
>>>>
>>>> Actually, even if I only use one IndexReader, some resources are cached via the ThreadLocal cache, and can not be released unless all threads do the close action.
>>>>
>>>> SegmentTermEnum itself is small, but it holds RAMDirectory along the path, which is big.
>>>>
>>>> On Tue, Sep 9, 2008 at 10:43 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You do not need a pool of IndexReaders...
>>>>> It does not matter what class it is, what matters is the class that ultimately holds the reference.
>>>>>
>>>>> If the IndexReader is never closed, the SegmentReader(s) is never closed, so the thread local in TermInfosReader is not cleared (because the thread never dies). So you will get one SegmentTermEnum, per thread * per segment.
>>>>>
>>>>> The SegmentTermEnum is not a large object, so even if you had 100 threads, and 100 segments, for 10k instances, it seems hard to believe that is the source of your memory issue.
>>>>>
>>>>> The SegmentTermEnum is cached by thread since it needs to enumerate the terms; not having a per thread cache would lead to lots of random access when multiple threads read the index - very slow.
>>>>>
>>>>> You need to keep in mind, what if every thread was executing a search simultaneously - you would still have 100x100 SegmentTermEnum instances anyway! The only way to prevent that would be to create and destroy the SegmentTermEnum on each call (opening and seeking to the proper spot) - which would be SLOW SLOW SLOW.
>>>>>
>>>>> On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
>>>>>
>>>>> I have tried to create an IndexReader pool and dynamically create searcher. But the memory leak is the same.
>>>>> It's not related to the Searcher class specifically, but the SegmentTermEnum in TermInfosReader.
>>>>>
>>>>> On Tue, Sep 9, 2008 at 10:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> A searcher uses an IndexReader - the IndexReader is slow to open, not a Searcher. And searchers can share an IndexReader.
>>>>>> You want to create a single shared (across all threads/users) IndexReader (usually), and create a Searcher as needed and dispose. It is VERY CHEAP to create the Searcher.
>>>>>>
>>>>>> I am fairly certain the javadoc on Searcher is incorrect. The warning "For performance reasons it is recommended to open only one IndexSearcher and use it for all of your searches" is not true in the case where an IndexReader is passed to the ctor.
>>>>>>
>>>>>> Any caching should USUALLY be performed at the IndexReader level.
>>>>>>
>>>>>> You are most likely using the "path" ctor, and that is the source of your problems, as multiple IndexReader instances are being created, and thus the memory use.
>>>>>>
>>>>>> On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
>>>>>>
>>>>>> On J2EE environment, usually there is a searcher pool with several searchers open. The speed of opening a large index for every user is not acceptable.
>>>>>>
>>>>>> On Tue, Sep 9, 2008 at 9:03 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> You need to close the searcher within the thread that is using it, in order to have it cleaned up quickly... usually right after you display the page of results.
>>>>>>> If you are keeping multiple searcher refs across multiple threads for paging/whatever, you have not coded it correctly.
>>>>>>>
>>>>>>> Imagine 10,000 users - storing a searcher for each one is not going to work...
>>>>>>>
>>>>>>> On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
>>>>>>>
>>>>>>> Right, in a sense I can not release it from another thread. But that's the problem.
>>>>>>>
>>>>>>> It's a J2EE environment, all threads are kind of equal. It's simply not possible to iterate through all threads to close the searcher, thus releasing the ThreadLocal cache.
>>>>>>> Unless Lucene is not recommended for J2EE environment, this has to be fixed.
>>>>>>>
>>>>>>> On Tue, Sep 9, 2008 at 8:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Your code is not correct. You cannot release it on another thread - the first thread may create hundreds/thousands of instances before the other thread ever runs...
>>>>>>>>
>>>>>>>> On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
>>>>>>>>
>>>>>>>> If I release it on the thread that's creating the searcher, by setting searcher=null, everything is fine, the memory is released very cleanly.
>>>>>>>> My load test was to repeatedly create a searcher on a RAMDirectory and release it on another thread. The test will quickly go to OOM after several runs. I set the heap size to be 1024M, and the RAMDirectory is of size 250M. Using some profiling tool, the used size simply stepped up pretty obviously by 250M.
>>>>>>>>
>>>>>>>> I think we should not rely on something that's a "maybe" behavior, especially for a general purpose library.
>>>>>>>>
>>>>>>>> Since it's a multi-threaded env, the thread that's creating the entries in the LRU cache may not go away quickly (actually most, if not all, application servers will try to reuse threads), so the LRU cache, which uses thread as the key, can not be released, so the SegmentTermEnum which is in the same class can not be released.
>>>>>>>>
>>>>>>>> And yes, I close the RAMDirectory, and the fileMap is released. I verified that through the profiler by directly checking the values in the snapshot.
>>>>>>>>
>>>>>>>> Pretty sure the reference tree wasn't like this using code before this commit, because after closing the searcher in another thread, the RAMDirectory totally disappeared from the memory snapshot.
>>>>>>>>
>>>>>>>> On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> Chris Lu wrote:
>>>>>>>>>
>>>>>>>>>> The problem should be similar to what's talked about on this discussion.
>>>>>>>>>> http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
>>>>>>>>>
>>>>>>>>> The "rough" conclusion of that thread is that, technically, this isn't a memory leak but rather a "delayed freeing" problem. Ie, it may take longer, possibly much longer, than you want for the memory to be freed.
>>>>>>>>>
>>>>>>>>>> There is a memory leak for Lucene search from Lucene-1195.(svn r659602, May23,2008)
>>>>>>>>>>
>>>>>>>>>> This patch brings in a ThreadLocal cache to TermInfosReader.
>>>>>>>>>
>>>>>>>>> One thing that confuses me: TermInfosReader was already using a ThreadLocal to cache the SegmentTermEnum instance. What was added in this commit (for LUCENE-1195) was an LRU cache storing Term -> TermInfo instances. But it seems like it's the SegmentTermEnum instance that you're tracing below.
>>>>>>>>>
>>>>>>>>>> It's usually recommended to keep the reader open, and reuse it when possible. In a common J2EE application, the http requests are usually handled by different threads.
>>>>>>>>>> But since the cache is ThreadLocal, the cache is not really usable by other threads. What's worse, the cache can not be cleared by another thread!
>>>>>>>>>>
>>>>>>>>>> This leak is not so obvious usually. But my case is using RAMDirectory, having several hundred megabytes. So one un-released resource is obvious to me.
>>>>>>>>>>
>>>>>>>>>> Here is the reference tree:
>>>>>>>>>> org.apache.lucene.store.RAMDirectory
>>>>>>>>>>  |- directory of org.apache.lucene.store.RAMFile
>>>>>>>>>>   |- file of org.apache.lucene.store.RAMInputStream
>>>>>>>>>>    |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
>>>>>>>>>>     |- input of org.apache.lucene.index.SegmentTermEnum
>>>>>>>>>>      |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
>>>>>>>>>
>>>>>>>>> So you have a RAMDir that has several hundred MB stored in it, that you're done with, yet through this path Lucene is keeping it alive?
>>>>>>>>>
>>>>>>>>> Did you close the RAMDir? (which will null its fileMap and should also free your memory).
>>>>>>>>>
>>>>>>>>> Also, that reference tree doesn't show the ThreadResources class that was added in that commit -- are you sure this reference tree wasn't before the commit?
>>>>>>>>>
>>>>>>>>> Mike
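
For reference, the rule of thumb quoted in this thread (keep only a "key" in the ThreadLocal and read the large object from a cache using that key) could be sketched roughly as below. This is a minimal plain-Java illustration under stated assumptions, not Lucene code: the class KeyedCacheSketch, its methods, and the byte[] standing in for a RAMDirectory are all hypothetical names invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names exist in Lucene.
final class KeyedCacheSketch {

    // Shared registry that owns the large objects; byte[] stands in for a RAMDirectory.
    private static final Map<String, byte[]> LARGE_OBJECTS = new ConcurrentHashMap<>();

    // Per-thread state keeps only the small key, never the large object itself.
    private static final ThreadLocal<String> CURRENT_KEY = new ThreadLocal<>();

    static void register(String key, byte[] large) {
        LARGE_OBJECTS.put(key, large);
    }

    // Removing the entry drops the registry's strong reference, from any thread.
    static void release(String key) {
        LARGE_OBJECTS.remove(key);
    }

    // Record on the calling thread which object it is working with.
    static void use(String key) {
        CURRENT_KEY.set(key);
    }

    // Resolve the key on each access; returns null if unset or already released.
    static byte[] current() {
        String key = CURRENT_KEY.get();
        return key == null ? null : LARGE_OBJECTS.get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        register("A", new byte[1024]);
        Thread worker = new Thread(() -> {
            use("A");
            System.out.println("worker sees A: " + (current() != null));
        });
        worker.start();
        worker.join();
        release("A"); // released on a different thread than the one that used it
        use("A");
        System.out.println("after release: " + (current() != null));
    }
}
```

With this layout, a stale ThreadLocal entry on an idle pooled thread pins only a short String, so the large object becomes eligible for collection once release() has been called and no other references remain. The cost is one extra map lookup per access.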