TCK, CSIndexInput is returned by SegmentReader.getFieldCacheKey()
If you think it's an issue, then it'd be good to open an issue and submit
some code as a patch, maybe a test case showing the WHM isn't removing
values like it's supposed to.

Jason

On Mon, Dec 7, 2009 at 10:45 PM, TCK <moonwatcher32...@gmail.com> wrote:
> Thanks for the feedback guys. The evidence I have collected does point to
> an issue either in the java WeakHashMap implementation or in Lucene's use
> of it. In particular, I used reflection to replace the WeakHashMap
> instances with my own dummy Map that does a no-op for the put operation,
> and although this caused tons more garbage to be created for each search
> query, the CMS executions always collected that garbage and brought down
> the tenured space usage to a small value.
>
> Before that, I tried calling clear() on the WeakHashMap instances and then
> calling size(), which calls expungeStaleEntries(), but that didn't help.
> There may indeed be a case of double memory usage for a while (until
> enough CMS executions have run) as Tom pointed out, but I don't fully
> understand what's happening yet.
>
> Also, bizarrely, during my introspection of the readerCache WeakHashMap
> instances I found an instance of
> org.apache.lucene.index.CompoundFileReader$CSIndexInput as a key. Looking
> at the code I don't see how anything other than an IndexReader could
> possibly get there, and this class certainly doesn't extend IndexReader.
> Any ideas? :-) Maybe I need to get more sleep.
>
> Btw, is searching with a sort by a text field not a common use-case of
> lucene? I've been testing with only 1Gb indexes and I'm pretty sure there
> are much larger indexes out there.
>
> Cheers,
> TCK
>
> On Mon, Dec 7, 2009 at 7:57 PM, Tom Hill <solr-l...@worldware.com> wrote:
>> Hey, that's a nice little Class! I hadn't seen it before. But it sounds
>> like the asynchronous cleanup might deal with the problem I mentioned
>> above (but I haven't looked at the code yet).
>>
>> It's an Apache license - but you mentioned something about no third
>> party libraries. Is that a policy for Lucene?
>>
>> Thanks,
>>
>> Tom
>>
>> On Mon, Dec 7, 2009 at 4:44 PM, Jason Rutherglen
>> <jason.rutherg...@gmail.com> wrote:
>> > I wonder if Google Collections (even though we don't use third party
>> > libraries) concurrent map, which supports weak keys, handles the
>> > removal of weakly referenced keys in a more elegant way than Java's
>> > WeakHashMap?
>> >
>> > On Mon, Dec 7, 2009 at 4:38 PM, Tom Hill <solr-l...@worldware.com> wrote:
>> > > Hi -
>> > >
>> > > If I understand correctly, WeakHashMap does not free the memory for
>> > > the value (cached data) when the key is nulled, or even when the key
>> > > is garbage collected.
>> > >
>> > > It requires one more step: a method on WeakHashMap must be called to
>> > > allow it to release its hard reference to the cached data. It appears
>> > > that most methods in WeakHashMap end up calling expungeStaleEntries,
>> > > which will clear the hard reference. But you have to call some method
>> > > on the map before the memory is eligible for garbage collection.
>> > >
>> > > So it requires four stages to free the cached data: null the key; a
>> > > GC to release the weak reference to the key; a call to some method on
>> > > the map; then the next GC cycle should free the value.
>> > >
>> > > So it seems possible that you could end up with double memory usage
>> > > for a time. If you don't have a GC between the time that you close
>> > > the old reader and the time you start to load the field cache entry
>> > > for the next reader, then the key may still be hanging around
>> > > uncollected.
>> > >
>> > > At that point, it may run a GC when you allocate the new cache, but
>> > > that's only the first GC. It can't free the cached data until after
>> > > the next call to expungeStaleEntries, so for a while you have both
>> > > caches around.
>> > >
>> > > This extra usage could cause things to move into tenured space. Could
>> > > this be causing your problem?
>> > >
>> > > A workaround would be to cause some method to be called on the
>> > > WeakHashMap. You don't want to call get(), since that will try to
>> > > populate the cache. Maybe try putting a small value into the cache,
>> > > doing a GC, and seeing if your memory drops then.
>> > >
>> > > Tom
>> > >
>> > > On Mon, Dec 7, 2009 at 1:48 PM, TCK <moonwatcher32...@gmail.com> wrote:
>> > >
>> > >> Thanks for the response. But I'm definitely calling close() on the
>> > >> old reader and opening a new one (not using reopen). Also, to
>> > >> simplify the analysis, I did my test with a single-threaded
>> > >> requester to eliminate any concurrency issues.
>> > >>
>> > >> I'm doing:
>> > >> sSearcher.getIndexReader().close();
>> > >> sSearcher.close(); // this actually seems to be a no-op
>> > >> IndexReader newIndexReader = IndexReader.open(newDirectory);
>> > >> sSearcher = new IndexSearcher(newIndexReader);
>> > >>
>> > >> Btw, isn't it bad practice anyway to have an unbounded cache? Are
>> > >> there any plans to replace the HashMaps used for the innerCaches
>> > >> with an actual size-bounded cache with some eviction policy (perhaps
>> > >> EhCache or something)?
>> > >>
>> > >> Thanks again,
>> > >> TCK
>> > >>
>> > >> On Mon, Dec 7, 2009 at 4:37 PM, Erick Erickson
>> > >> <erickerick...@gmail.com> wrote:
>> > >>
>> > >> > What this sounds like is that you're not really closing your
>> > >> > readers even though you think you are. Sorting indeed uses up
>> > >> > significant memory when it populates internal caches and keeps
>> > >> > it around for later use (which is one of the reasons that warming
>> > >> > queries matter). But if you really do close the reader, I'm pretty
>> > >> > sure the memory should be GC-able.
>> > >> >
>> > >> > One thing that trips people up is IndexReader.reopen(). If it
>> > >> > returns a reader different than the original, you *must* close the
>> > >> > old one. If you don't, the old reader is still hanging around and
>> > >> > memory won't be returned. An example from the Javadocs:
>> > >> >
>> > >> > IndexReader reader = ...
>> > >> > ...
>> > >> > IndexReader newReader = reader.reopen();
>> > >> > if (newReader != reader) {
>> > >> >   ... // reader was reopened
>> > >> >   reader.close();
>> > >> > }
>> > >> > reader = newReader;
>> > >> > ...
>> > >> >
>> > >> > If this is irrelevant, could you post your close/open code?
>> > >> >
>> > >> > HTH
>> > >> >
>> > >> > Erick
>> > >> >
>> > >> > On Mon, Dec 7, 2009 at 4:27 PM, TCK <moonwatcher32...@gmail.com> wrote:
>> > >> >
>> > >> > > Hi,
>> > >> > > I'm having heap memory issues when I do lucene queries involving
>> > >> > > sorting by a string field. Such queries seem to load a lot of
>> > >> > > data into the heap. Moreover lucene seems to hold on to
>> > >> > > references to this data even after the index reader has been
>> > >> > > closed and a full GC has been run. Some of the consequences of
>> > >> > > this are that in my generational heap configuration a lot of
>> > >> > > memory gets promoted to tenured space each time I close the old
>> > >> > > index reader and after opening and querying using a new one, and
>> > >> > > the tenured space eventually gets fragmented, causing a lot of
>> > >> > > promotion failures resulting in jvm hangs while the jvm does
>> > >> > > stop-the-world GCs.
>> > >> > >
>> > >> > > Does anyone know any workarounds to avoid these memory issues
>> > >> > > when doing such lucene queries?
>> > >> > >
>> > >> > > My profiling showed that even after a full GC lucene is holding
>> > >> > > on to a lot of references to field value data, notably via the
>> > >> > > FieldCacheImpl/ExtendedFieldCacheImpl. I noticed that the
>> > >> > > WeakHashMap readerCaches are using unbounded HashMaps as the
>> > >> > > innerCaches, and I used reflection to replace these innerCaches
>> > >> > > with dummy empty HashMaps, but still I'm seeing the same
>> > >> > > behavior. I wondered if anyone has gone through these same
>> > >> > > issues before and could offer any advice.
>> > >> > >
>> > >> > > Thanks a lot,
>> > >> > > TCK
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>>
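The four-stage lifecycle Tom describes (null the key; a GC clears the weak reference to the key; some map method runs expungeStaleEntries(); the next GC frees the value) can be seen in a minimal standalone sketch. This is not Lucene code, just plain java.util.WeakHashMap; note that System.gc() is only a hint to the JVM, hence the retry loop.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapExpungeDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new byte[1024 * 1024]); // the map holds a hard reference to the 1 MB value

        key = null; // stage 1: drop the last strong reference to the key

        // Stage 2: a GC clears the weak reference to the key.
        // Stage 3: calling any map method (size() here) runs
        // expungeStaleEntries(), releasing the hard reference to the value.
        // Stage 4: the next GC can then actually free the value.
        // System.gc() is only a hint, so retry a few times.
        for (int i = 0; i < 20 && cache.size() > 0; i++) {
            System.gc();
            Thread.sleep(50);
        }
        System.out.println("entries after expunge: " + cache.size());
    }
}
```

Until the size() call runs expungeStaleEntries(), the value stays strongly reachable through the map's internal entry table even though the key is gone, which is exactly the window of double memory usage described above.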
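TCK's experiment, swapping a cache's internal map for one whose put() does nothing, can be sketched generically as below. The Holder class is a hypothetical stand-in for the real target (Lucene's FieldCacheImpl); the field name "cache" and both classes are illustrative only, not Lucene's actual internals.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a class that caches values in a private map.
class Holder {
    private Map<String, String> cache = new HashMap<>();
    void put(String k, String v) { cache.put(k, v); }
    int size() { return cache.size(); }
}

// A Map whose put() is a no-op, so the "cache" never retains anything.
class NoOpMap extends HashMap<String, String> {
    @Override
    public String put(String key, String value) { return null; }
}

public class ReflectionSwapDemo {
    public static void main(String[] args) throws Exception {
        Holder holder = new Holder();
        Field f = Holder.class.getDeclaredField("cache");
        f.setAccessible(true);           // bypass private access
        f.set(holder, new NoOpMap());    // swap in the no-op map
        holder.put("key", "value");      // silently discarded by NoOpMap
        System.out.println("cached entries: " + holder.size());
    }
}
```

Because nothing is ever retained, every lookup misses and each query regenerates its data as short-lived garbage; that the collector then reclaimed everything is what pointed TCK at the retained cache entries rather than the values themselves.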