[ 
http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377764 ] 

Andy Hind commented on LUCENE-436:
----------------------------------

I agree this is not strictly an issue with Lucene... but...

Lucene has an unusual use pattern for thread locals (instance rather than 
static member variables).
There are known issues with ThreadLocal, discussed here and elsewhere, 
including potential instability.
You should expect other code to use thread locals - possibly in the same way.
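For illustration, the instance-level pattern looks roughly like this (a minimal sketch; the class and field names are hypothetical, not Lucene's actual code):

```java
// Sketch of an instance-level ThreadLocal: each instance of this class
// owns its own ThreadLocal, so every (instance, thread) pair adds an
// entry to that thread's thread-local table. Those entries are only
// reclaimed lazily, some time after the owning instance becomes unreachable.
final class PerThreadCache {
    private final ThreadLocal<StringBuilder> slot = new ThreadLocal<StringBuilder>();

    StringBuilder get() {
        StringBuilder value = slot.get();
        if (value == null) {
            value = new StringBuilder();
            slot.set(value);          // cached separately per calling thread
        }
        return value;
    }
}
```

Contrast this with the more common static ThreadLocal, where the number of thread-local table entries per thread stays fixed no matter how many objects are created.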

I fixed the issue reported in LUCENE-529 (memory-sensitive caching) by using 
SoftReferences to the values held as ThreadLocals.
Before an OutOfMemoryError, the values are cleared while hard references to 
the soft-reference wrapper class remain.
This pattern is used by some classes in the JVM, and it limits the memory 
overhead of thread-local use.
There will always be some overhead.
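The pattern can be sketched as follows (hypothetical names; not the actual LUCENE-529 patch): the thread-local slot holds only a SoftReference, so the collector is free to clear the cached value under memory pressure, before it would otherwise throw OutOfMemoryError.

```java
import java.lang.ref.SoftReference;

// The thread local holds only a soft reference to the (potentially large)
// cached value. Under memory pressure the collector may clear the referent,
// in which case the value is simply rebuilt on the next call.
final class SoftThreadLocalCache {
    private final ThreadLocal<SoftReference<byte[]>> slot =
        new ThreadLocal<SoftReference<byte[]>>();

    byte[] get() {
        SoftReference<byte[]> ref = slot.get();
        byte[] value = (ref == null) ? null : ref.get();
        if (value == null) {                     // first use, or GC cleared it
            value = new byte[64 * 1024];         // stand-in for the real cached state
            slot.set(new SoftReference<byte[]>(value));
        }
        return value;
    }
}
```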

I am happy with the alternative fix using WeakHashMap.
Note: stale entries are always scanned and removed (on each call to put, get, 
and size), in contrast to thread locals.
This is what I want - it must be stable!
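That alternative can be sketched like this (hypothetical names, modelled on the WeakHashMap approach): the map is keyed weakly by Thread, so once a thread dies its entry, and the cached value it holds, becomes collectible.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Per-thread cache keyed weakly by the Thread itself. A live thread is
// strongly reachable, so its entry stays; once the thread terminates and
// is collected, WeakHashMap drops the entry during its routine expunging
// on put/get/size - no finalizer or explicit cleanup required.
final class PerThreadRegistry {
    private final Map<Thread, Object> cache =
        Collections.synchronizedMap(new WeakHashMap<Thread, Object>());

    Object get() {
        Object value = cache.get(Thread.currentThread());
        if (value == null) {
            value = new Object();   // stand-in for the per-thread enumerator
            cache.put(Thread.currentThread(), value);
        }
        return value;
    }
}
```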
 
I spent as much time as I could trying to come up with a clear, simple test 
case that applies regardless of memory constraints.
A clear failure case must be possible, but I did not have time to investigate 
the criteria for ThreadLocal instability.
In any case, I would report this to Sun.

A test case is one thing; knowing, understanding, and fixing or working around 
an issue is another.
In all the simple cases I tried, I got stability, but with higher memory use 
and GC activity than with the "fixed" version.
However, I also removed the pointless finalize() method, which could very 
well explain the growth of the thread-local table.

We have had problems with 1.5.0_06. The issue is caused by the pattern of 
thread-local use and garbage collection producing instability in the size of 
the thread-local table. Your single test case does not imply that the issue 
does not exist for other JVMs and use cases. I have seen the issue without 
using RAMDirectory - it just seems more likely with it.

By the way, cause and effect is sufficient for me: I have a problem with the 
1.4.3 code, and this change fixes it.

Regards

Andy

> [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
> ----------------------------------------------------------------
>
>          Key: LUCENE-436
>          URL: http://issues.apache.org/jira/browse/LUCENE-436
>      Project: Lucene - Java
>         Type: Improvement

>   Components: Index
>     Versions: 1.4
>  Environment: Solaris JVM 1.4.1
> Linux JVM 1.4.2/1.5.0
> Windows not tested
>     Reporter: kieran
>  Attachments: FixedThreadLocal.java, Lucene-436-TestCase.tar.gz, 
> ThreadLocalTest.java
>
> We've been experiencing terrible memory problems on our production search 
> server, running lucene (1.4.3).
> Our live app regularly opens new indexes and, in doing so, releases old 
> IndexReaders for garbage collection.
> But...there appears to be a memory leak in 
> org.apache.lucene.index.TermInfosReader.java.
> Under certain conditions (possibly related to JVM version, although I've 
> personally observed it under both linux JVM 1.4.2_06, and 1.5.0_03, and SUNOS 
> JVM 1.4.1) the ThreadLocal member variable, "enumerators" doesn't get 
> garbage-collected when the TermInfosReader object is gc-ed.
> Looking at the code in TermInfosReader.java, there's no reason why it 
> _shouldn't_ be gc-ed, so I can only presume (and I've seen this suggested 
> elsewhere) that there could be a bug in the garbage collector of some JVMs.
> I've seen this problem briefly discussed; in particular at the following URL:
>   http://java2.5341.com/msg/85821.html
> The patch that Doug recommended, which is included in lucene-1.4.3 doesn't 
> work in our particular circumstances. Doug's patch only clears the 
> ThreadLocal variable for the thread running the finalizer (my knowledge of 
> java breaks down here - I'm not sure which thread actually runs the 
> finalizer). In our situation, the TermInfosReader is (potentially) used by 
> more than one thread, meaning that Doug's patch _doesn't_ allow the affected 
> JVMs to correctly collect garbage.
> So...I've devised a simple patch which, from my observations on linux JVMs 
> 1.4.2_06, and 1.5.0_03, fixes this problem.
> Kieran
> PS Thanks to daniel naber for pointing me to jira/lucene
> @@ -19,6 +19,7 @@
>  import java.io.IOException;
>  import org.apache.lucene.store.Directory;
> +import java.util.Hashtable;
>  /** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
>   * Directory.  Pairs are accessed either by Term or by ordinal position the
> @@ -29,7 +30,7 @@
>    private String segment;
>    private FieldInfos fieldInfos;
> -  private ThreadLocal enumerators = new ThreadLocal();
> +  private final Hashtable enumeratorsByThread = new Hashtable();
>    private SegmentTermEnum origEnum;
>    private long size;
> @@ -60,10 +61,10 @@
>    }
>    private SegmentTermEnum getEnum() {
> -    SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
> +    SegmentTermEnum termEnum =
> +      (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
>      if (termEnum == null) {
>        termEnum = terms();
> -      enumerators.set(termEnum);
> +      enumeratorsByThread.put(Thread.currentThread(), termEnum);
>      }
>      return termEnum;
>    }
> @@ -195,5 +196,15 @@
>    public SegmentTermEnum terms(Term term) throws IOException {
>      get(term);
>      return (SegmentTermEnum)getEnum().clone();
> +  }
> +
> +  /* some jvms might have trouble gc-ing enumeratorsByThread */
> +  protected void finalize() throws Throwable {
> +    try {
> +        // make sure gc can clear up.
> +        enumeratorsByThread.clear();
> +    } finally {
> +        super.finalize();
> +    }
>    }
>  }
> TermInfosReader.java, full source:
> ======================================
> package org.apache.lucene.index;
> /**
>  * Copyright 2004 The Apache Software Foundation
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
>  * You may obtain a copy of the License at
>  *
>  *     http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> import java.io.IOException;
> import org.apache.lucene.store.Directory;
> import java.util.Hashtable;
> /** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
>  * Directory.  Pairs are accessed either by Term or by ordinal position the
>  * set.  */
> final class TermInfosReader {
>   private Directory directory;
>   private String segment;
>   private FieldInfos fieldInfos;
>   private final Hashtable enumeratorsByThread = new Hashtable();
>   private SegmentTermEnum origEnum;
>   private long size;
>   TermInfosReader(Directory dir, String seg, FieldInfos fis)
>        throws IOException {
>     directory = dir;
>     segment = seg;
>     fieldInfos = fis;
>     origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
>                                    fieldInfos, false);
>     size = origEnum.size;
>     readIndex();
>   }
>   public int getSkipInterval() {
>     return origEnum.skipInterval;
>   }
>   final void close() throws IOException {
>     if (origEnum != null)
>       origEnum.close();
>   }
>   /** Returns the number of term/value pairs in the set. */
>   final long size() {
>     return size;
>   }
>   private SegmentTermEnum getEnum() {
>     SegmentTermEnum termEnum =
>       (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
>     if (termEnum == null) {
>       termEnum = terms();
>       enumeratorsByThread.put(Thread.currentThread(), termEnum);
>     }
>     return termEnum;
>   }
>   Term[] indexTerms = null;
>   TermInfo[] indexInfos;
>   long[] indexPointers;
>   private final void readIndex() throws IOException {
>     SegmentTermEnum indexEnum =
>       new SegmentTermEnum(directory.openFile(segment + ".tii"),
>                         fieldInfos, true);
>     try {
>       int indexSize = (int)indexEnum.size;
>       indexTerms = new Term[indexSize];
>       indexInfos = new TermInfo[indexSize];
>       indexPointers = new long[indexSize];
>       for (int i = 0; indexEnum.next(); i++) {
>         indexTerms[i] = indexEnum.term();
>         indexInfos[i] = indexEnum.termInfo();
>         indexPointers[i] = indexEnum.indexPointer;
>       }
>     } finally {
>       indexEnum.close();
>     }
>   }
>   /** Returns the offset of the greatest index entry which is less than or equal to term. */
>   private final int getIndexOffset(Term term) throws IOException {
>     int lo = 0;                                         // binary search indexTerms[]
>     int hi = indexTerms.length - 1;
>     while (hi >= lo) {
>       int mid = (lo + hi) >> 1;
>       int delta = term.compareTo(indexTerms[mid]);
>       if (delta < 0)
>         hi = mid - 1;
>       else if (delta > 0)
>         lo = mid + 1;
>       else
>         return mid;
>     }
>     return hi;
>   }
>   private final void seekEnum(int indexOffset) throws IOException {
>     getEnum().seek(indexPointers[indexOffset],
>             (indexOffset * getEnum().indexInterval) - 1,
>             indexTerms[indexOffset], indexInfos[indexOffset]);
>   }
>   /** Returns the TermInfo for a Term in the set, or null. */
>   TermInfo get(Term term) throws IOException {
>     if (size == 0) return null;
>     // optimize sequential access: first try scanning cached enum w/o seeking
>     SegmentTermEnum enumerator = getEnum();
>     if (enumerator.term() != null                 // term is at or past current
>       && ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
>           || term.compareTo(enumerator.term()) >= 0)) {
>       int enumOffset = (int)(enumerator.position/enumerator.indexInterval)+1;
>       if (indexTerms.length == enumOffset       // but before end of block
>         || term.compareTo(indexTerms[enumOffset]) < 0)
>         return scanEnum(term);                    // no need to seek
>     }
>     // random-access: must seek
>     seekEnum(getIndexOffset(term));
>     return scanEnum(term);
>   }
>   /** Scans within block for matching term. */
>   private final TermInfo scanEnum(Term term) throws IOException {
>     SegmentTermEnum enumerator = getEnum();
>     while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
>     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
>       return enumerator.termInfo();
>     else
>       return null;
>   }
>   /** Returns the nth term in the set. */
>   final Term get(int position) throws IOException {
>     if (size == 0) return null;
>     SegmentTermEnum enumerator = getEnum();
>     if (enumerator != null && enumerator.term() != null &&
>         position >= enumerator.position &&
>         position < (enumerator.position + enumerator.indexInterval))
>       return scanEnum(position);                // can avoid seek
>     seekEnum(position / enumerator.indexInterval); // must seek
>     return scanEnum(position);
>   }
>   private final Term scanEnum(int position) throws IOException {
>     SegmentTermEnum enumerator = getEnum();
>     while(enumerator.position < position)
>       if (!enumerator.next())
>         return null;
>     return enumerator.term();
>   }
>   /** Returns the position of a Term in the set or -1. */
>   final long getPosition(Term term) throws IOException {
>     if (size == 0) return -1;
>     int indexOffset = getIndexOffset(term);
>     seekEnum(indexOffset);
>     SegmentTermEnum enumerator = getEnum();
>     while(term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
>     if (term.compareTo(enumerator.term()) == 0)
>       return enumerator.position;
>     else
>       return -1;
>   }
>   /** Returns an enumeration of all the Terms and TermInfos in the set. */
>   public SegmentTermEnum terms() {
>     return (SegmentTermEnum)origEnum.clone();
>   }
>   /** Returns an enumeration of terms starting at or after the named term. */
>   public SegmentTermEnum terms(Term term) throws IOException {
>     get(term);
>     return (SegmentTermEnum)getEnum().clone();
>   }
>   /* some jvms might have trouble gc-ing enumeratorsByThread */ 
>   protected void finalize() throws Throwable {
>     try {
>         // make sure gc can clear up.
>         enumeratorsByThread.clear();
>     } finally {
>         super.finalize();
>     }
>   }
> }

-- 
This message is automatically generated by JIRA.

