[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

Nicholaus Shupe (JIRA) Wed, 03 May 2006 13:33:39 -0700

    [ 
http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377623 ]


Nicholaus Shupe commented on LUCENE-436:
----------------------------------------

robert engels is *incorrect* in that there isn't a memory leak with Lucene.  
The test case made by kieran demonstrates the memory leak quite well on my 
machine for the following setup:

Lucene 1.9.1
Sun JDK 1.5.0_06
JVM Args: -Xmx4M (low memory to get the unit test to fail faster)
Windows XP Pro

The test case will fail on my machine with i = 2006 and useRamDirectory set to 
true (as indicated by kieran, this point is important).

Furthermore, by changing the enumerators declaration to:

  private Map enumerators = Collections.synchronizedMap(new WeakHashMap());

and the getEnum method to:

  private SegmentTermEnum getEnum() {
    Thread currentThread = Thread.currentThread();
    SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(currentThread);
    if (termEnum == null) {
      termEnum = terms();
      enumerators.put(currentThread, termEnum);
    }
    return termEnum;
  }

(note that I'm not suggesting that my changes be used for the official patch)

The problem dissapears, and the unit test completes.  I vote that this bug be 
bumped to critical in priority and a fix be included for the Lucene 1.9.2 
release and perhaps a patch be made for the 1.4.3 release for those who need it.

For further reading on the perils of ThreadLocal I suggest the following 
reading:

http://www.me.umn.edu/~shivane/blogs/cafefeed/2004/06/of-non-static-threadlocals-and-memory.html
http://www.szegedi.org/articles/memleak.html
http://crazybob.org/2006/02/threadlocal-memory-leak.html
http://opensource.atlassian.com/confluence/spring/pages/viewpage.action?pageId=2669

> [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
> ----------------------------------------------------------------
>
>          Key: LUCENE-436
>          URL: http://issues.apache.org/jira/browse/LUCENE-436
>      Project: Lucene - Java
>         Type: Improvement

>   Components: Index
>     Versions: 1.4
>  Environment: Solaris JVM 1.4.1
> Linux JVM 1.4.2/1.5.0
> Windows not tested
>     Reporter: kieran
>  Attachments: Lucene-436-TestCase.tar.gz
>
> We've been experiencing terrible memory problems on our production search 
> server, running lucene (1.4.3).
> Our live app regularly opens new indexes and, in doing so, releases old 
> IndexReaders for garbage collection.
> But...there appears to be a memory leak in 
> org.apache.lucene.index.TermInfosReader.java.
> Under certain conditions (possibly related to JVM version, although I've 
> personally observed it under both linux JVM 1.4.2_06, and 1.5.0_03, and SUNOS 
> JVM 1.4.1) the ThreadLocal member variable, "enumerators" doesn't get 
> garbage-collected when the TermInfosReader object is gc-ed.
> Looking at the code in TermInfosReader.java, there's no reason why it 
> _shouldn't_ be gc-ed, so I can only presume (and I've seen this suggested 
> elsewhere) that there could be a bug in the garbage collector of some JVMs.
> I've seen this problem briefly discussed; in particular at the following URL:
>   http://java2.5341.com/msg/85821.html
> The patch that Doug recommended, which is included in lucene-1.4.3 doesn't 
> work in our particular circumstances. Doug's patch only clears the 
> ThreadLocal variable for the thread running the finalizer (my knowledge of 
> java breaks down here - I'm not sure which thread actually runs the 
> finalizer). In our situation, the TermInfosReader is (potentially) used by 
> more than one thread, meaning that Doug's patch _doesn't_ allow the affected 
> JVMs to correctly collect garbage.
> So...I've devised a simple patch which, from my observations on linux JVMs 
> 1.4.2_06, and 1.5.0_03, fixes this problem.
> Kieran
> PS Thanks to daniel naber for pointing me to jira/lucene
> @@ -19,6 +19,7 @@
>  import java.io.IOException;
>  import org.apache.lucene.store.Directory;
> +import java.util.Hashtable;
>  /** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
>   * Directory.  Pairs are accessed either by Term or by ordinal position the
> @@ -29,7 +30,7 @@
>    private String segment;
>    private FieldInfos fieldInfos;
> -  private ThreadLocal enumerators = new ThreadLocal();
> +  private final Hashtable enumeratorsByThread = new Hashtable();
>    private SegmentTermEnum origEnum;
>    private long size;
> @@ -60,10 +61,10 @@
>    }
>    private SegmentTermEnum getEnum() {
> -    SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
> +    SegmentTermEnum termEnum = 
> (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
>      if (termEnum == null) {
>        termEnum = terms();
> -      enumerators.set(termEnum);
> +      enumeratorsByThread.put(Thread.currentThread(), termEnum);
>      }
>      return termEnum;
>    }
> @@ -195,5 +196,15 @@
>    public SegmentTermEnum terms(Term term) throws IOException {
>      get(term);
>      return (SegmentTermEnum)getEnum().clone();
> +  }
> +
> +  /* some jvms might have trouble gc-ing enumeratorsByThread */
> +  protected void finalize() throws Throwable {
> +    try {
> +        // make sure gc can clear up.
> +        enumeratorsByThread.clear();
> +    } finally {
> +        super.finalize();
> +    }
>    }
>  }
> TermInfosReader.java, full source:
> ======================================
> package org.apache.lucene.index;
> /**
>  * Copyright 2004 The Apache Software Foundation
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
>  * You may obtain a copy of the License at
>  *
>  *     http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> import java.io.IOException;
> import org.apache.lucene.store.Directory;
> import java.util.Hashtable;
> /** This stores a monotonically increasing set of <Term, TermInfo> pairs in a
>  * Directory.  Pairs are accessed either by Term or by ordinal position the
>  * set.  */
> final class TermInfosReader {
>   private Directory directory;
>   private String segment;
>   private FieldInfos fieldInfos;
>   private final Hashtable enumeratorsByThread = new Hashtable();
>   private SegmentTermEnum origEnum;
>   private long size;
>   TermInfosReader(Directory dir, String seg, FieldInfos fis)
>        throws IOException {
>     directory = dir;
>     segment = seg;
>     fieldInfos = fis;
>     origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
>                                    fieldInfos, false);
>     size = origEnum.size;
>     readIndex();
>   }
>   public int getSkipInterval() {
>     return origEnum.skipInterval;
>   }
>   final void close() throws IOException {
>     if (origEnum != null)
>       origEnum.close();
>   }
>   /** Returns the number of term/value pairs in the set. */
>   final long size() {
>     return size;
>   }
>   private SegmentTermEnum getEnum() {
>     SegmentTermEnum termEnum = 
> (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
>     if (termEnum == null) {
>       termEnum = terms();
>       enumeratorsByThread.put(Thread.currentThread(), termEnum);
>     }
>     return termEnum;
>   }
>   Term[] indexTerms = null;
>   TermInfo[] indexInfos;
>   long[] indexPointers;
>   private final void readIndex() throws IOException {
>     SegmentTermEnum indexEnum =
>       new SegmentTermEnum(directory.openFile(segment + ".tii"),
>                         fieldInfos, true);
>     try {
>       int indexSize = (int)indexEnum.size;
>       indexTerms = new Term[indexSize];
>       indexInfos = new TermInfo[indexSize];
>       indexPointers = new long[indexSize];
>       for (int i = 0; indexEnum.next(); i++) {
>       indexTerms[i] = indexEnum.term();
>       indexInfos[i] = indexEnum.termInfo();
>       indexPointers[i] = indexEnum.indexPointer;
>       }
>     } finally {
>       indexEnum.close();
>     }
>   }
>   /** Returns the offset of the greatest index entry which is less than or 
> equal to term.*/
>   private final int getIndexOffset(Term term) throws IOException {
>     int lo = 0;                                         // binary search 
> indexTerms[]
>     int hi = indexTerms.length - 1;
>     while (hi >= lo) {
>       int mid = (lo + hi) >> 1;
>       int delta = term.compareTo(indexTerms[mid]);
>       if (delta < 0)
>       hi = mid - 1;
>       else if (delta > 0)
>       lo = mid + 1;
>       else
>       return mid;
>     }
>     return hi;
>   }
>   private final void seekEnum(int indexOffset) throws IOException {
>     getEnum().seek(indexPointers[indexOffset],
>             (indexOffset * getEnum().indexInterval) - 1,
>             indexTerms[indexOffset], indexInfos[indexOffset]);
>   }
>   /** Returns the TermInfo for a Term in the set, or null. */
>   TermInfo get(Term term) throws IOException {
>     if (size == 0) return null;
>     // optimize sequential access: first try scanning cached enum w/o seeking
>     SegmentTermEnum enumerator = getEnum();
>     if (enumerator.term() != null                 // term is at or past 
> current
>       && ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
>           || term.compareTo(enumerator.term()) >= 0)) {
>       int enumOffset = (int)(enumerator.position/enumerator.indexInterval)+1;
>       if (indexTerms.length == enumOffset       // but before end of block
>         || term.compareTo(indexTerms[enumOffset]) < 0)
>       return scanEnum(term);                    // no need to seek
>     }
>     // random-access: must seek
>     seekEnum(getIndexOffset(term));
>     return scanEnum(term);
>   }
>   /** Scans within block for matching term. */
>   private final TermInfo scanEnum(Term term) throws IOException {
>     SegmentTermEnum enumerator = getEnum();
>     while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
>     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
>       return enumerator.termInfo();
>     else
>       return null;
>   }
>   /** Returns the nth term in the set. */
>   final Term get(int position) throws IOException {
>     if (size == 0) return null;
>     SegmentTermEnum enumerator = getEnum();
>     if (enumerator != null && enumerator.term() != null &&
>         position >= enumerator.position &&
>       position < (enumerator.position + enumerator.indexInterval))
>       return scanEnum(position);                // can avoid seek
>     seekEnum(position / enumerator.indexInterval); // must seek
>     return scanEnum(position);
>   }
>   private final Term scanEnum(int position) throws IOException {
>     SegmentTermEnum enumerator = getEnum();
>     while(enumerator.position < position)
>       if (!enumerator.next())
>       return null;
>     return enumerator.term();
>   }
>   /** Returns the position of a Term in the set or -1. */
>   final long getPosition(Term term) throws IOException {
>     if (size == 0) return -1;
>     int indexOffset = getIndexOffset(term);
>     seekEnum(indexOffset);
>     SegmentTermEnum enumerator = getEnum();
>     while(term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
>     if (term.compareTo(enumerator.term()) == 0)
>       return enumerator.position;
>     else
>       return -1;
>   }
>   /** Returns an enumeration of all the Terms and TermInfos in the set. */
>   public SegmentTermEnum terms() {
>     return (SegmentTermEnum)origEnum.clone();
>   }
>   /** Returns an enumeration of terms starting at or after the named term. */
>   public SegmentTermEnum terms(Term term) throws IOException {
>     get(term);
>     return (SegmentTermEnum)getEnum().clone();
>   }
>   /* some jvms might have trouble gc-ing enumeratorsByThread */ 
>   protected void finalize() throws Throwable {
>     try {
>         // make sure gc can clear up.
>         enumeratorsByThread.clear();
>     } finally {
>         super.finalize();
>     }
>   }
> }

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception

Reply via email to