Chris, if possible, could you try out this patch to see if it fixes the leak you're seeing? Thanks!

Mike

Michael McCandless (JIRA) wrote:


[ https://issues.apache.org/jira/browse/LUCENE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1383:
---------------------------------------

   Attachment: LUCENE-1383.patch

Attached patch.  All tests pass.

The patch adds o.a.l.util.CloseableThreadLocal. It's a wrapper around ThreadLocal that wraps the values inside a WeakReference, but then also holds a strong reference to the value (to ensure GC doesn't reclaim it) until you call the close method. On calling close, GC is then free to reclaim all values you had stored, regardless of how long it takes ThreadLocal's implementation to actually release its references.
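
For readers who want to see the idea in code, here is a minimal sketch of the approach described above. It is not the attached patch: the class name CloseableThreadLocalSketch, the use of generics, and the choice to synchronize every method are assumptions made for illustration only.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the technique, NOT the code in LUCENE-1383.patch.
final class CloseableThreadLocalSketch<T> {
    // ThreadLocal only ever sees a WeakReference, so by itself it
    // cannot keep a stored value alive.
    private ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();

    // Strong references, one per thread, keep the values reachable
    // until close() is called.
    private Map<Thread, T> hardRefs = new HashMap<>();

    public synchronized T get() {
        if (local == null) {
            return null; // already closed
        }
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    public synchronized void set(T value) {
        if (local == null) {
            throw new IllegalStateException("already closed");
        }
        local.set(new WeakReference<>(value));
        hardRefs.put(Thread.currentThread(), value);
    }

    // Drops every strong reference. After this, only weak references
    // remain inside ThreadLocal's internal map, so GC may reclaim the
    // values even if ThreadLocal has not yet purged its stale entries.
    public synchronized void close() {
        hardRefs = null;
        local = null;
    }
}
```

In this sketch a caller would set() a per-thread value, read it back with get(), and call close() when the owning object is closed. Note that the map also holds references to the Thread objects until close() is called, and synchronizing get() trades some concurrency for simplicity.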

There are a couple places in Lucene where I left the current usage of ThreadLocal.

First, Analyzer.java uses ThreadLocal to hold reusable token streams. There is no "close" called for Analyzer, so unless we are willing to add a finalizer to call CloseableThreadLocal.close() I think we can leave it.

Second, some of the contrib/benchmark tasks use ThreadLocal to store a per-thread DateFormat, which should use very little memory.

Workaround ThreadLocal's "leak"
-------------------------------

               Key: LUCENE-1383
               URL: https://issues.apache.org/jira/browse/LUCENE-1383
           Project: Lucene - Java
        Issue Type: Bug
        Components: Index
  Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2
          Reporter: Michael McCandless
          Assignee: Michael McCandless
           Fix For: 2.4

       Attachments: LUCENE-1383.patch


Java's ThreadLocal is dangerous to use because it can take a
surprisingly long time to release references to the values you
store in it.  Even after a ThreadLocal instance itself is GC'd, hard
references to the values you had stored in it can easily persist for
quite some time.

While this is not technically a "memory leak", because eventually
(when the underlying Map that stores the values cleans up its "stale"
references) the hard reference will be cleared and GC can proceed,
its end behavior is no different from a memory leak: under the
right circumstances you can easily tie up far more memory than you'd
expect, and then hit an unexpected OOM error despite allocating an
extremely large heap to your JVM.

Lucene users have hit this many times. Here's the most recent thread:

http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200809.mbox/%3C6e3ae6310809091157j7a9fe46bxcc31f6e63305fcdc%40mail.gmail.com%3E

And here's another:

http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/%3CF5FC94B2-E5C7-40C0-8B73-E12245B91CEE%40mikemccandless.com%3E

And then there's LUCENE-436 and LUCENE-529 at least.

A Google search for "ThreadLocal leak" yields many compelling hits.

Sun does this for performance reasons, but I think it's a terrible
trap and we should work around it in Lucene.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


