Re: ThreadLocal in SegmentReader

robert engels Sat, 12 Jul 2008 07:59:22 -0700

You are mistaken - Yonik's comment in that thread is correct(although it is not just at table resize - any time a ThreadLocal isadded, and any time the ThreadLocal is not found in its direct hashit might clear entries).

The ThreadLocals map only has a WeakReference to the ThreadLocal, soif that is the only reference, it will be GC'd - eventually, and thenit will be cleared as new ThreadLocals are created.

With a static reference, the thread can reference the ThreadLocal atany time, and thus the WeakReference will not be cleared.

If the object is VERY large, and new ThreadLocals are not created itcould cause a problem, but I don't think that is the case withLucene, as the objects stored in ThreadLocals are designed to livefor the life of the SegmentReader/IndexReader and thread.



On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:

Well, possibly I'm mistaken, but it seems that this affects non-static fields

too. Please see

http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.htmlwhere the

use case is described in the details.

In short: it seems that the scope of ThreadLocals does not matter.Whatreally matters is that they are referenced by ThreadLocals map inthe thread

which is still alive.


Robert Engels wrote:


This is only an issue for static ThreadLocals ...

On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote:


The problem here is not because ThreadLocal instances are not GC'd
(they are
GC'd, and your test shows this clearly).
But even one instance which is not removed from its Thread is
enough to
prevent the classloader from being unloaded, and that's the problem.


Michael McCandless-2 wrote:


OK, I created a simple test to test this (attached).  The test just
runs 10 threads, each one creating a 100 KB byte array which is
stored

into a ThreadLocal, and then periodically the ThreadLocal isreplaced

with a new one.  This is to test whether GC of a ThreadLocal, even

though the thread is still alive, in fact leads to GC of theobjects

held in the ThreadLocal.

Indeed on Sun JRE 1.4, 1.5 and 1.6 it appears that the objectsare in

fact properly collected.

So this is not a leak but rather a "delayed collection" issue.
Java's
GC is never guaranteed to be immediate, and apparently when using

ThreadLocals it's even less immediate than "normal". In theoriginalissue, if other things create ThreadLocals, then eventuallyLucene's

unreferenced ThreadLocals would be properly collected.

So I think we continue to use non-static ThreadLocals in Lucene...

Mike






robert engels wrote:

Once again, these are "static" thread locals. A completelydifferentissue. Since the object is available statically, the weakreferencecannot be cleared so stale entries will never be cleared aslong as

the thread is alive.

On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:

Just a few examples of "problems" using ThreadLocals.

http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481

http://www.theserverside.com/news/thread.tss?thread_id=41473

Once again, I'm not pointing to Lucene SegmentReader as a "bad"
implementation, and maybe the current "problems" of ThreadLocals
are not a problem for SegmentReader but it seems safer to use

ThreadLocals to pass context information which is cleared whenthe

call exits instead of storing long-lived objects.


robert engels wrote:

Aside from the pre-1.5 thread local "perceived leak", thereare no

issues with ThreadLocals if used properly.

There is no need for try/finally blocks, unless you MUST release

resources immediately, usually this is not the case, which iswhy

a ThreadLocal is used in the first place.

From the ThreadLocalMap javadoc...

 /**
     * ThreadLocalMap is a customized hash map suitable only for

* maintaining thread local values. No operations areexported

     * outside of the ThreadLocal class. The class is package
private to

* allow declaration of fields in class Thread. To helpdeal

with
     * very large and long-lived usages, the hash table entries
use
     * WeakReferences for keys. However, since reference queues
are not

* used, stale entries are guaranteed to be removed onlywhen

     * the table starts running out of space.
     */

/**
         * Heuristically scan some cells looking for stale
entries.

* This is invoked when either a new element isadded, or

         * another stale one has been expunged. It performs a
         * logarithmic number of scans, as a balance between no
         * scanning (fast but retains garbage) and a number of
scans

* proportional to number of elements, that wouldfind all

         * garbage but would cause some insertions to take O(n)
time.
         *
         * @param i a position known NOT to hold a stale entry.
The
         * scan starts at the element after i.
         *
         * @param n scan control: <tt>log2(n)</tt> cells are
scanned,
         * unless a stale entry one is found, in which case
         * <tt>log2(table.length)-1</tt> additional cells are
scanned.
         * When called from insertions, this parameter is the
number

* of elements, but when from replaceStaleEntry, itis the

         * table length. (Note: all this could be changed to be
either

* more or less aggressive by weighting n instead ofjust* using straight log n. But this version is simple,fast,

and
         * seems to work well.)
         *
         * @return true if any stale entries have been removed.
         */


The instance ThreadLocals (and what the refer to) will be GC'd
when the containing Object is GC'd.

There IS NO MEMORY LEAK in ThreadLocal. If the ThreadLocalrefersto an object that has native resources (e.g. file handles),it may

not be released until other thread locals are created by the
thread (or the thread terminates).

You can avoid this "delay" by calling remove(), but in most

applications it should never be necessary - unless a verystrange

usage...

On Jul 9, 2008, at 2:37 PM, Adrian Tarau wrote:

From what I know, storing objects in ThreadLocal is safe aslong

as you release the object within a try {} finall {} block or
store objects which are independent of the rest of the code(no
dependencies).Otherwise it can get pretty tricky(memory leaks,
classloader problems) after awhile.

It is pretty convenient to pass HTTP request information with a
ThreadLocal in a servlet(but you should cleanup the variable
before leaving the servlet) but I'm not sure how safe it is in
this case.

robert engels wrote:

Using synchronization is a poor/invalid substitute for thread
locals in many cases.

The point of the thread local in these referenced cases is too

allow streaming reads on a file descriptor. if you use ashared

file descriptor/buffer you are going to continually invalidate
the buffer.

On Jul 8, 2008, at 5:12 AM, Michael McCandless wrote:


Well ... SegmentReader uses ThreadLocal to hold a thread-
private instance of TermVectorsReader, to avoid synchronizing
per-document when loading term vectors.

Clearing this ThreadLocal value per call to SegmentReader's
methods that load term vectors would defeat its purpose.

Though, of course, we then synchronize on the underlying file
(when using FSDirectory), so perhaps we are really not saving
much by using ThreadLocal here.  But we are looking to relax
that low level synchronization with LUCENE-753.

Maybe we could make our own ThreadLocal that just uses a
HashMap, which we'd have to synchronize on when getting the
per-
thread instances.  Or, go back to sharing a single
TermVectorsReader and synchronize per-document.

Jason has suggested moving to a model where you ask the

IndexReader for an object that can return term vectors /stored

fields / etc, and then you interact with that many times to
retrieve each doc.  We could then synchronize only on

retrieving that object, and provide a thread-privateinstance.


It seems like we should move away from using ThreadLocal in
Lucene and do "normal" synchronization instead.

Mike

Adrian Tarau wrote:

Usually ThreadLocal.remove() should be called at the end(in a

finally block), before the current call leaves your code.

Ex : if during searching ThreadLocal is used, every search(..)

method should cleanup any ThreadLocal variables, or even

deeper in the implementation. When the call leaves Luceneany

used ThreadLocal should be cleaned up.

Michael McCandless wrote:

ThreadLocal, which we use in several places in Lucene,causes

a leak in app servers because the classloader never fully
deallocates Lucene's classes because the ThreadLocal is
holding strong references.

Yet, ThreadLocal is very convenient for avoiding
synchronization.

Does anyone have any ideas on how to solve this w/o falling
back to "normal" synchronization?

Mike

Begin forwarded message:

From: "Yonik Seeley" <[EMAIL PROTECTED]>
Date: July 7, 2008 3:30:28 PM EDT
To: [EMAIL PROTECTED]
Subject: Re: ThreadLocal in SegmentReader
Reply-To: [EMAIL PROTECTED]

On Mon, Jul 7, 2008 at 2:43 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:

So now I'm confused: the SegmentReader itself should no
longer be reachable,
assuming you are not holding any references to your
IndexReader.

Which means the ThreadLocal instance should no longer be
reachable.


It will still be referenced from the Thread(s)
ThreadLocalMap

The key (the ThreadLocal) will be weakly referenced,but the

values
(now stale) are strongly referenced and won't be actually
removed

until the table is resized (under the Java6 impl atleast).

Nice huh?

-Yonik

------------------------------------------------------------

---------
To unsubscribe, e-mail: java-user-
[EMAIL PROTECTED]
For additional commands, e-mail: java-user-
[EMAIL PROTECTED]

-------------------------------------------------------------

--------
To unsubscribe, e-mail: java-dev-
[EMAIL PROTECTED]
For additional commands, e-mail: java-dev-
[EMAIL PROTECTED]

--------------------------------------------------------------

-------

To unsubscribe, e-mail: java-dev-[EMAIL PROTECTED]

For additional commands, e-mail: java-dev-
[EMAIL PROTECTED]

---------------------------------------------------------------

------

To unsubscribe, e-mail: java-dev-[EMAIL PROTECTED]

For additional commands, e-mail: java-dev-
[EMAIL PROTECTED]

----------------------------------------------------------------

-----
To unsubscribe, e-mail: [EMAIL PROTECTED]

For additional commands, e-mail: java-dev-[EMAIL PROTECTED]

-----------------------------------------------------------------

----
To unsubscribe, e-mail: [EMAIL PROTECTED]

For additional commands, e-mail: java-dev-[EMAIL PROTECTED]

---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--
View this message in context: http://www.nabble.com/Fwd%3A-
ThreadLocal-in-SegmentReader-tp18326720p18416022.html
Sent from the Lucene - Java Developer mailing list archive at
Nabble.com.

---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--

View this message in context: http://www.nabble.com/Fwd%3A-ThreadLocal-in-SegmentReader-tp18326720p18416841.htmlSent from the Lucene - Java Developer mailing list archive atNabble.com.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: ThreadLocal in SegmentReader

Reply via email to