Bugs item #1070309, was opened at 2004-11-21 05:38
Message generated for change (Comment added) made by jdvorak
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1070309&group_id=16035

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Guillaume Poirier (gpoirier)
Assigned to: David Lucas (ddlucas)
Summary: ThreadLocal cache

Initial Comment:
There's a problem with the use of ThreadLocals for
caching, it's never cleaned up.  If the ClassLoader is
thrown away (e.g. if a webapp is reloaded), then the
ThreadLocal stuff hold until the Thread dies, which
might take a while in a Servlet container where the
Threads are pooled.  In such situation, DOM4J would
prevent the webapp's ClassLoader to be garbage
collected, which would creates a a memory leak and
eventually an OOM if the webapp is reloaded too many times.

To fix the problem, the use of ThreadLocal could be
avoided completly by using thread-safe singleton
instead.  There's three places it's used.  There's the
getInstance() method from DOMDocumentFactory and
DocumentFactory.  In that situation, it doesn't seem an
appropriate use to me.  First, the DOM spec doesn't
mandate for that implementation to be thread safe, and
anyway returning a different instance per thread is no
help there, each instance can still be used in
different threads.  And you would usually expect a
getInstance() method to always return the same object.

Also, DOMDocumentFactory already seems perfectly thread
safe.

The last use of ThreadLocal I identified is in QName,
basically for ThreadLocal caching, which doesn't really
seem needed, QNameCache seems thread-safe as far as I
can see, unless the content of the HashMaps are not
meant to be shared by differents calls to
DocumentFactory?  But anyway, I don't really thing it
would too hard to get rid of the ThreadLocal there.

http://article.gmane.org/gmane.comp.java.springframework.devel/6216

----------------------------------------------------------------------

Comment By: Jan Dvorak (jdvorak)
Date: 2004-11-23 09:41

Message:
Logged In: YES 
user_id=22470

It looks like there are many suggestions what to do with
QName and Namespace caching: ranging from a singleton cache
to dropping caching altogether, with the current ThreadLocal
being a valid choice still. Different usage scenarios have
different needs. I'd suggest that we make the caching
strategy a configurable feature, somehow.

Perhaps an abstract CachingStrategy class that has methods
to produce a QNameCache and NamespaceCache, given a Document
and/or a DocumentFactory (some implementations will not use
the arguments). Having a static setter for the
CachingStrategy in QName and/or Namespace would allow use to
change the caching strategy setting programatically; the
default choice could be based on a system property.

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2004-11-22 20:13

Message:
Logged In: YES 
user_id=178745

I think we should get rid of these (singleton) caches. I
believe it would be better to keep these caches at the
Document level: each Document can have it's own cache of
QName/Namespace objects.

This would avoid us to write any synchronization code as I
don't think dom4j should provide thread-safety at the
Document level: it's the responsability of the application
to synchronize the access to the *same* Document instance.
However, dom4j should be thread-safe accross multiple
Document instances.

I think this can also improve performance as we don't need
any synchronized methods anymore to manage the access to
these caches.

In addition, it could solve some of the problems we have now
with respect to the Namespace objects that are shared by
multiple document instances.

Maarten

----------------------------------------------------------------------

Comment By: Guillaume Poirier (gpoirier)
Date: 2004-11-22 04:43

Message:
Logged In: YES 
user_id=942521

Oh, and by the way, I checked out the source code from CVS,
I removed the ThreadLocals, and all the tests were still
sucesful, including one named ThreadingTest.

----------------------------------------------------------------------

Comment By: Guillaume Poirier (gpoirier)
Date: 2004-11-22 04:41

Message:
Logged In: YES 
user_id=942521

Well, I admit I might have missed something, but when I said
it would be thread safe, I was including concurrent
modification of state causing stale or corrupted data, and I
still don't see where it would happen.  

In QName, I don't see any state that can be modified once it
has been put in the HashMap.  In QNameCache, the only
potential negative effect of concurrent access that I can
see is if two requests are made for the same name/Namespace
at the same time, then one would get kicked out of the
HashMap.  But it doesn't really seem to matter, because I
can only see entries added in the cache, nothing is ever
removed.  Am I missing something, what shared state is modified?

----------------------------------------------------------------------

Comment By: David Lucas (ddlucas)
Date: 2004-11-22 01:56

Message:
Logged In: YES 
user_id=633013

One of the issues we ran into in the past was not
thread-safety but "thread hot" protection.  The problem is
when two seperate threads modify data and the state changes
during an operation.  It is like a race condition, but the
issues can be very dangerous between threads.  That was one
of the drivers to have a seperate cache per thread.  It was
not the best solution at the time, but was a solution that
resolved the issue without over complicating what already
existed.  I will look into cleaning up the cache and
removing ThreadLocal / Singleton caches where it makes sense.

Thanks for bringing this issue back into the light.  We are
hoping to address this before the next release.


----------------------------------------------------------------------

Comment By: Guillaume Poirier (gpoirier)
Date: 2004-11-21 17:20

Message:
Logged In: YES 
user_id=942521

The thing is, that code would be thread safe even without
the use of ThreadLocal.  It doesn't provice any benifit, the
only place that it might allow to avoid synchronization is
in QNameCache for the HashMap, but they ARE synchronized
anyway.  And in that case, if you were to worry about the
synchronized HashMap performance, then you can use Doug
Lea's ConcurrentHashMap.

----------------------------------------------------------------------

Comment By: David Lucas (ddlucas)
Date: 2004-11-21 09:32

Message:
Logged In: YES 
user_id=633013

This is a limitation of using ThreadLocal for a quick cache.
 However, DOM4J must maintain its thread safety.  There are
several users that have demonstrated that need in the past
and it would not be wise to remove the implied requirement
since it currently is thread-safe.  

The problem is a contextual one.  A more elaborate caching
algorithm will be required to avoid the potential memory
leak.  Something in the neighborhood of clearing the cache
of things not recently used or even clear it periodically
based on time since last cache cleared.

A ThreadLocal cache does provide some benefits by avoiding
synchronization between multiple threads.  The fix for this
bug should maintain equal performance or better while
maintaining thread safety.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1070309&group_id=16035


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. 
http://productguide.itmanagersjournal.com/
_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev

Reply via email to