Bugs item #1070309, was opened at 2004-11-21 05:38 Message generated for change (Comment added) made by jdvorak You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1070309&group_id=16035
Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Guillaume Poirier (gpoirier) Assigned to: David Lucas (ddlucas) Summary: ThreadLocal cache Initial Comment: There's a problem with the use of ThreadLocals for caching, it's never cleaned up. If the ClassLoader is thrown away (e.g. if a webapp is reloaded), then the ThreadLocal stuff hold until the Thread dies, which might take a while in a Servlet container where the Threads are pooled. In such situation, DOM4J would prevent the webapp's ClassLoader to be garbage collected, which would creates a a memory leak and eventually an OOM if the webapp is reloaded too many times. To fix the problem, the use of ThreadLocal could be avoided completly by using thread-safe singleton instead. There's three places it's used. There's the getInstance() method from DOMDocumentFactory and DocumentFactory. In that situation, it doesn't seem an appropriate use to me. First, the DOM spec doesn't mandate for that implementation to be thread safe, and anyway returning a different instance per thread is no help there, each instance can still be used in different threads. And you would usually expect a getInstance() method to always return the same object. Also, DOMDocumentFactory already seems perfectly thread safe. The last use of ThreadLocal I identified is in QName, basically for ThreadLocal caching, which doesn't really seem needed, QNameCache seems thread-safe as far as I can see, unless the content of the HashMaps are not meant to be shared by differents calls to DocumentFactory? But anyway, I don't really thing it would too hard to get rid of the ThreadLocal there. http://article.gmane.org/gmane.comp.java.springframework.devel/6216 ---------------------------------------------------------------------- Comment By: Jan Dvorak (jdvorak) Date: 2004-11-23 09:41 Message: Logged In: YES user_id=22470 It looks like there are many suggestions what to do with QName and Namespace caching: ranging from a singleton cache to dropping caching altogether, with the current ThreadLocal being a valid choice still. Different usage scenarios have different needs. I'd suggest that we make the caching strategy a configurable feature, somehow. Perhaps an abstract CachingStrategy class that has methods to produce a QNameCache and NamespaceCache, given a Document and/or a DocumentFactory (some implementations will not use the arguments). Having a static setter for the CachingStrategy in QName and/or Namespace would allow use to change the caching strategy setting programatically; the default choice could be based on a system property. ---------------------------------------------------------------------- Comment By: Maarten Coene (maartenc) Date: 2004-11-22 20:13 Message: Logged In: YES user_id=178745 I think we should get rid of these (singleton) caches. I believe it would be better to keep these caches at the Document level: each Document can have it's own cache of QName/Namespace objects. This would avoid us to write any synchronization code as I don't think dom4j should provide thread-safety at the Document level: it's the responsability of the application to synchronize the access to the *same* Document instance. However, dom4j should be thread-safe accross multiple Document instances. I think this can also improve performance as we don't need any synchronized methods anymore to manage the access to these caches. In addition, it could solve some of the problems we have now with respect to the Namespace objects that are shared by multiple document instances. Maarten ---------------------------------------------------------------------- Comment By: Guillaume Poirier (gpoirier) Date: 2004-11-22 04:43 Message: Logged In: YES user_id=942521 Oh, and by the way, I checked out the source code from CVS, I removed the ThreadLocals, and all the tests were still sucesful, including one named ThreadingTest. ---------------------------------------------------------------------- Comment By: Guillaume Poirier (gpoirier) Date: 2004-11-22 04:41 Message: Logged In: YES user_id=942521 Well, I admit I might have missed something, but when I said it would be thread safe, I was including concurrent modification of state causing stale or corrupted data, and I still don't see where it would happen. In QName, I don't see any state that can be modified once it has been put in the HashMap. In QNameCache, the only potential negative effect of concurrent access that I can see is if two requests are made for the same name/Namespace at the same time, then one would get kicked out of the HashMap. But it doesn't really seem to matter, because I can only see entries added in the cache, nothing is ever removed. Am I missing something, what shared state is modified? ---------------------------------------------------------------------- Comment By: David Lucas (ddlucas) Date: 2004-11-22 01:56 Message: Logged In: YES user_id=633013 One of the issues we ran into in the past was not thread-safety but "thread hot" protection. The problem is when two seperate threads modify data and the state changes during an operation. It is like a race condition, but the issues can be very dangerous between threads. That was one of the drivers to have a seperate cache per thread. It was not the best solution at the time, but was a solution that resolved the issue without over complicating what already existed. I will look into cleaning up the cache and removing ThreadLocal / Singleton caches where it makes sense. Thanks for bringing this issue back into the light. We are hoping to address this before the next release. ---------------------------------------------------------------------- Comment By: Guillaume Poirier (gpoirier) Date: 2004-11-21 17:20 Message: Logged In: YES user_id=942521 The thing is, that code would be thread safe even without the use of ThreadLocal. It doesn't provice any benifit, the only place that it might allow to avoid synchronization is in QNameCache for the HashMap, but they ARE synchronized anyway. And in that case, if you were to worry about the synchronized HashMap performance, then you can use Doug Lea's ConcurrentHashMap. ---------------------------------------------------------------------- Comment By: David Lucas (ddlucas) Date: 2004-11-21 09:32 Message: Logged In: YES user_id=633013 This is a limitation of using ThreadLocal for a quick cache. However, DOM4J must maintain its thread safety. There are several users that have demonstrated that need in the past and it would not be wise to remove the implied requirement since it currently is thread-safe. The problem is a contextual one. A more elaborate caching algorithm will be required to avoid the potential memory leak. Something in the neighborhood of clearing the cache of things not recently used or even clear it periodically based on time since last cache cleared. A ThreadLocal cache does provide some benefits by avoiding synchronization between multiple threads. The fix for this bug should maintain equal performance or better while maintaining thread safety. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1070309&group_id=16035 ------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://productguide.itmanagersjournal.com/ _______________________________________________ dom4j-dev mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dom4j-dev