I'd have no problem cutting a 0.3.1 update with this change. I'd kind of like to wait to see if we get any other major bugs but it would be possible to create a 0.3.1-snapshot branch were fixes like this could be applied.
Copying the dev list so the other dev's can weigh in... - James Chris Berry wrote: > James, > I wonder if it might be prudent to backport this change into 0.3.0?? > As Bryon said, it is a very substantial leak -- leaking >1GB in a matter > of minutes in our load tests. > Although, it is relatively easy to workaround. > Cheers, > -- Chris > > On Nov 5, 2007, at 5:39 PM, James M Snell wrote: > >> Ok, I went ahead and committed the change that removes the wrapper >> cache. Things seem to be working fine. Please kick the tires and see >> if things are working out. >> >> - James >> >> Bryon Jacob wrote: >>> Hey All - >>> >>> I'm working on a data service based on Abdera (working with Chris Berry, >>> who's a regular on these lists...) When we were running our first >>> battery of serious load testing on our system, we encountered >>> memory-leaky behavior, and a profiler showed us that we were indeed >>> leaking hundreds of megabytes a minute, all traceable back to the >>> wrappers field on org.apache.abdera.factory.ExtensionFactoryMap. This >>> field is a map from elements to their wrappers, if any. At first, I was >>> puzzled by the memory leak, as the field is initialized thusly: >>> >>> this.wrappers = Collections.synchronizedMap( new >>> WeakHashMap<Element,Element>()); >>> >>> clearly, the implementor took care to make sure that this cache would >>> not leak by making it a WeakHashMap, which generally guarantees that the >>> map itself will not keep a key and its corresponding entry from being >>> garbage collected. I dug throughout our application code to find if we >>> were actually holding other references to these objects, and I googled >>> for anyone having problems with esoteric interactions between >>> Collections.synchronizedMap and WeakHashMaps - found nothing there. >>> Then I went back to square one and re-read the WeakHashMap javadoc very >>> carefully. Here's the relevant section: >>> >>> Implementation note: The value objects in a WeakHashMap are held by >>> ordinary strong references. Thus care should be taken to ensure that >>> value objects do not strongly refer to their own keys, either directly >>> or indirectly, since that will prevent the keys from being discarded. >>> Note that a value object may refer indirectly to its key via the >>> WeakHashMap itself; that is, a value object may strongly refer to some >>> other key object whose associated value object, in turn, strongly refers >>> to the key of the first value object. One way to deal with this is to >>> wrap values themselves within WeakReferences before inserting, as in: >>> m.put(key, new WeakReference(value)), and then unwrapping upon each get. >>> >>> This is why there is a memory leak - the map is a mapping from elements >>> to their wrappers - by the very nature of the object being a wrapper of >>> the element, it will usually have a strong reference to the element >>> itself, which is the key! You can verify that Abdera wrappers, in >>> general, will do this by looking at >>> org.apache.abdera.model.ElementWrapper, which takes the element being >>> wrapped as its constructor argument, and holds a strong reference to it >>> as an instance variable. >>> >>> This map is an optimization to memoize the calls to getElementWrapper() >>> and not reconstruct them more than is necessary - it is not needed for >>> abdera to function properly, so we have temporarily worked around the >>> problem in our own application like so - we used to acquire our >>> FOMFactory by calling abdera.getFactory() on our >>> org.apache.abdera.Abdera instance, and re-using that singleton >>> throughout our application. Now we construct a new FOMFactory with new >>> FOMFactory(abdera) once per request to the server, and since the only >>> appreciable state on the factory is this map itself, this is a valid >>> work-around. >>> >>> I'd initially planned to really fix this issue and submit a patch along >>> with this message, but digging a little deeper, I'm not sure that the >>> correct fix is crystal clear... We could do as the javadoc above >>> suggests, and wrap the values with WeakReferences to plug the leak, or >>> we could use a LinkedHashMap configured as an LRU cache to just bound >>> the cache, so it can't grow out of control - but right now, I don't >>> think that either of those solutions would be correct, because it seems >>> to me that none of the objects in the hierarchy rooted at FOMElement >>> define equals() and/or hashCode() methods, so all of the objects are >>> cached based on their actual object identity. It seems that in the all >>> likely use cases, instances of FOMElement and its descendants are >>> re-parsed on every request to a server running abdera, and so what we >>> will see is cache misses virtually 100% of the time, so even though we'd >>> have plugged the leak, strictly speaking, we would have ignored the >>> underlying issue that we're caching data on every request that will be >>> fundamentally unable to be retrieved on subsequent requests. This is >>> based only on my reading over the code for a few hours, so I could be >>> missing something, and I also might be forgetting about a use case that >>> demands and makes proper use of this memoization, but as it stands right >>> now, my recommended fix would probably be to just cut out the cache >>> altogether, and allow for wrappers to get constructed fresh every time >>> they are requested. One more possibility is that the cache is actually >>> a useful optimization, but only during the scope of one request - in >>> which case the "work-around" we are using now is actually the best >>> practice, and the fix would be to remove the factory instance on the >>> Abdera class... >>> >>> I'd like to hear from the Abdera developers what their thoughts are on >>> this issue, and what the best resolution is likely to be. This is no >>> longer a pressing issue for our team, but it is potentially a time bomb >>> waiting to blow up for any project dependent on Abdera. >>> >>> thanks! (and thanks for Abdera, generally - we're easily a year ahead >>> of where we'd be on this project without it!) >>> >>> -Bryon (long-time listener, first-time caller) >>> >>> > > S'all good --- chriswberry at gmail dot com > > > >
