Graham, this is impressive and very important work.  I haven't begun to digest 
it all, but maybe I'll start with one question to get a better understanding...

2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did 
attempt to clean up in the shutdown, but it would only ever have been able to 
clear up the resources of the thread executing the shutdown, not all the other 
threads that would have processed requests.

Investigating the second issue, I found that this ThreadLocal was being used to keep track of Request scope caches - as well as potentially leaking resources (across requests), this highlighted a problem in the service contract, and a dangerous coupling between the implementations of the services.

What resources were leaking here exactly?  Leaking resources across requests 
can have some serious consequences depending on the resources, and I'm curious 
to know exactly what type of errant behavior we could expect from a 1.6.x 
version running the flawed services framework.

That's at least a starting point for me to wrap my brain around this analysis.  
Thanks for all the effort that went into this.


P.S. - Should this thread be in dspace-devel instead of tech perhaps, or did 
you want to inform a broader audience?

--
sands fish
Senior Software Engineer
MIT Libraries
Technology Research & Development
sa...@mit.edu
E25-131





On Oct 4, 2010, at 7:55 PM, Graham Triggs wrote:

Hello all,

So, we've been debating the performance / resource usage of DSpace 1.6. This 
has just happened to coincide with work that I needed to do to qualify DSpace 
1.6 (and/or 1.7) for our own use. So, for the past week or so, I've been 
hammering away at profiling DSpace. This isn't a comprehensive analysis of the entire codebase, but is focused on the serious issue of stability and resource leakage.

This immediately led to finding two issues with the dspace-services code and its dependencies:

1) The resources in the FinalizableReferenceQueue were leaking (this was noted 
before in a Tomcat stack trace posted here) - this could have been avoided if a 
newer version of reflectutils had been used, along with the LifeCycleManager it 
contains

2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did 
attempt to clean up in the shutdown, but it would only ever have been able to 
clear up the resources of the thread executing the shutdown, not all the other 
threads that would have processed requests.
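
To illustrate the mechanics (this is a minimal sketch of the general shape of the problem, not the actual CachingServiceImpl code): a ThreadLocal holds one value per thread, and remove() only clears the entry for the thread that calls it, so a shutdown running on one thread cannot release what the container's request threads created:

    import java.util.HashMap;
    import java.util.Map;

    public class RequestCacheHolder {
        // one map of caches per thread that touches this holder
        private final ThreadLocal<Map<String, Object>> caches =
                new ThreadLocal<Map<String, Object>>() {
                    @Override
                    protected Map<String, Object> initialValue() {
                        return new HashMap<String, Object>();
                    }
                };

        public Map<String, Object> getCurrent() {
            // populated lazily by whichever request thread calls this
            return caches.get();
        }

        public void shutdown() {
            // only clears the entry for the thread executing shutdown();
            // entries created by other (request) threads stay referenced
            // until those threads die, pinning the webapp's classloader
            caches.remove();
        }
    }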

Investigating the second issue, I found that this ThreadLocal was being used to keep track of Request scope caches - as well as potentially leaking resources (across requests), this highlighted a problem in the service contract, and a dangerous coupling between the implementations of the services.

The Caching service was written to return a Request scoped cache without any 
valid Request to bind it to - this was explicitly used in the listeners / 
filters to acquire a cache and assign request objects into it before the 
request service was called to 'start' a request and bind to the scoped cache. 
There were also tests that expected this behaviour. Whilst the request service 
will clear a request scoped cache if it's been bound to it, you can't guarantee 
that the cache gets bound to a request.
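
As a rough sketch of what a tighter contract looks like (the getCache / CacheScope names follow this mail, but the enum and the 'is there a request' flag below are simplified stand-ins, not the real dspace-services types): a request scoped cache is only handed out while a request is active, and a lookup only ever consults the requested scope:

    import java.util.HashMap;
    import java.util.Map;

    public class ScopedCacheRegistry {
        // stand-in for the real CacheScope enum
        public enum CacheScope { INSTANCE, REQUEST }

        private final Map<CacheScope, Map<String, Object>> caches =
                new HashMap<CacheScope, Map<String, Object>>();
        private boolean requestActive; // stands in for asking the RequestService

        public Object getCache(String name, CacheScope scope) {
            if (scope == CacheScope.REQUEST && !requestActive) {
                throw new IllegalStateException(
                        "A request scoped cache can only be obtained inside a request");
            }
            Map<String, Object> byName = caches.get(scope);
            if (byName == null) {
                byName = new HashMap<String, Object>();
                caches.put(scope, byName);
            }
            Object cache = byName.get(name);
            if (cache == null) {
                cache = new HashMap<String, Object>(); // placeholder for a real cache
                byName.put(name, cache);
            }
            return cache; // always taken from the map for the requested scope
        }
    }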

Worse though, the Request service was depending on the Caching service tracking 
request scoped caches against the current thread in order to be able to track 
the current request. So, the one responsibility that the request service should have, it was passing on to another service - hoping it would do the job for it, without that ever being made explicit in the contract.
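
A minimal sketch of the alternative (method names are assumed, not the actual dspace-services API): the request service keeps its own per-thread record of the current request, and the caching service is nowhere in the picture:

    import java.util.UUID;

    public class RequestTracker {
        private final ThreadLocal<String> currentRequestId = new ThreadLocal<String>();

        public String startRequest() {
            String requestId = UUID.randomUUID().toString();
            currentRequestId.set(requestId);
            return requestId;
        }

        public String getCurrentRequestId() {
            // the request service answers this itself, rather than asking the
            // caching service which thread happens to have a request scoped cache
            return currentRequestId.get();
        }

        public void endRequest() {
            // always clear the thread's slot so nothing leaks across requests
            currentRequestId.remove();
        }
    }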

If that wasn't bad enough, getCache doesn't actually properly respect the 
CacheScope when trying to retrieve a particular cache (it could retrieve a 
non-Request scoped cache if one of the same name had already been created, 
regardless of the scope parameter). The Caching service also registered an 
MBean that was never removed. And, to wrap up my examination of the Caching 
service, it didn't close an InputStream that it opened.
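
That last one is just the usual missing try/finally (this pre-dates try-with-resources); something like the following, where the resource name is purely illustrative:

    import java.io.IOException;
    import java.io.InputStream;

    public class CacheConfigLoader {
        public void loadConfig() throws IOException {
            InputStream in = getClass().getResourceAsStream("/caching-defaults.xml"); // name is illustrative
            if (in == null) {
                return; // nothing to load
            }
            try {
                // ... read the cache configuration from the stream ...
            } finally {
                in.close(); // release the stream even if reading fails
            }
        }
    }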

So, onwards - wanting to simplify this code a bit (as we don't need to support 
non-Spring IoC containers, at least not yet - and we do need to be able to 
understand and maintain this code) - I quickly noticed an issue in the 
SpringServiceManager where it returned multiple instances of the same service 
name, because it was under the mistaken belief that it needed to track some of 
the service names itself, when it was actually getting them returned from 
Spring.
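
The point is that Spring already answers this question. As a rough illustration (the service type parameter is a placeholder), a ListableBeanFactory returns exactly one entry per bean name, so there is nothing for the manager to track itself:

    import java.util.Map;
    import org.springframework.beans.factory.ListableBeanFactory;

    public class ServiceLookup {
        // one entry per bean name of the requested type - no duplicates,
        // and no separate list of names for the manager to keep in sync
        public static <T> Map<String, T> lookupServices(ListableBeanFactory factory,
                                                        Class<T> type) {
            return factory.getBeansOfType(type);
        }
    }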

And the ServiceMixinManager was entirely redundant - as long as we can rely on 
Spring, we can rely on the container to do everything that the Mixin manager 
does, without us having to worry about the complexity of it (and it is both 
relatively complex, and a source of our problems earlier as it uses the 
FinalizableReferenceQueue). I've actually got a real bugbear about us using the 
term 'mixin'. A mixin means something specific - a class that provides an 
actual implementation, and is combined with other mixins to produce other 
classes [through multiple-inheritance]. When you look at what is implemented in 
dspace-services under the term mixin, well we don't actually have mixins... we 
have interfaces that are implemented by service classes. Each service class 
completely and wholly provides implementations of the methods defined in all the interfaces it implements. That's a concept that Java has had for a long time,
and is (should be) well understood by all Java programmers. That doesn't make 
them mixins. We don't/shouldn't want 'mixins' (multiple inheritance - there is 
a reason Java doesn't support it). And we shouldn't be using a term incorrectly 
just because it's cool.

While I'm on a terminology rant, I'm not particularly happy about the 
'interceptors' (i.e. RequestInterceptor). They look and feel a lot more like listeners than interceptors - allowing other services to react to events taking place (start request / end request), rather than affecting the behaviour of the request. I've not renamed them, for the sake of keeping as much of the
existing API intact as I can (whilst clearing out [currently] redundant or 
problematic code), but actually it seems that it should really be pushing a 
synchronous and immediately executing event notification, so that the listener 
registration can occur in a single place rather than duplicating that 
functionality anywhere similar concepts may be required.
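
To make the distinction concrete, what I have in mind is closer to a plain listener interface like this (names are illustrative, not the current RequestInterceptor API):

    public interface RequestEventListener {
        // called synchronously, immediately after a request starts
        void onStartRequest(String requestId);

        // called synchronously, just before a request ends
        void onEndRequest(String requestId, Exception failure);
    }

Registration would live in one place - the request service - with each listener simply reacting to the notification, rather than being able to alter the request.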

OK, less ranting, back to actual issues. Related to the RequestService, the 
SessionService lifecycle methods really don't seem to make much sense - should 
there be a request in place before you start a session? Are you going to get a 
clean session or attach to an existing one? Does end terminate the session, or 
simply release it? Add to that, the threading for expiry of sessions was entirely problematic, existing on a different cycle to the container's session
manager. So, I took the brutal decision to kill those lifecycle methods - you 
have to have a request to get a session.
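
In sketch form, that rule is simply this (the names are illustrative, with a hypothetical helper standing in for the actual session lookup):

    public class SessionRule {
        private boolean requestActive; // stands in for asking the RequestService

        public Object getCurrentSession() {
            if (!requestActive) {
                throw new IllegalStateException(
                        "No current request - a session is only available inside a request");
            }
            return lookupSessionForCurrentRequest(); // hypothetical helper
        }

        private Object lookupSessionForCurrentRequest() {
            return new Object(); // placeholder - the real code attaches to the container session
        }
    }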

What else? Well, the initial proposal to bring dspace-services into 1.6 stated 
that the MBean registration would be removed. That wasn't the case for the 
CachingService, nor was it the case for the Kernel - which actually registered 
a UUID based MBean name. Basically, a bit pointless. I made it properly 
optional (it will register if you specify a name, but won't if 'anonymous') - 
but given the proposal for 1.6, I was considering binning it entirely. 
Especially as the MBean registration and de-registration was split over 
multiple classes.
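
For reference, making it properly optional is not much code - something along these lines (the ObjectName shown is illustrative, and the kernel object is assumed to already be a compliant MBean), with registration and de-registration kept together:

    import java.lang.management.ManagementFactory;
    import javax.management.ObjectName;

    public class KernelJmxRegistration {
        private ObjectName objectName;

        public void register(Object kernelMBean, String name) throws Exception {
            if (name == null || name.length() == 0) {
                return; // anonymous kernel - skip JMX entirely
            }
            objectName = new ObjectName("org.dspace:type=Kernel,name=" + name); // illustrative name
            ManagementFactory.getPlatformMBeanServer().registerMBean(kernelMBean, objectName);
        }

        public void unregister() throws Exception {
            if (objectName != null) {
                // de-registration lives next to registration, in the same class
                ManagementFactory.getPlatformMBeanServer().unregisterMBean(objectName);
                objectName = null;
            }
        }
    }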

Other problems include some odd switching of config names in the ConfigurationService when it tries to do variable replacement - to the point that I was entirely confused as to why any of the unit tests would ever have worked.
There were also 9 threading issues, and a few other problems, thrown up by code 
inspection.

So, we have a prototype of a simplified and [hopefully mostly :)] corrected and 
cleaned up version of dspace-services here:

http://scm.dspace.org/svn/repo/modules/dspace-services/branches/simplified-prototype/

To recap the changes:

1) RequestService is no longer dependent on CachingService, or its internal implementation
2) CachingService registers as a listener to RequestService to manage the 
lifecycle of Request scope caches, and only returns request scoped caches when 
there is a request
3) Caches returned are always of the correct scope
4) The request objects thrown into the cache are no longer there (you can 
retrieve them from the RequestService)
5) The Kernel / manager / initializer classes have been detangled
6) MBean registration is completely optional (and should be removed imho)
7) The ServiceMixinManager is removed as it's entirely unnecessary whilst 
Spring is our only IoC container (and there is *no* good reason to complicate 
things by supporting combining others at this stage)
8) reflectutils removed (reflection code no longer necessary - other utility 
classes provided by JVM or Spring) - this means that the Sakai Maven repository 
can be removed (I've had problems in the past with some artifacts being 
retrieved from there)
9) SessionService simplified, and consistent APIs in place across Request and 
Session services.
10) Multiple threading and resource issues cleaned up

Aside from having to modify a few tests that relied on the bizarre contracts 
that were in place (i.e. being able to retrieve request scoped caches without a
valid request), and removing tests related to code that is now gone, everything 
passes all of the tests that were already in place. This makes it pretty much a drop-in replacement for the existing dspace-services code - certainly for the core of 1.6; I can't say about other code that may be depending on dspace-services.

You'll note that because of this, I haven't been as ambitious as I would like 
in ripping this apart. I would love to get rid of the kernel startup, and just 
rely on Spring starting up the kernel and service managers - injecting the bean 
factory into the managers it instantiates as appropriate. And pull out things like the interceptors / listeners into events, reducing even more of the complex code that is potentially duplicated across services. But I had to start somewhere, and this
seems to be the right sort of area for looking at DSpace 1.7. This shouldn't 
stop us being more aggressive at simplifying and reducing this code later.

And to take us back to the earlier issues about leakage inside a container - 
dspace-services alone doesn't make the applications cleanup correctly. There 
needs to be some adjustments to the context listeners, and their ordering in 
the web.xml. You also have to take care with the JDBC drivers (I now have a reworked DatabaseManager that retrieves a DataSource from JNDI if one is available, and if not creates a DataSource using the entries in dspace.cfg; the structure of this means that you do not need the JDBC driver, DBCP or commons-pool in the webapp if you have a JNDI datasource).
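
As a rough sketch of that lookup-then-fallback shape (the JNDI name and dspace.cfg keys here are my assumptions for illustration, not necessarily what the reworked DatabaseManager actually uses):

    import java.util.Properties;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;
    import org.apache.commons.dbcp.BasicDataSource;

    public class DataSourceLocator {
        public static DataSource getDataSource(Properties dspaceCfg) {
            try {
                InitialContext ctx = new InitialContext();
                return (DataSource) ctx.lookup("java:comp/env/jdbc/dspace"); // assumed JNDI name
            } catch (Exception e) {
                // no JNDI datasource available - fall back to dspace.cfg; this is the
                // only path that needs the JDBC driver, DBCP and commons-pool in the webapp
                BasicDataSource ds = new BasicDataSource();
                ds.setDriverClassName(dspaceCfg.getProperty("db.driver"));
                ds.setUrl(dspaceCfg.getProperty("db.url"));
                ds.setUsername(dspaceCfg.getProperty("db.username"));
                ds.setPassword(dspaceCfg.getProperty("db.password"));
                return ds;
            }
        }
    }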

But with a bit of care and attention taken in the application assembly and 
deployment, the JSPUI does unload cleanly.

XMLUI is still proving a bit more troublesome. There are two instances of resource leakage within Cocoon itself - the sitemap code puts a stack into a ThreadLocal but never resets the contents (even when it's empty), and the flowscript code doesn't exit the Rhino contexts correctly. Both of these issues prevent the resources from being cleaned up.
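
For the flowscript side, the fix is just the standard Rhino enter/exit pairing - Context.enter() binds a Context to the current thread, and only a matching Context.exit() releases it. A minimal sketch:

    import org.mozilla.javascript.Context;
    import org.mozilla.javascript.Scriptable;

    public class FlowscriptRunner {
        public static Object run(String script) {
            Context cx = Context.enter();   // binds the Context to this thread
            try {
                Scriptable scope = cx.initStandardObjects();
                return cx.evaluateString(scope, script, "flowscript", 1, null);
            } finally {
                Context.exit();             // without this, the thread keeps the reference
            }
        }
    }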

With patched versions of those jars in place, and the same JDBC and listener fixes applied as for the JSPUI - well, it still doesn't quite work reliably (although the profiler can no longer detect any GC roots). If I disable all of
the sitemap and config reloading, *and* wait 10 minutes after undeploying the 
application, it will clean up and unload correctly. All I know right now is 
that there isn't a Thread showing up that would prevent this. It looks like 
it's some system cache (like in URL keep alives), or a session holding data.

But either way, it will not unload with the current version of dspace-services - it needs the cleaned up/simplified version.

One more thing about Tomcat - there is often a lot of bad press for Tomcat, 
blaming it for not unloading classes. Not only is this blatantly untrue (if your application can be unloaded, it will be - it's only the application leaking resources that stops it), but they've also made a lot of strides in the most recent releases (6.0.28 / 6.0.29) to help combat potential resource leaks in
your applications. Take a look at the configuration options on Context:

http://tomcat.apache.org/tomcat-6.0-doc/config/context.html   (clear 
ThreadLocals / stop threads and timers)

and the listeners:

http://tomcat.apache.org/tomcat-6.0-doc/config/listeners.html (JRE leak 
protection, that protects against a number of things, including the URL caching 
that is problematic on Windows)

I wouldn't rely on these being able to prevent your application from causing a 
problem, and it's much better to ensure that applications are properly behaved 
even without Tomcat attempting to prevent problems - but it can still be useful 
in some circumstances.

G

