Graham, this is impressive and very important work. I haven't begun to digest
it all, but maybe I'll start with one question to get a better understanding...
2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did
attempt to clean up in the shutdown, but it would only ever have been able to
clear up the resources of the thread executing the shutdown, not all the other
threads that would have processed requests.
Investigating the second issue, it turned out that this was being used to keep
track of Request scope caches - as well as potentially leaking resources
(across requests), this highlighted a problem in the service contract, and a
dangerous coupling between the implementation of the services.
What resources were leaking here exactly? Leaking resources across requests
can have some serious consequences depending on the resources, and I'm curious
to know exactly what type of errant behavior we could expect from a 1.6.x
version running the flawed services framework.
That's at least a starting point for me to wrap my brain around this analysis.
Thanks for all the effort that went into this.
P.S. - Should this thread be in dspace-devel instead of tech perhaps, or did
you want to inform a broader audience?
--
sands fish
Senior Software Engineer
MIT Libraries
Technology Research & Development
sa...@mit.edu<mailto:sa...@mit.edu>
E25-131
On Oct 4, 2010, at 7:55 PM, Graham Triggs wrote:
Hello all,
So, we've been debating the performance / resource usage of DSpace 1.6. This
has just happened to coincide with work that I needed to do to qualify DSpace
1.6 (and/or 1.7) for our own use. So, for the past week or so, I've been
hammering away at profiling DSpace. This isn't a comprehensive analysis of the
entire codebase, but focusing on the serious issue of stability and resource
leakage.
This immediately led to finding two issues with the dspace-services code and
it's dependencies:
1) The resources in the FinalizableReferenceQueue were leaking (this was noted
before in a Tomcat stack trace posted here) - this could have been avoided if a
newer version of reflectutils had been used, along with the LifeCycleManager it
contains
2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did
attempt to clean up in the shutdown, but it would only ever have been able to
clear up the resources of the thread executing the shutdown, not all the other
threads that would have processed requests.
Investigating the second issue, it turned out that this was being used to keep
track of Request scope caches - as well as potentially leaking resources
(across requests), this highlighted a problem in the service contract, and a
dangerous coupling between the implementation of the services.
The Caching service was written to return a Request scoped cache without any
valid Request to bind it to - this was explicitly used in the listeners /
filters to acquire a cache and assign request objects into it before the
request service was called to 'start' a request and bind to the scoped cache.
There were also tests that expected this behaviour. Whilst the request service
will clear a request scoped cache if it's been bound to it, you can't guarantee
that the cache gets bound to a request.
Worse though, the Request service was depending on the Caching service tracking
request scoped caches against the current thread in order to be able to track
the current request. So, the one responsibility that the request service should
have, it was passing on and hoping that another service might do the job for it
without it being explicitly clear in the contract.
If that wasn't bad enough, getCache doesn't actually properly respect the
CacheScope when trying to retrieve a particular cache (it could retrieve a
non-Request scoped cache if one of the same name had already been created,
regardless of the scope parameter). The Caching service also registered an
MBean that was never removed. And, to wrap up my examination of the Caching
service, it didn't close an InputStream that it opened.
So, onwards - wanting to simplify this code a bit (as we don't need to support
non-Spring IoC containers, at least not yet - and we do need to be able to
understand and maintain this code) - I quickly noticed an issue in the
SpringServiceManager where it returned multiple instances of the same service
name, because it was under the mistaken belief that it needed to track some of
the service names itself, when it was actually getting them returned from
Spring.
And the ServiceMixinManager was entirely redundant - as long as we can rely on
Spring, we can rely on the container to do everything that the Mixin manager
does, without us having to worry about the complexity of it (and it is both
relatively complex, and a source of our problems earlier as it uses the
FinalizableReferenceQueue). I've actually got a real bugbear about us using the
term 'mixin'. A mixin means something specific - a class that provides an
actual implementation, and is combined with other mixins to produce other
classes [through multiple-inheritance]. When you look at what is implemented in
dspace-services under the term mixin, well we don't actually have mixins... we
have interfaces that are implemented by service classes. Each service class
completely and wholly provide implementations of the methods defined in all the
interfaces they implement. That's a concept that Java has had for a long time,
and is (should be) well understood by all Java programmers. That doesn't make
them mixins. We don't/shouldn't want 'mixins' (multiple inheritance - there is
a reason Java doesn't support it). And we shouldn't be using a term incorrectly
just because it's cool.
While I'm on a terminology rant, I'm not particularly happy about the
'interceptors' (ie. RequestInterceptor). They are looking and feeling a lot
more like listeners than interceptors - allowing other services to react to
events taking place (start request / end request), rather than affecting it's
behaviour. I've not renamed them, for the sake of keeping as much of the
existing API intact as I can (whilst clearing out [currently] redundant or
problematic code), but actually it seems that it should really be pushing a
synchronous and immediately executing event notification, so that the listener
registration can occur in a single place rather than duplicating that
functionality anywhere similar concepts may be required.
OK, less ranting, back to actual issues. Related to the RequestService, the
SessionService lifecycle methods really don't seem to make much sense - should
there be a request in place before you start a session? Are you going to get a
clean session or attach to an existing one? Does end terminate the session, or
simply release it? Add to that, the Threading for expiry of sessions was
entirely problematic, existing on a different cycle to the container session
manager. So, I took the brutal decision to kill those lifecycle methods - you
have to have a request to get a session.
What else? Well, the initial proposal to bring dspace-services into 1.6 stated
that the MBean registration would be removed. That wasn't the case for the
CachingService, nor was it the case for the Kernel - which actually registered
a UUID based MBean name. Basically, a bit pointless. I made it properly
optional (it will register if you specify a name, but won't if 'anonymous') -
but given the proposal for 1.6, I was considering binning it entirely.
Especially as the MBean registration and de-registration was split over
multiple classes.
Other problems include some odd switching of config names in the
configurationservice when it tries to do variable replacement - such to the
point I was entirely confused why any of the unit tests would ever have worked.
There were also 9 threading issues, and a few other problems, thrown up by code
inspection.
So, we have a prototype of a simplified and [hopefully mostly :)] corrected and
cleaned up version of dspace-services here:
http://scm.dspace.org/svn/repo/modules/dspace-services/branches/simplified-prototype/
To recap the changes:
1) RequestService is no longer dependent on CachingService, or it's internal
implentation
2) CachingService registers as a listener to RequestService to manage the
lifecycle of Request scope caches, and only returns request scoped caches when
there is a request
3) Caches returned are always of the correct scope
4) The request objects thrown into the cache are no longer there (you can
retrieve them from the RequestService)
5) The Kernel / manager / initializaer classes have been detangled
6) MBean registration is completely optional (and should be removed imho)
7) The ServiceMixinManager is removed as it's entirely unnecessary whilst
Spring is our only IoC container (and there is *no* good reason to complicate
things by supporting combining others at this stage)
8) reflectutils removed (reflection code no longer necessary - other utility
classes provided by JVM or Spring) - this means that the Sakai Maven repository
can be removed (I've had problems in the past with some artifacts being
retrieved from there)
9) SessionService simplified, and consistent APIs in place across Request and
Session services.
10) Multiple threading and resource issues cleaned up
Aside from having to modify a few tests that relied on the bizarre contracts
that were in place (ie. being able to retrieve request scoped caches without a
valid request), and removing tests related to code that is now gone, everything
passes all of the tests that were already in place. This makes it pretty much a
drop in replacement for the existing dspace-services code - certainly for the
core of 1.6, can't say about other code that may be depending on
dspace-services.
You'll note that because of this, I haven't been as ambitious as I would like
in ripping this apart. I would love to get rid of the kernel startup, and just
rely on Spring starting up the kernel and service managers - injecting the bean
factory into the managers it instantiates as appropriate. And pull out things
like the interceptors / listeners into events reducing even more complex code
potentially duplicated in services. But I had to start somewhere, and this
seems to be the right sort of area for looking at DSpace 1.7. This shouldn't
stop us being more aggressive at simplifying and reducing this code later.
And to take us back to the earlier issues about leakage inside a container -
dspace-services alone doesn't make the applications cleanup correctly. There
needs to be some adjustments to the context listeners, and their ordering in
the web.xml. You also have to take care with the JDBC drivers (I now have a
reworked DatabaseManager that allows you to retrieve a DataSource from JNDI if
available, and if not creates a DataSource using the entries in dspace.cfg, the
structure of which means that you do not need the JDBC driver, DBCP or
commons-pool in the webapp if you have a JNDI datasource.
But with a bit of care and attention taken in the application assembly and
deployment, the JSPUI does unload cleanly.
XMLUI is still proving a bit more troublesome. There are two instances of
resource leakage within Cocoon itself - in the sitemap code, it puts a stack
into a ThreadLocal, but never resets the contents (even when it's empty). The
flowscript code also doesn't exit the Rhino contexts correctly. Both of the
issues will prevent cleaning up of the resources.
With patched versions of those jars in place, as well as the same JDBC and
listener issues from jspui - well, it still doesn't quite work reliably
(although the profiler can no longer detect any GC roots). If I disable all of
the sitemap and config reloading, *and* wait 10 minutes after undeploying the
application, it will clean up and unload correctly. All I know right now is
that there isn't a Thread showing up that would prevent this. It looks like
it's some system cache (like in URL keep alives), or a session holding data.
But it neither will unload either way with the current version of
dspace-services - it needs the cleaned up/simplified version.
One more thing about Tomcat - there is often a lot of bad press for Tomcat,
blaming it for not unloading classes. Not only is this blatantly untrue (if
your application can be unloaded, it will be - it's only the application
leaking resources that stops it), they've made a lot of strides in the most
recent releases (6.0.28 / 6.0.29) to help combat potential resources leaks in
your applications. Take a look at the configuration options on Context:
http://tomcat.apache.org/tomcat-6.0-doc/config/context.html (clear
ThreadLocals / stop threads and timers)
and the listeners:
http://tomcat.apache.org/tomcat-6.0-doc/config/listeners.html (JRE leak
protection, that protects against a number of things, including the URL caching
that is problematic on Windows)
I wouldn't rely on these being able to prevent your application from causing a
problem, and it's much better to ensure that applications are properly behaved
even without Tomcat attempting to prevent problems - but it can still be useful
in some circumstances.
G
<ATT00001..c><ATT00002..c>
------------------------------------------------------------------------------
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3.
Spend less time writing and rewriting code and more time creating great
experiences on the web. Be a part of the beta today.
http://p.sf.net/sfu/beautyoftheweb
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech