Hello all,

So, we've been debating the performance / resource usage of DSpace 1.6. This
has just happened to coincide with work that I needed to do to qualify
DSpace 1.6 (and/or 1.7) for our own use. So, for the past week or so, I've
been hammering away at profiling DSpace. This isn't a comprehensive analysis
of the entire codebase, but focusing on the serious issue of stability and
resource leakage.

This immediately led to finding two issues with the dspace-services code and
it's dependencies:

1) The resources in the FinalizableReferenceQueue were leaking (this was
noted before in a Tomcat stack trace posted here) - this could have been
avoided if a newer version of reflectutils had been used, along with the
LifeCycleManager it contains

2) A ThreadLocal in the CachingServiceImpl was also leaking - the service
did attempt to clean up in the shutdown, but it would only ever have been
able to clear up the resources of the thread executing the shutdown, not all
the other threads that would have processed requests.

Investigating the second issue, it turned out that this was being used to
keep track of Request scope caches - as well as potentially leaking
resources (across requests), this highlighted a problem in the service
contract, and a dangerous coupling between the implementation of the
services.

The Caching service was written to return a Request scoped cache without any
valid Request to bind it to - this was explicitly used in the listeners /
filters to acquire a cache and assign request objects into it before the
request service was called to 'start' a request and bind to the scoped
cache. There were also tests that expected this behaviour. Whilst the
request service will clear a request scoped cache if it's been bound to it,
you can't guarantee that the cache gets bound to a request.

Worse though, the Request service was depending on the Caching service
tracking request scoped caches against the current thread in order to be
able to track the current request. So, the one responsibility that the
request service should have, it was passing on and hoping that another
service might do the job for it without it being explicitly clear in the
contract.

If that wasn't bad enough, getCache doesn't actually properly respect the
CacheScope when trying to retrieve a particular cache (it could retrieve a
non-Request scoped cache if one of the same name had already been created,
regardless of the scope parameter). The Caching service also registered an
MBean that was never removed. And, to wrap up my examination of the Caching
service, it didn't close an InputStream that it opened.

So, onwards - wanting to simplify this code a bit (as we don't need to
support non-Spring IoC containers, at least not yet - and we do need to be
able to understand and maintain this code) - I quickly noticed an issue in
the SpringServiceManager where it returned multiple instances of the same
service name, because it was under the mistaken belief that it needed to
track some of the service names itself, when it was actually getting them
returned from Spring.

And the ServiceMixinManager was entirely redundant - as long as we can rely
on Spring, we can rely on the container to do everything that the Mixin
manager does, without us having to worry about the complexity of it (and it
is both relatively complex, and a source of our problems earlier as it uses
the FinalizableReferenceQueue). I've actually got a real bugbear about us
using the term 'mixin'. A mixin means something specific - a class that
provides an actual implementation, and is combined with other mixins to
produce other classes [through multiple-inheritance]. When you look at what
is implemented in dspace-services under the term mixin, well we don't
actually have mixins... we have interfaces that are implemented by service
classes. Each service class completely and wholly provide implementations of
the methods defined in all the interfaces they implement. That's a concept
that Java has had for a long time, and is (should be) well understood by all
Java programmers. That doesn't make them mixins. We don't/shouldn't want
'mixins' (multiple inheritance - there is a reason Java doesn't support it).
And we shouldn't be using a term incorrectly just because it's cool.

While I'm on a terminology rant, I'm not particularly happy about the
'interceptors' (ie. RequestInterceptor). They are looking and feeling a lot
more like listeners than interceptors - allowing other services to react to
events taking place (start request / end request), rather than affecting
it's behaviour. I've not renamed them, for the sake of keeping as much of
the existing API intact as I can (whilst clearing out [currently] redundant
or problematic code), but actually it seems that it should really be pushing
a synchronous and immediately executing event notification, so that the
listener registration can occur in a single place rather than duplicating
that functionality anywhere similar concepts may be required.

OK, less ranting, back to actual issues. Related to the RequestService, the
SessionService lifecycle methods really don't seem to make much sense -
should there be a request in place before you start a session? Are you going
to get a clean session or attach to an existing one? Does end terminate the
session, or simply release it? Add to that, the Threading for expiry of
sessions was entirely problematic, existing on a different cycle to the
container session manager. So, I took the brutal decision to kill those
lifecycle methods - you have to have a request to get a session.

What else? Well, the initial proposal to bring dspace-services into 1.6
stated that the MBean registration would be removed. That wasn't the case
for the CachingService, nor was it the case for the Kernel - which actually
registered a UUID based MBean name. Basically, a bit pointless. I made it
properly optional (it will register if you specify a name, but won't if
'anonymous') - but given the proposal for 1.6, I was considering binning it
entirely. Especially as the MBean registration and de-registration was split
over multiple classes.

Other problems include some odd switching of config names in the
configurationservice when it tries to do variable replacement - such to the
point I was entirely confused why any of the unit tests would ever have
worked. There were also 9 threading issues, and a few other problems, thrown
up by code inspection.

So, we have a prototype of a simplified and [hopefully mostly :)] corrected
and cleaned up version of dspace-services here:

http://scm.dspace.org/svn/repo/modules/dspace-services/branches/simplified-prototype/

To recap the changes:

1) RequestService is no longer dependent on CachingService, or it's internal
implentation
2) CachingService registers as a listener to RequestService to manage the
lifecycle of Request scope caches, and only returns request scoped caches
when there is a request
3) Caches returned are always of the correct scope
4) The request objects thrown into the cache are no longer there (you can
retrieve them from the RequestService)
5) The Kernel / manager / initializaer classes have been detangled
6) MBean registration is completely optional (and should be removed imho)
7) The ServiceMixinManager is removed as it's entirely unnecessary whilst
Spring is our only IoC container (and there is *no* good reason to
complicate things by supporting combining others at this stage)
8) reflectutils removed (reflection code no longer necessary - other utility
classes provided by JVM or Spring) - this means that the Sakai Maven
repository can be removed (I've had problems in the past with some artifacts
being retrieved from there)
9) SessionService simplified, and consistent APIs in place across Request
and Session services.
10) Multiple threading and resource issues cleaned up

Aside from having to modify a few tests that relied on the bizarre contracts
that were in place (ie. being able to retrieve request scoped caches without
a valid request), and removing tests related to code that is now gone,
everything passes all of the tests that were already in place. This makes it
pretty much a drop in replacement for the existing dspace-services code -
certainly for the core of 1.6, can't say about other code that may be
depending on dspace-services.

You'll note that because of this, I haven't been as ambitious as I would
like in ripping this apart. I would love to get rid of the kernel startup,
and just rely on Spring starting up the kernel and service managers -
injecting the bean factory into the managers it instantiates as appropriate.
And pull out things like the interceptors / listeners into events reducing
even more complex code potentially duplicated in services. But I had to
start somewhere, and this seems to be the right sort of area for looking at
DSpace 1.7. This shouldn't stop us being more aggressive at simplifying and
reducing this code later.

And to take us back to the earlier issues about leakage inside a container -
dspace-services alone doesn't make the applications cleanup correctly. There
needs to be some adjustments to the context listeners, and their ordering in
the web.xml. You also have to take care with the JDBC drivers (I now have a
reworked DatabaseManager that allows you to retrieve a DataSource from JNDI
if available, and if not creates a DataSource using the entries in
dspace.cfg, the structure of which means that you do not need the JDBC
driver, DBCP or commons-pool in the webapp if you have a JNDI datasource.

But with a bit of care and attention taken in the application assembly and
deployment, the JSPUI does unload cleanly.

XMLUI is still proving a bit more troublesome. There are two instances of
resource leakage within Cocoon itself - in the sitemap code, it puts a stack
into a ThreadLocal, but never resets the contents (even when it's empty).
The flowscript code also doesn't exit the Rhino contexts correctly. Both of
the issues will prevent cleaning up of the resources.

With patched versions of those jars in place, as well as the same JDBC and
listener issues from jspui - well, it still doesn't quite work reliably
(although the profiler can no longer detect any GC roots). If I disable all
of the sitemap and config reloading, *and* wait 10 minutes after undeploying
the application, it will clean up and unload correctly. All I know right now
is that there isn't a Thread showing up that would prevent this. It looks
like it's some system cache (like in URL keep alives), or a session holding
data.

But it neither will unload either way with the current version of
dspace-services - it needs the cleaned up/simplified version.

One more thing about Tomcat - there is often a lot of bad press for Tomcat,
blaming it for not unloading classes. Not only is this blatantly untrue (if
your application can be unloaded, it will be - it's only the application
leaking resources that stops it), they've made a lot of strides in the most
recent releases (6.0.28 / 6.0.29) to help combat potential resources leaks
in your applications. Take a look at the configuration options on Context:

http://tomcat.apache.org/tomcat-6.0-doc/config/context.html   (clear
ThreadLocals / stop threads and timers)

and the listeners:

http://tomcat.apache.org/tomcat-6.0-doc/config/listeners.html (JRE leak
protection, that protects against a number of things, including the URL
caching that is problematic on Windows)

I wouldn't rely on these being able to prevent your application from causing
a problem, and it's much better to ensure that applications are properly
behaved even without Tomcat attempting to prevent problems - but it can
still be useful in some circumstances.

G
------------------------------------------------------------------------------
Virtualization is moving to the mainstream and overtaking non-virtualized
environment for deploying applications. Does it make network security 
easier or more difficult to achieve? Read this whitepaper to separate the 
two and get a better understanding.
http://p.sf.net/sfu/hp-phase2-d2d
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech

Reply via email to