Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading

2010-10-10 Thread Graham Triggs
On 7 October 2010 06:55, Mark Diggory mdigg...@atmire.com wrote:

 You know I'm for (1) a simplified Spring-only service manager, (2) a JDBC
 data source delivered by the servlet container, and (3) dropping reflectutils.


Well, the JDBC datasource is a change to dspace-api rather than -services,
but there is no reason why it can't go into DSpace 1.7.


 I'm not so sure about dropping the MBean registration completely; the
 whole idea was that we could have a container-level service manager
 and services deployed across web applications.  But I assume this is
 just too crazy an approach to maintain and educate others on its
 usage.  I'm just not convinced we would use a servlet container in
 this manner, and simplification is no doubt more important...


My default position is that if we aren't actively using something, we
shouldn't have it. Whilst the registration technically works, we haven't come
close to properly testing and qualifying sharing resources across contexts
(/classloaders), and people will run into a lot of problems attempting to
use it that way.

Besides which, MBeans aren't really the way to share resources like that -
JNDI is. MBeans are available outside a container, whereas JNDI isn't
necessarily, but I wouldn't class that as a good reason to use MBeans - it's
a good reason to make that part of the functionality switchable. Rather than
a static manager that combines singleton and MBean registration as
conditional logic, it should be a static facade over either a singleton or an
MBean (or JNDI) implementation.
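To sketch what that facade shape could look like - purely illustrative, with made-up names rather than the actual dspace-services API - the strategy is chosen once at startup instead of being branched on inside every method:

```java
// Hypothetical sketch: a static facade delegating to a pluggable registry
// strategy, so singleton vs. MBean vs. JNDI behaviour is an installation
// decision, not conditional logic scattered through the manager.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface ServiceRegistry {
    void register(String name, Object service);
    Object lookup(String name);
}

/** In-JVM map-backed registry: the default outside a container. */
class SingletonRegistry implements ServiceRegistry {
    private final Map<String, Object> services = new ConcurrentHashMap<String, Object>();
    public void register(String name, Object service) { services.put(name, service); }
    public Object lookup(String name) { return services.get(name); }
}

/** Static facade: callers never see which strategy is active. */
final class ServiceManagerFacade {
    private static volatile ServiceRegistry registry = new SingletonRegistry();

    private ServiceManagerFacade() {}

    // An MBean- or JNDI-backed ServiceRegistry could be installed here
    // at container startup, keeping the facade itself free of that logic.
    static void setRegistry(ServiceRegistry r) { registry = r; }

    static void register(String name, Object service) { registry.register(name, service); }
    static Object lookup(String name) { return registry.lookup(name); }
}
```

The point of the shape is that switching to a JNDI-backed implementation touches one line of startup code, not the facade's callers.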

That's slightly beside the point though, as in general I agree with you -
cross-context sharing is a headache, a problem I don't think 'we' want to try
and solve, and difficult for people to understand. If you need that kind of
dynamic/component environment (and I don't think it's that big a deal either
to assemble a web application combining multiple artifacts, or to deploy it
as a whole), then there are better ways of dealing with it (either SpringDM
or EJB).



 My only concern is that the ability I put in place for addons to
 deliver their own Spring config and have the manager load it (in a
 SpringDM-style manner) be maintained, and I see it's still there... :-)


It's possibly an area that could yet be simplified though (I've taken a
somewhat pragmatic approach so far of not ripping up any existing public APIs
unless they are problematic in their current form). One question to ask is
whether it would only need the SpringDM-style approach when you would
actually be using SpringDM (on the assumption that you are using that for a
dynamic environment)?


 What if we put it in the dspace-services trunk, adjust the dspace
 trunk to use it, and use this to get some community testing/feedback?
 If it fares well over the next month... then we can do an official
 release of dspace-services just prior to the DSpace 1.7.0 release
 candidates...


Yes, that sounds like a plan. Although it passes its unit tests (with only
minor test modifications necessitated by the contract tightening), and I've
used it in a DSpace 1.6.2 application without any direct modifications
(related to the services, at least), it does need more extensive
in-application testing.

G
--
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3.
Spend less time writing and rewriting code and more time creating great
experiences on the web. Be a part of the beta today.
http://p.sf.net/sfu/beautyoftheweb
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading

2010-10-06 Thread Graham Triggs
On 5 October 2010 19:17, Sands Alden Fish sa...@mit.edu wrote:

 What resources were leaking here exactly?  Leaking resources across
 requests can have some serious consequences depending on the resources, and
 I'm curious to know exactly what type of errant behavior we could expect
 from a 1.6.x version running the flawed services framework.


I would say that the ability to leak resources across requests is more
theoretical than demonstrated in a real-world scenario. By which I mean that
it's clearly demonstrable that the Caching service can store data at the
wrong scope, or have request-scope caches that actually get bound / re-bound
to a later, different request - however, I haven't exercised a full DSpace
application in a way that observes this happening.

But even if the application wouldn't normally run into an issue, that
doesn't mean we shouldn't tighten up its behaviour to prevent it from
becoming an issue later.
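For anyone trying to picture the re-binding hazard: here's a minimal, purely illustrative sketch (not the actual CachingServiceImpl) of a request-scope cache held in a ThreadLocal. If the end-of-request cleanup is ever skipped, the next request handled by the same pooled thread silently inherits the previous request's cache:

```java
// Illustrative only: a "request-scoped" cache keyed off the current thread.
// On a pooled container thread, a missed endRequest() means the stale map
// is re-bound to a later, different request.
import java.util.HashMap;
import java.util.Map;

class RequestCacheHolder {
    private static final ThreadLocal<Map<String, Object>> CACHE =
            new ThreadLocal<Map<String, Object>>();

    static Map<String, Object> getRequestCache() {
        Map<String, Object> cache = CACHE.get();
        if (cache == null) {
            cache = new HashMap<String, Object>();
            CACHE.set(cache);
        }
        return cache;
    }

    /** Must run at the end of EVERY request, or pooled threads carry the map over. */
    static void endRequest() {
        CACHE.remove();
    }
}
```

Calling getRequestCache() twice without endRequest() in between returns the same map - exactly the carry-over a container thread pool would produce between two unrelated requests.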

G


Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading

2010-10-05 Thread Sands Alden Fish
Graham, this is impressive and very important work.  I haven't begun to digest 
it all, but maybe I'll start with one question to get a better understanding...

2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did 
attempt to clean up in the shutdown, but it would only ever have been able to 
clear up the resources of the thread executing the shutdown, not all the other 
threads that would have processed requests.

Investigating the second issue, it turned out that this was being used to keep 
track of Request scope caches - as well as potentially leaking resources 
(across requests), this highlighted a problem in the service contract, and a 
dangerous coupling between the implementation of the services.

What resources were leaking here exactly?  Leaking resources across requests 
can have some serious consequences depending on the resources, and I'm curious 
to know exactly what type of errant behavior we could expect from a 1.6.x 
version running the flawed services framework.

That's at least a starting point for me to wrap my brain around this analysis.  
Thanks for all the effort that went into this.


P.S. - Should this thread perhaps be in dspace-devel instead of -tech, or did 
you want to inform a broader audience?

--
sands fish
Senior Software Engineer
MIT Libraries
Technology Research & Development
sa...@mit.edu
E25-131






Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading

2010-10-05 Thread Kim Shepherd
Thanks for the detailed writeup and explanations, it's helping me understand
a lot of these issues better... looks like I still have some bedtime reading
(and DSpace profiling) ahead of me! ;-)

Cheers,

Kim


[Dspace-tech] A simplified version of dspace-services and Tomcat unloading

2010-10-04 Thread Graham Triggs
Hello all,

So, we've been debating the performance / resource usage of DSpace 1.6. This
has just happened to coincide with work that I needed to do to qualify
DSpace 1.6 (and/or 1.7) for our own use. So, for the past week or so, I've
been hammering away at profiling DSpace. This isn't a comprehensive analysis
of the entire codebase; it focuses on the serious issues of stability and
resource leakage.

This immediately led to finding two issues in the dspace-services code and
its dependencies:

1) The resources in the FinalizableReferenceQueue were leaking (this was
noted before in a Tomcat stack trace posted here). This could have been
avoided if a newer version of reflectutils had been used, along with the
LifeCycleManager it contains.

2) A ThreadLocal in the CachingServiceImpl was also leaking - the service
did attempt to clean up in the shutdown, but it would only ever have been
able to clear up the resources of the thread executing the shutdown, not all
the other threads that would have processed requests.
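The reason shutdown-time cleanup can't work is a property of ThreadLocal itself: remove() only clears the calling thread's copy. A small self-contained demonstration (illustrative, not DSpace code - the "shutdown" here is just the main thread):

```java
// Demonstrates why a shutdown hook cannot clean other threads' ThreadLocal
// values: remove() acts only on the calling thread, so a worker's copy
// survives - a leak when a webapp is unloaded but the container's pooled
// threads live on.
import java.util.concurrent.CountDownLatch;

final class ThreadLocalCleanupDemo {
    static final ThreadLocal<String> PER_THREAD = new ThreadLocal<String>();

    /** Returns what the worker still sees after the "shutdown" thread's remove(). */
    static String run() {
        final CountDownLatch valueSet = new CountDownLatch(1);
        final CountDownLatch shutdownDone = new CountDownLatch(1);
        final String[] seenByWorker = new String[1];

        Thread worker = new Thread(new Runnable() {
            public void run() {
                PER_THREAD.set("request-cache");   // the worker's private copy
                valueSet.countDown();
                try { shutdownDone.await(); } catch (InterruptedException ignored) { }
                seenByWorker[0] = PER_THREAD.get(); // unaffected by main's remove()
            }
        });
        try {
            worker.start();
            valueSet.await();
            PER_THREAD.remove();      // "shutdown" clears ONLY its own thread's copy
            shutdownDone.countDown();
            worker.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seenByWorker[0];
    }
}
```

The worker still reports "request-cache" after the main thread's remove() - which is exactly why per-request cleanup on the request thread itself, not shutdown-time cleanup, is the only reliable pattern.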

Investigating the second issue, I found that the ThreadLocal was being used
to keep track of Request-scope caches. As well as potentially leaking
resources across requests, this highlighted a problem in the service
contract, and a dangerous coupling between the implementations of the
services.

The Caching service was written to return a Request-scoped cache without any
valid Request to bind it to - this was explicitly used in the listeners /
filters to acquire a cache and assign request objects into it before the
request service was called to 'start' a request and bind to the scoped cache.
There were also tests that expected this behaviour. Whilst the request
service will clear a request-scoped cache if it has been bound to it, you
can't guarantee that the cache gets bound to a request.

Worse, though, the Request service was depending on the Caching service
tracking request-scoped caches against the current thread in order to track
the current request. So the one responsibility that the Request service
should have, it was passing on - hoping that another service might do the
job for it, without this being explicitly clear in the contract.

If that wasn't bad enough, getCache doesn't properly respect the CacheScope
when retrieving a particular cache (it could return a non-Request-scoped
cache if one of the same name had already been created, regardless of the
scope parameter). The Caching service also registered an MBean that was never
removed. And, to wrap up my examination of the Caching service, it didn't
close an InputStream that it opened.
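A hedged sketch of the scope fix (hypothetical names; the real CachingService API and its CacheScope values differ): an existing cache whose scope doesn't match the requested scope should be rejected, not silently returned.

```java
// Illustrative only: getCache that checks the stored scope against the
// requested scope, instead of returning whatever happens to be registered
// under that name.
import java.util.HashMap;
import java.util.Map;

enum CacheScope { REQUEST, INSTANCE }

class ScopedCache {
    final String name;
    final CacheScope scope;
    ScopedCache(String name, CacheScope scope) { this.name = name; this.scope = scope; }
}

class ScopeCheckingCacheManager {
    private final Map<String, ScopedCache> caches = new HashMap<String, ScopedCache>();

    ScopedCache getCache(String name, CacheScope scope) {
        ScopedCache existing = caches.get(name);
        if (existing != null) {
            if (existing.scope != scope) {
                // The flawed code would return 'existing' here regardless of scope.
                throw new IllegalStateException("Cache '" + name + "' exists with scope "
                        + existing.scope + ", not " + scope);
            }
            return existing;
        }
        ScopedCache created = new ScopedCache(name, scope);
        caches.put(name, created);
        return created;
    }
}
```

Whether mismatches should throw, or create a scope-qualified cache under a distinct key, is a design choice; the essential part is that the scope parameter actually participates in the lookup.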

So, onwards - wanting to simplify this code a bit (we don't need to support
non-Spring IoC containers, at least not yet, and we do need to be able to
understand and maintain this code) - I quickly noticed an issue in the
SpringServiceManager where it returned multiple instances of the same service
name, because it was under the mistaken belief that it needed to track some
of the service names itself, when it was actually getting them back from
Spring.
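The duplicate-name bug reduces to a simple shape (names here are made up, not the real SpringServiceManager fields): concatenating a locally tracked name list with the names the container already reports yields the same service twice, where relying on one source - or at least collecting into a Set - does not.

```java
// Illustrative only: why merging locally tracked names with the container's
// own answer produces duplicates, and the trivial fix.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ServiceNameLister {
    /** Buggy shape: locally tracked names concatenated with container names. */
    static List<String> withDuplicates(List<String> local, List<String> fromContainer) {
        List<String> all = new ArrayList<String>(local);
        all.addAll(fromContainer);   // a service tracked locally appears twice
        return all;
    }

    /** Fixed shape: trust one source, or de-duplicate while preserving order. */
    static List<String> deduplicated(List<String> local, List<String> fromContainer) {
        Set<String> all = new LinkedHashSet<String>(fromContainer);
        all.addAll(local);
        return new ArrayList<String>(all);
    }
}
```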

And the ServiceMixinManager was entirely redundant - as long as we can rely
on Spring, we can rely on the container to do everything that the Mixin
manager does, without us having to worry about its complexity (and it is both
relatively complex and a source of our earlier problems, as it uses the
FinalizableReferenceQueue). I've also got a real bugbear about us using the
term 'mixin'. A mixin means something specific - a class that provides an
actual implementation and is combined with other mixins to produce other
classes [through multiple inheritance]. When you look at what is implemented
in dspace-services under the term 'mixin', we don't actually have mixins...
we have interfaces that are implemented by service classes. Each service
class completely and wholly provides implementations of the methods defined
in all the interfaces it implements. That's a concept Java has had for a long
time, and it is (or should be) well understood by all Java programmers. That
doesn't make them mixins. We don't/shouldn't want 'mixins' (multiple
inheritance - there is a reason Java doesn't support it), and we shouldn't
use a term incorrectly just because it's cool.
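The distinction in a few lines (illustrative names): what dspace-services actually has is ordinary Java - a service class wholly implementing several interfaces itself - whereas a true mixin would contribute reusable implementation combined into the class from outside.

```java
// Plain Java interface implementation, not a mixin: the class below provides
// the full body of every method itself; no implementation is "mixed in".
interface Startable { void start(); }
interface Stoppable { void stop(); }

class LifecycleService implements Startable, Stoppable {
    private boolean running;
    public void start() { running = true; }
    public void stop()  { running = false; }
    public boolean isRunning() { return running; }
}
```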

While I'm on a terminology rant, I'm not particularly happy about the
'interceptors' (i.e. RequestInterceptor). They look and feel a lot more like
listeners than interceptors - they allow other services to react to events
taking place (start request / end request) rather than affecting their
behaviour. I've not renamed them, for the sake of keeping as much of the
existing API intact as I can (whilst clearing out [currently] redundant or
problematic code), but it seems that this should really be pushing a
synchronous, immediately executing event notification, so that the listener
registration can occur in a single place rather than duplicating that
functionality anywhere similar concepts may