Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading
On 7 October 2010 06:55, Mark Diggory mdigg...@atmire.com wrote: You know I'm for (1) simplified spring only service manager. (2) jdbc data source delivered by servlet container (3) dropping reflectutils. Well, the JDBC datasource is a change to dspace-api rather than -services, but there is no reason why it can't go into DSpace 1.7. I'm not so sure about dropping the MBean registration completely, the whole idea was that we could have a container level service manager and services deployed across webapplications. But I assume this is just too crazy an approach to maintain and educate others on its usage. I'm just not convinced we would use a servlet container in this manner and simplification is no doubt more important... My default position is to say that if we aren't actively using something, we shouldn't have it. Whilst the registration technically works, we haven't got close to properly testing and qualifying sharing resources across contexts (/classloaders), and people will run into a lot of problems attempting to use it that way. Besides which, MBean isn't really the way to share resources in that way - JNDI is. Although MBeans are available outside a container, whereas JNDI isn't necessarily. I wouldn't class that as a good reason to use MBeans - it's a good reason to have that part of the functionality switchable - rather than a static manager that combines singleton and MBean registration as conditional logic, it should be a static facade to either a singleton or Mbean (or JNDI) implementation. That's slightly beside the point though, as in general, I agree with you - cross context sharing is a headache, a problem I don't think 'we' want to try and solve, and difficult for people to understand. If you need that kind of dynamic/component environment (and I don't think it's that big a deal to either assemble a web application combining multiple artifacts, or deploying it as a whole), then there are better ways of dealing with it (either SpringDM or EJB). My only concern is that the ability I put inplace for addons to deliver their own spring config and have the manager load it (in a SpringDM style manner) be maintained, and I see its still there... :-) It's possibly an area that could yet be simplified though (I've taken a somewhat pragmatic approach so far to not rip up any existing public APIs unless they are problematic in the current form). One question to ask is would it only need the SpringDM style approach when you would be using SpringDM (on the assumption that you are using that for a dynamic environment)? What if we put it in the dspace-services trunk, adjust the dspace trunk to use it and use this to get some community testing/feedback? If it fairs well over the next month... then we can do an official release of dspace-services just prior to the dspace 1.7.0 release candidates... Yes, that sounds like a plan. Although it passes it's unit tests (with only minor test modifications necessitated by the contract tightening), and I've used it in a DSpace 1.6.2 application without any direct modifications (related to the services at least), it does need more extensive in-application testing. G -- Beautiful is writing same markup. Internet Explorer 9 supports standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 L3. Spend less time writing and rewriting code and more time creating great experiences on the web. Be a part of the beta today. http://p.sf.net/sfu/beautyoftheweb___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading
On 5 October 2010 19:17, Sands Alden Fish sa...@mit.edu wrote: What resources were leaking here exactly? Leaking resources across requests can have some serious consequences depending on the resources, and I'm curious to know exactly what type of errant behavior we could expect from a 1.6.x version running the flawed services framework. I would say that the ability to leak resource across requests is more theoretical than demonstrated in a real world scenario. By which I mean that it's clearly demonstrated that the Caching service has the ability to store data either at the wrong scope, or to have request scope caches that actually get bound / re-bound to a later, different, request - however, I haven't exercised a full DSpace application in such a way to observe this happening. But even if the application wouldn't normally run into an issue, that doesn't mean we shouldn't tighten up it's behaviour to prevent it from becoming an issue later. G -- Beautiful is writing same markup. Internet Explorer 9 supports standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 L3. Spend less time writing and rewriting code and more time creating great experiences on the web. Be a part of the beta today. http://p.sf.net/sfu/beautyoftheweb___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading
Graham, this is impressive and very important work. I haven't begun to digest it all, but maybe I'll start with one question to get a better understanding... 2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did attempt to clean up in the shutdown, but it would only ever have been able to clear up the resources of the thread executing the shutdown, not all the other threads that would have processed requests. Investigating the second issue, it turned out that this was being used to keep track of Request scope caches - as well as potentially leaking resources (across requests), this highlighted a problem in the service contract, and a dangerous coupling between the implementation of the services. What resources were leaking here exactly? Leaking resources across requests can have some serious consequences depending on the resources, and I'm curious to know exactly what type of errant behavior we could expect from a 1.6.x version running the flawed services framework. That's at least a starting point for me to wrap my brain around this analysis. Thanks for all the effort that went into this. P.S. - Should this thread be in dspace-devel instead of tech perhaps, or did you want to inform a broader audience? -- sands fish Senior Software Engineer MIT Libraries Technology Research Development sa...@mit.edumailto:sa...@mit.edu E25-131 On Oct 4, 2010, at 7:55 PM, Graham Triggs wrote: Hello all, So, we've been debating the performance / resource usage of DSpace 1.6. This has just happened to coincide with work that I needed to do to qualify DSpace 1.6 (and/or 1.7) for our own use. So, for the past week or so, I've been hammering away at profiling DSpace. This isn't a comprehensive analysis of the entire codebase, but focusing on the serious issue of stability and resource leakage. This immediately led to finding two issues with the dspace-services code and it's dependencies: 1) The resources in the FinalizableReferenceQueue were leaking (this was noted before in a Tomcat stack trace posted here) - this could have been avoided if a newer version of reflectutils had been used, along with the LifeCycleManager it contains 2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did attempt to clean up in the shutdown, but it would only ever have been able to clear up the resources of the thread executing the shutdown, not all the other threads that would have processed requests. Investigating the second issue, it turned out that this was being used to keep track of Request scope caches - as well as potentially leaking resources (across requests), this highlighted a problem in the service contract, and a dangerous coupling between the implementation of the services. The Caching service was written to return a Request scoped cache without any valid Request to bind it to - this was explicitly used in the listeners / filters to acquire a cache and assign request objects into it before the request service was called to 'start' a request and bind to the scoped cache. There were also tests that expected this behaviour. Whilst the request service will clear a request scoped cache if it's been bound to it, you can't guarantee that the cache gets bound to a request. Worse though, the Request service was depending on the Caching service tracking request scoped caches against the current thread in order to be able to track the current request. So, the one responsibility that the request service should have, it was passing on and hoping that another service might do the job for it without it being explicitly clear in the contract. If that wasn't bad enough, getCache doesn't actually properly respect the CacheScope when trying to retrieve a particular cache (it could retrieve a non-Request scoped cache if one of the same name had already been created, regardless of the scope parameter). The Caching service also registered an MBean that was never removed. And, to wrap up my examination of the Caching service, it didn't close an InputStream that it opened. So, onwards - wanting to simplify this code a bit (as we don't need to support non-Spring IoC containers, at least not yet - and we do need to be able to understand and maintain this code) - I quickly noticed an issue in the SpringServiceManager where it returned multiple instances of the same service name, because it was under the mistaken belief that it needed to track some of the service names itself, when it was actually getting them returned from Spring. And the ServiceMixinManager was entirely redundant - as long as we can rely on Spring, we can rely on the container to do everything that the Mixin manager does, without us having to worry about the complexity of it (and it is both relatively complex, and a source of our problems earlier as it uses the FinalizableReferenceQueue). I've actually got a real bugbear about us using the term 'mixin'. A mixin means
Re: [Dspace-tech] A simplified version of dspace-services and Tomcat unloading
Thanks for the detailed writeup and explanations, it's helping me understand a lot of these issues better... looks like I still have some bedtime reading (and DSpace profiling) ahead of me! ;-) Cheers, Kim On 5 October 2010 12:55, Graham Triggs grahamtri...@gmail.com wrote: Hello all, So, we've been debating the performance / resource usage of DSpace 1.6. This has just happened to coincide with work that I needed to do to qualify DSpace 1.6 (and/or 1.7) for our own use. So, for the past week or so, I've been hammering away at profiling DSpace. This isn't a comprehensive analysis of the entire codebase, but focusing on the serious issue of stability and resource leakage. This immediately led to finding two issues with the dspace-services code and it's dependencies: 1) The resources in the FinalizableReferenceQueue were leaking (this was noted before in a Tomcat stack trace posted here) - this could have been avoided if a newer version of reflectutils had been used, along with the LifeCycleManager it contains 2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did attempt to clean up in the shutdown, but it would only ever have been able to clear up the resources of the thread executing the shutdown, not all the other threads that would have processed requests. Investigating the second issue, it turned out that this was being used to keep track of Request scope caches - as well as potentially leaking resources (across requests), this highlighted a problem in the service contract, and a dangerous coupling between the implementation of the services. The Caching service was written to return a Request scoped cache without any valid Request to bind it to - this was explicitly used in the listeners / filters to acquire a cache and assign request objects into it before the request service was called to 'start' a request and bind to the scoped cache. There were also tests that expected this behaviour. Whilst the request service will clear a request scoped cache if it's been bound to it, you can't guarantee that the cache gets bound to a request. Worse though, the Request service was depending on the Caching service tracking request scoped caches against the current thread in order to be able to track the current request. So, the one responsibility that the request service should have, it was passing on and hoping that another service might do the job for it without it being explicitly clear in the contract. If that wasn't bad enough, getCache doesn't actually properly respect the CacheScope when trying to retrieve a particular cache (it could retrieve a non-Request scoped cache if one of the same name had already been created, regardless of the scope parameter). The Caching service also registered an MBean that was never removed. And, to wrap up my examination of the Caching service, it didn't close an InputStream that it opened. So, onwards - wanting to simplify this code a bit (as we don't need to support non-Spring IoC containers, at least not yet - and we do need to be able to understand and maintain this code) - I quickly noticed an issue in the SpringServiceManager where it returned multiple instances of the same service name, because it was under the mistaken belief that it needed to track some of the service names itself, when it was actually getting them returned from Spring. And the ServiceMixinManager was entirely redundant - as long as we can rely on Spring, we can rely on the container to do everything that the Mixin manager does, without us having to worry about the complexity of it (and it is both relatively complex, and a source of our problems earlier as it uses the FinalizableReferenceQueue). I've actually got a real bugbear about us using the term 'mixin'. A mixin means something specific - a class that provides an actual implementation, and is combined with other mixins to produce other classes [through multiple-inheritance]. When you look at what is implemented in dspace-services under the term mixin, well we don't actually have mixins... we have interfaces that are implemented by service classes. Each service class completely and wholly provide implementations of the methods defined in all the interfaces they implement. That's a concept that Java has had for a long time, and is (should be) well understood by all Java programmers. That doesn't make them mixins. We don't/shouldn't want 'mixins' (multiple inheritance - there is a reason Java doesn't support it). And we shouldn't be using a term incorrectly just because it's cool. While I'm on a terminology rant, I'm not particularly happy about the 'interceptors' (ie. RequestInterceptor). They are looking and feeling a lot more like listeners than interceptors - allowing other services to react to events taking place (start request / end request), rather than affecting it's behaviour. I've not renamed them, for the sake of keeping as much of
[Dspace-tech] A simplified version of dspace-services and Tomcat unloading
Hello all, So, we've been debating the performance / resource usage of DSpace 1.6. This has just happened to coincide with work that I needed to do to qualify DSpace 1.6 (and/or 1.7) for our own use. So, for the past week or so, I've been hammering away at profiling DSpace. This isn't a comprehensive analysis of the entire codebase, but focusing on the serious issue of stability and resource leakage. This immediately led to finding two issues with the dspace-services code and it's dependencies: 1) The resources in the FinalizableReferenceQueue were leaking (this was noted before in a Tomcat stack trace posted here) - this could have been avoided if a newer version of reflectutils had been used, along with the LifeCycleManager it contains 2) A ThreadLocal in the CachingServiceImpl was also leaking - the service did attempt to clean up in the shutdown, but it would only ever have been able to clear up the resources of the thread executing the shutdown, not all the other threads that would have processed requests. Investigating the second issue, it turned out that this was being used to keep track of Request scope caches - as well as potentially leaking resources (across requests), this highlighted a problem in the service contract, and a dangerous coupling between the implementation of the services. The Caching service was written to return a Request scoped cache without any valid Request to bind it to - this was explicitly used in the listeners / filters to acquire a cache and assign request objects into it before the request service was called to 'start' a request and bind to the scoped cache. There were also tests that expected this behaviour. Whilst the request service will clear a request scoped cache if it's been bound to it, you can't guarantee that the cache gets bound to a request. Worse though, the Request service was depending on the Caching service tracking request scoped caches against the current thread in order to be able to track the current request. So, the one responsibility that the request service should have, it was passing on and hoping that another service might do the job for it without it being explicitly clear in the contract. If that wasn't bad enough, getCache doesn't actually properly respect the CacheScope when trying to retrieve a particular cache (it could retrieve a non-Request scoped cache if one of the same name had already been created, regardless of the scope parameter). The Caching service also registered an MBean that was never removed. And, to wrap up my examination of the Caching service, it didn't close an InputStream that it opened. So, onwards - wanting to simplify this code a bit (as we don't need to support non-Spring IoC containers, at least not yet - and we do need to be able to understand and maintain this code) - I quickly noticed an issue in the SpringServiceManager where it returned multiple instances of the same service name, because it was under the mistaken belief that it needed to track some of the service names itself, when it was actually getting them returned from Spring. And the ServiceMixinManager was entirely redundant - as long as we can rely on Spring, we can rely on the container to do everything that the Mixin manager does, without us having to worry about the complexity of it (and it is both relatively complex, and a source of our problems earlier as it uses the FinalizableReferenceQueue). I've actually got a real bugbear about us using the term 'mixin'. A mixin means something specific - a class that provides an actual implementation, and is combined with other mixins to produce other classes [through multiple-inheritance]. When you look at what is implemented in dspace-services under the term mixin, well we don't actually have mixins... we have interfaces that are implemented by service classes. Each service class completely and wholly provide implementations of the methods defined in all the interfaces they implement. That's a concept that Java has had for a long time, and is (should be) well understood by all Java programmers. That doesn't make them mixins. We don't/shouldn't want 'mixins' (multiple inheritance - there is a reason Java doesn't support it). And we shouldn't be using a term incorrectly just because it's cool. While I'm on a terminology rant, I'm not particularly happy about the 'interceptors' (ie. RequestInterceptor). They are looking and feeling a lot more like listeners than interceptors - allowing other services to react to events taking place (start request / end request), rather than affecting it's behaviour. I've not renamed them, for the sake of keeping as much of the existing API intact as I can (whilst clearing out [currently] redundant or problematic code), but actually it seems that it should really be pushing a synchronous and immediately executing event notification, so that the listener registration can occur in a single place rather than duplicating that functionality anywhere similar concepts may