Re: [Dspace-tech] tomcat reporting memory leak?
On 5 October 2010 16:33, Simon Brown st...@cam.ac.uk wrote:

> Which nobody has requested, making this a massive red herring. I fail to see how cutting back on unnecessary and redundant database access constitutes overhead to cover up the problems of larger repositories.

One person's unnecessary and redundant database access is another's very necessary database access - well, at least it can be. I remember the patch for reducing the updating of browse / search indexes, and I can see why it would be useful to not do those updates during a batch import if you have an appropriate workflow. That won't be the case for all of the repositories - quite a few will welcome the ability to see those items as and when they are added. There is also the issue of how long it takes to do the one very big update at the end of the batch run vs. incremental changes as you go - it may be less work overall, but having one big change can be more disruptive in some cases.

> Any repository, regardless of size, will see improvements with this kind of optimisation, at least one example of which I have already highlighted (and had my arguments shouted down - this is also, incidentally, why I haven't bothered to open any other JIRA tickets on other performance issues we've seen. What would be the point?)

No, you didn't get shouted down for raising a performance issue. Where the argument came was because you assumed that this would clearly be of benefit to any repository, when you did nothing to address the underlying performance issues (which could have been helped quite dramatically with some small SQL tweaks and some configuration work in Postgres), and instead just bypassed them for one very specific use case. It doesn't matter how large or small a repository is: if they don't perform batch uploads using the ItemImporter, your change will do *nothing* for them.
But an alteration to the underlying SQL, and guidelines for getting the best out of Postgres, would benefit everyone - regardless of how large or small the repository is, or the means by which they populate it.

> The pertinent question for me is why, whenever the issue of performance comes up, is one of these theoretical future of repositories screeds pulled out and slammed down in front of the conversation? People are reporting problems with the systems they have *right now*.

It's not meant to be a barrier to conversation, but a question as to what you want to resolve. Do you want to address the *scalability* of DSpace, or do you just want to avoid an immediate performance bottleneck? If we conflate these, conversations are going to stall, and we're not going to make any progress.

> Or rather, they were. And yes, it is true that there is a finite limit to what the hardware is capable of, but the quality of the software plays a significant role in how quickly that limit is reached. But we've had this conversation before. I don't really expect it to end any better this time than it did then.

I completely agree - but a solution that breaks the encapsulation of the components in the system, and leaves important indexes in an inconsistent state for an extended period of time, is not an automatic win for the majority of the community. I offered a lot of suggestions as to how that code could be better structured, improvements both to the SQL and the configuration of Postgres to handle the load more efficiently, and suggestions for further tweaks that would have reduced the amount of updates the code needed to do still further. All of which would have been more beneficial to the community (not just improving batch uploads, but interactive / singular deposits and edits) - and not only that, would have improved the performance of your systems further than you had so far achieved.
> Any method of increasing the processing capabilities of a system, either through more powerful hardware or improvements in the software, is postponing the inevitable for any repository with continued growth. The difference is in how much cost there is to any individual repository in each of those methods. Our system, with the changes we've made to it, struggles at around 300,000 items. People are reporting problems (presumably running stock 1.6.2) at around 50,000, from what I can gather.

This is where we need to be careful about what we are reporting. Quite a few of the issues around 1.6.x appear to be around rampant memory usage, rather than a clear function of how many records there are in the database. There are also different issues involved if we are talking about adding / editing lots of records, or records that are simply highly accessed. Even so, regardless of what we do to the code to make it efficient, it does not and cannot absolve the system administrator of correctly maintaining both DSpace itself and its dependencies. I wouldn't want to get drawn on where that point is without any evidence, but there is a lot of scope for altering and improving Postgres
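To make the Postgres side of this concrete: the kind of database work being argued for in the thread usually means a handful of postgresql.conf changes plus routine maintenance after bulk loads. The settings and values below are illustrative guesses for a 2010-era (8.x/9.0) Postgres on a few GB of RAM, not figures taken from this thread - size them to your own hardware:

```
# postgresql.conf -- illustrative starting points, not thread-endorsed values:
shared_buffers = 512MB          # the 8.x default was far too small for this workload
work_mem = 16MB                 # per-sort/hash memory for browse and search queries
maintenance_work_mem = 128MB    # speeds up VACUUM and index rebuilds
checkpoint_segments = 16        # absorb write bursts during batch imports

# ...and after large batch imports, from psql:
#   VACUUM ANALYZE;             -- reclaim space and refresh planner statistics
```

The point being made is that tuning of this sort benefits every deposit path - interactive submission as well as ItemImporter runs - whereas skipping index updates only helps the batch case.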
Re: [Dspace-tech] tomcat reporting memory leak?
On 6 Oct 2010, at 15:15, Graham Triggs wrote:

[snip]

This is exactly the kind of pointless pontification that we got last time. Any point that is raised is deflected or ignored, and you even manage to contradict yourself between paragraphs. What's it to be: should patches benefit ALL repositories, or is it fine if it's just some? Or the other way round, maybe?

I will be very happy to share our experiences regarding large-scale DSpace instances with the community, if that can be of any help. But not if it involves having to deal with Graham Triggs. I really do not have time for this.

-- 
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] tomcat reporting memory leak?
All,

I would really appreciate it if we could stop the negativity in this discussion thread. I'm sorry to have to post a message of this sort publicly, but I feel I'm unfortunately being forced to do so. Insults and negativity on a public listserv do not help anyone. I also personally take offense at the insulting of anyone in our DSpace Committers group, as they are volunteering their own time (sometimes even outside of their workplace) to make DSpace software better. Open source software does not build and maintain itself, and our group of Committers have made it their passion to improve DSpace for the benefit of us all.

Despite any arguments or differences we all may have, it is in our best interest to work together to resolve these issues in a friendly and timely manner. There is a place for arguments and disagreements on these DSpace mailing lists and I welcome them, provided they are kept constructive. I'm in touch with Cambridge around their performance issues off-list, and hope that we can work towards a solution to these issues for everyone involved.

Thanks,
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org
Re: [Dspace-tech] tomcat reporting memory leak?
On 4 Oct 2010, at 15:00, Graham Triggs wrote:

> On 29 September 2010 14:17, Tom De Mulder td...@cam.ac.uk wrote:
>> I know you like to talk down the problem, but that really isn't helping.
>
> This isn't about talking down the problem - it's about finding where the real problems are and not just patching the immediate concerns. And considering the interests of nearly 1000 DSpace instances that are registered on dspace.org - many of whom will probably be more worried about rampant resource usage for small repositories from adding overhead to cover up the problems of larger repositories.

Which nobody has requested, making this a massive red herring. I fail to see how cutting back on unnecessary and redundant database access constitutes overhead to cover up the problems of larger repositories. Any repository, regardless of size, will see improvements with this kind of optimisation, at least one example of which I have already highlighted (and had my arguments shouted down - this is also, incidentally, why I haven't bothered to open any other JIRA tickets on other performance issues we've seen. What would be the point?)

>> We run 5 DSpace instances, three of these are systems with hundreds of thousands of items, and it's dog slow and immensely resource-intensive. And yes, we want these to be single systems. Why shouldn't we?
>
> Surely the more pertinent question is why wouldn't you want to be able to run a multi-node solution? I'm sure I don't need to tell you that no matter how good a job you do of making the system perform better with larger datasets, there will always be a finite limit to how large the repository can be, how many users you can service, and how quickly it will process requests for any given hardware allocation.

The pertinent question for me is why, whenever the issue of performance comes up, is one of these theoretical future of repositories screeds pulled out and slammed down in front of the conversation?
People are reporting problems with the systems they have *right now*. Or rather, they were. And yes, it is true that there is a finite limit to what the hardware is capable of, but the quality of the software plays a significant role in how quickly that limit is reached. But we've had this conversation before. I don't really expect it to end any better this time than it did then.

>> Yes, DSpace can do a better job than it currently does, but it's just postponing the inevitable.
>
> How much in technology relies on just making things bigger/faster? Even our single system hardware is generally made of multiple identical components - CPUs with multiple cores, memory consisting of multiple 'sticks', each consisting of multiple storage chips, storage combining multiple hard drives each having multiple platters.

Any method of increasing the processing capabilities of a system, either through more powerful hardware or improvements in the software, is postponing the inevitable for any repository with continued growth. The difference is in how much cost there is to any individual repository in each of those methods. Our system, with the changes we've made to it, struggles at around 300,000 items. People are reporting problems (presumably running stock 1.6.2) at around 50,000, from what I can gather. That means that the optimum size for a single repository running unmodified 1.6.2 is less than 50,000 items, or more than six separate DSpace instances for the number of items we hold. That's at least a sixfold increase in hardware and operational costs. Even in a situation where higher education funding had not just been significantly cut, that amount of money would be rather difficult to come by. In a situation where people are able to point to significantly better performance from other systems on similar hardware, it would become substantially more difficult.
> And many of our dependencies are going the same way - Oracle database clusters, Solr is designed to get scalability from running over multiple shards, and even Postgres has taken a major step towards clustering / replication with its 9.0 release. Either way, you will always hit a hard limit with keeping things on a single system - so at some point, something has to give, whether it's separating out the DSpace application, Solr and Postgres instances to separate machines, or accepting this reality in the repository and building it to scale across multiple nodes itself. This in turn would bring benefits to how easily you can scale (in theory, it is a lot easier to scale at the repository level than to scale each of its individual components), as well as potentially better preservation and federation capabilities.

Leaving aside any theoretical ideal futures for the moment, it seems to me that the gist of this conversation is that DSpace does not support single-instance repositories over a certain size. That
Re: [Dspace-tech] tomcat reporting memory leak?
Hi Simon, all,

On 10/5/2010 10:33 AM, Simon Brown wrote:

> On 4 Oct 2010, at 15:00, Graham Triggs wrote:
>> On 29 September 2010 14:17, Tom De Mulder td...@cam.ac.uk wrote:
>>> I know you like to talk down the problem, but that really isn't helping.
>>
>> This isn't about talking down the problem - it's about finding where the real problems are and not just patching the immediate concerns. And considering the interests of nearly 1000 DSpace instances that are registered on dspace.org - many of whom will probably be more worried about rampant resource usage for small repositories from adding overhead to cover up the problems of larger repositories.
>
> Which nobody has requested, making this a massive red herring. I fail to see how cutting back on unnecessary and redundant database access constitutes overhead to cover up the problems of larger repositories. Any repository, regardless of size, will see improvements with this kind of optimisation, at least one example of which I have already highlighted (and had my arguments shouted down - this is also, incidentally, why I haven't bothered to open any other JIRA tickets on other performance issues we've seen. What would be the point?)

It's really unfortunate that you've experienced this and/or felt this way in the past. Perhaps we haven't been able to tease out the problems at hand as well as we could have, and I hope we can improve upon that now. However, I'd highly recommend freely adding specific issues to our JIRA -- it will *guarantee* that the DSpace committers will review and discuss them (each week, we set aside time in our weekly meeting to do so -- see https://wiki.duraspace.org/display/DSPACE/Developer+Meetings ). When adding JIRA issues, specifics are best; that way we can narrow down where the problem may reside. The longer these specific issues remain outside of JIRA, the more likely they are to be accidentally overlooked in future versions of DSpace (as JIRA is our primary means of scheduling things to be fixed in new versions).
We really do mean well, and we'd like to work with you to resolve these issues. We're not trying to continually throw up red herrings to avoid problems -- it's really a matter of attempting to better understand where the specific issue resides. As volunteer developers, the DSpace Committers each have only a limited amount of time to work on DSpace in a given week. Therefore, the more information you can provide us with, the better. If you know of specific areas where there are redundant database accesses, we'd appreciate it if you could point them out to us (or enter a JIRA issue and we'll fix it). We want to resolve these issues, but sometimes we don't have enough time in our normal work week to dig in deep enough to locate them. We highly encourage sites who have stumbled across problems in the code to report them -- that way we can look at that specific area of the code and fix it so that it is no longer an issue.

> Leaving aside any theoretical ideal futures for the moment, it seems to me that the gist of this conversation is DSpace does not support single-instance repositories over a certain size. That being the case, I think it would be only fair to make that lack of support explicit in the documentation and PR materials for the software, in order that all of the relevant information is readily available for anyone making decisions about the future of their repository.

I'd say we want to support single-instance repositories of larger sizes as well. There will always be a size limit where it makes more sense to scale across multiple nodes, but we should be working to increase that size limit as much as we can (within reason, obviously). Although it isn't yet explicit in our RoadMap, I think we also want to work towards allowing DSpace to scale across multiple nodes (where it makes sense to). Again, the best way for us to improve your immediate DSpace performance is to better understand the exact problems you've already noticed.
We can only fix issues that we know about, and sometimes discovering where the issue resides can be the hardest part. If you've already discovered very specific issue(s), we'd appreciate it if you can share them. If you haven't yet discovered the exact issue(s), we may be able to help narrow down the problem if you can share which parts of your DSpace seem 'especially sluggish', etc.

The end result is that we really should be working together on a resolution for the present, rather than continually arguing over ideal futures or past discussions. Open source development works best if we can all share information/ideas/issues/resolutions freely and openly. Yes, that also means sometimes arguing openly -- which is perfectly OK by me, as sometimes arguments bring us all to a better solution or route forward. But, I do want to encourage us all to keep things constructive, so that we can move DSpace software forward to the
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 September 2010 14:17, Tom De Mulder td...@cam.ac.uk wrote:

> I know you like to talk down the problem, but that really isn't helping.

This isn't about talking down the problem - it's about finding where the real problems are and not just patching the immediate concerns. And considering the interests of nearly 1000 DSpace instances that are registered on dspace.org - many of whom will probably be more worried about rampant resource usage for small repositories from adding overhead to cover up the problems of larger repositories.

> We run 5 DSpace instances, three of these are systems with hundreds of thousands of items, and it's dog slow and immensely resource-intensive. And yes, we want these to be single systems. Why shouldn't we?

Surely the more pertinent question is why wouldn't you want to be able to run a multi-node solution? I'm sure I don't need to tell you that no matter how good a job you do of making the system perform better with larger datasets, there will always be a finite limit to how large the repository can be, how many users you can service, and how quickly it will process requests for any given hardware allocation.

> Yes, DSpace can do a better job than it currently does, but it's just postponing the inevitable.

How much in technology relies on just making things bigger/faster? Even our single system hardware is generally made of multiple identical components - CPUs with multiple cores, memory consisting of multiple 'sticks', each consisting of multiple storage chips, storage combining multiple hard drives each having multiple platters. And many of our dependencies are going the same way - Oracle database clusters, Solr is designed to get scalability from running over multiple shards, and even Postgres has taken a major step towards clustering / replication with its 9.0 release.
Either way, you will always hit a hard limit with keeping things on a single system - so at some point, something has to give, whether it's separating out the DSpace application, Solr and Postgres instances to separate machines, or accepting this reality in the repository and building it to scale across multiple nodes itself. This in turn would bring benefits to how easily you can scale (in theory, it is a lot easier to scale at the repository level than to scale each of its individual components), as well as potentially better preservation and federation capabilities.

G
Re: [Dspace-tech] tomcat reporting memory leak?
Hi,

First, I want to thank Mark Wood for recommending LambdaProbe; it is proving a very useful tool. I can see already that we need to increase our PermGen, and will probably borrow Mark's JAVA_OPTS settings for our production and development Tomcat instances.

In trying to further educate myself about these issues, I came across this excellent page on the Tomcat wiki, which at the end includes debugging/troubleshooting advice that is very close to the procedure Graham Triggs outlined at a recent committers' meeting. I'm forwarding this link to the list, as I think it might prove useful to others: http://wiki.apache.org/tomcat/OutOfMemory

--Hardy

-----Original Message-----
From: Mark H. Wood [mailto:mw...@iupui.edu]
Sent: Wednesday, September 29, 2010 12:08 PM
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] tomcat reporting memory leak?

I'd like to point out that the discussion is broadening considerably: a system can be slow for many reasons, not just memory starvation.

Step 1: what resource(s) are you short of? Something like LambdaProbe can peek inside Tomcat and show you how much of each of the various memory pools is being used. OS tools can show whether you are swapping heavily or spending a lot of time in I/O wait or are really CPU-bound (and what, besides Tomcat, may be eating CPU). DBMS tools can reveal places in the schema that don't scale well, queries that could be optimized, and additional indices that would be beneficial.

It would be really helpful for large, busy sites with performance problems to share any such detailed observations. Some of those problems can probably be tuned away, and some will point to specific things for coders to investigate. Scaling experience will be valuable both in documenting good ways to tune up for DSpace and in finding design hotspots for rework.

-- 
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a little more than 2 percent of world population has broadband. -- Ledford and Tyler, _Google Analytics 2.0_
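Mark's "Step 1" triage can be sketched as a small script. This is only an illustration of the approach (Linux-specific /proc paths; the JVM and Postgres commands in the comments are suggestions of era-appropriate tools, not part of his procedure):

```shell
#!/bin/sh
# Step 1: what resource(s) are you short of?  (Linux-only sketch.)

# Physical memory and swap pressure, from /proc/meminfo (values in kB).
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used_kb=$((swap_total_kb - swap_free_kb))

echo "RAM free:  $((mem_free_kb / 1024)) MB of $((mem_total_kb / 1024)) MB"
echo "Swap used: $((swap_used_kb / 1024)) MB"

# If the OS looks healthy, peek inside Tomcat's memory pools, e.g.:
#   jstat -gcutil <tomcat-pid> 5000     # heap generations + PermGen usage %
# ...and then ask Postgres what it is actually doing (pre-9.2 column name):
#   psql dspace -c "SELECT query_start, current_query FROM pg_stat_activity;"
```

Heavy swap use points at over-committed JVM heaps; a healthy OS plus a full PermGen points back at Tomcat; long-running rows in pg_stat_activity point at the database.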
Re: [Dspace-tech] tomcat reporting memory leak?
On 24 Sep 2010, at 21:17, bill.ander...@library.gatech.edu wrote:

> We've been experiencing problems similar to some reported on this thread since our upgrade to 1.6 several months ago. We're still using the jspui, and we've wondered (among other things) if some of these problems might be alleviated by a switch to the xmlui. Has anybody had any experience comparing the memory footprint and/or resource usage issues between the two interfaces?

We load-tested the XMLUI (on identical hardware) and it was even worse. It ran out of memory and crashed really quickly, so we never took it into production. But your mileage may vary.

Best regards,

-- 
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 Sep 2010, at 11:38, Hilton Gibson wrote:

> We started with a VM which had 2GB memory. Then added 2GB to the VM, no luck. Then luckily we had funds to buy a server. So now we have 12GB RAM and 12 CPUs. No crashes so far. Using the XMLUI. Does DSpace really need this and what happens when we go to one million items?

A lot of the back-end code of DSpace, the very core of it, is inherently inefficient. Several tasks are executed more than once, entire objects are created when only one attribute is needed, etc. (I'd be more specific, but I'm not a specialist on this matter, and our resident DSpace developer is on leave this week.)

I am really glad to hear from other people with problems similar to ours.

-- 
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 Sep 2010, at 11:47, Mark Ehle wrote:

> Why was tomcat chosen as a platform for DSpace?

It wasn't. You can use any Servlet engine. We used JBoss for a while but went back to Tomcat because it fitted into our infrastructure better.

I believe DSpace was written in Java because Rob Tansley wanted to try writing a project in Java, but I could be wrong. :)

Best,

-- 
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 September 2010 11:38, Hilton Gibson hilton.gib...@gmail.com wrote:

> Using the XMLUI. Does DSpace really need this and what happens when we go to one million items ??

Does DSpace really need that? No. As I have said, I'm running 30 separate repositories - using JSPUI (circa 1.4.2 / 1.5 codebase) - all on a single server / Tomcat instance. Some of those repositories have 1000s of items, and get quite decent levels of access. The server has 8GB installed, 3GB heap turned over to Tomcat (plus 1GB for non-heap). The Tomcat instance has 2GB of *free* heap space, rarely runs above 5% cpu usage, and has plenty of capacity to run more repositories (the rate at which files are opened/closed is actually a bigger issue for Tomcat startup). Although, it's worth pointing out that the database is hosted on a separate server - I can't say how many resources that is really using, as it's shared with other services, but it is apparently 'tiny'.

What happens at one million items? Well, that's an interesting issue. But is it really the right question to be asking? How far do you want/need to be able to scale a 'monolithic' instance, before you spread it over multiple servers? As long as you can spread it over multiple servers, it gives you a much higher ceiling than relying on a single box - and it is easier to scale for increasing size/usage by adding more boxes (you don't have to migrate). If you focus on scaling a single installation, then you end up increasing the overall requirements (ie. memory for caching), and make it harder to have scaling over multiple boxes at all.

G
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 September 2010 11:48, Tom De Mulder td...@cam.ac.uk wrote:

> A lot of the back-end code of DSpace, the very core of it, is inherently inefficient

I don't entirely disagree with that statement - there are some things that can definitely be improved, particularly where you have to deal with more items in a single instance. But take a look at my numbers - at its core, it really isn't that bad for the vast majority of DSpace users (how many have more than even 50,000 items currently?). And some of it depends on correct system setup (Postgres version/options, etc.) It's adding the xmlui, solr, etc. that puts a lot more demands on the system.

G
Re: [Dspace-tech] tomcat reporting memory leak?
That begs the question: do you think something else should be chosen / recommended? There really isn't anything preventing you using Jetty, etc., but Tomcat is actually a pretty solid server that does a lot of things quite well - particularly, in recent versions, being defensive against bad application behaviour. And when you look at the grand scheme of things, the smaller footprint of Jetty doesn't really make a whole lot of difference.

G

On 29 September 2010 11:47, Mark Ehle marke...@gmail.com wrote:

> Why was tomcat chosen as a platform for DSpace?
Re: [Dspace-tech] tomcat reporting memory leak?
We're comfortably running *three* production DSpace instances in a single Tomcat 6 with these limits:

  JAVA_OPTS=-Xmx1024M -Xms768M
  JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=128M
  JAVA_OPTS=$JAVA_OPTS -XX:PermSize=32M

That's on a box with 3GB of physical memory. One DSpace instance is 1.6, and the other two are 1.5. Now, I do have an old weekly reminder to check PermGen on that box, but it is always around half filled these days. We had problems in the past, but newer versions of DSpace seem to do much better in that regard. I can't recall the last time we had to restart that Tomcat just to clean up memory.

We have a development box with maybe two dozen DSpace instances, none of them very busy at all, various versions and states of disrepair, and we do have to restart Tomcat there from time to time if we are doing a lot of webapp reloading. The limits there are:

  JAVA_OPTS=-Xmx1024M -Xms128M
  JAVA_OPTS=$JAVA_OPTS -XX:PermSize=192M
  JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=384M

on a 4GB machine.

-- 
Mark H. Wood, Lead System Programmer mw...@iupui.edu
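As mailed, the JAVA_OPTS lines above have lost their shell quoting (the space in `-Xmx1024M -Xms768M` needs quotes). A working sketch of how these limits are usually applied - placing them in Tomcat's optional `$CATALINA_HOME/bin/setenv.sh`, which is my assumption here, not something stated in the thread - would look like:

```shell
# $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh at startup.
# Values are Mark's production limits; tune for your own box.
JAVA_OPTS="-Xmx1024M -Xms768M"                 # max / initial heap
JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128M"    # PermGen ceiling (pre-Java-8 JVMs)
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=32M"        # initial PermGen
export JAVA_OPTS

echo "Tomcat will start with: $JAVA_OPTS"
```

Note that `-XX:PermSize`/`-XX:MaxPermSize` apply only to the PermGen-era HotSpot JVMs of this period; the PermGen space is exactly what the OutOfMemory reports in this thread tend to exhaust.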
Re: [Dspace-tech] tomcat reporting memory leak?
On Wed, Sep 29, 2010 at 11:48:02AM +0100, Tom De Mulder wrote:

> A lot of the back-end code of DSpace, the very core of it, is inherently inefficient. Several tasks are executed more than once, and entire objects are created when only one attribute is needed, etc. (I'd be more specific, but I'm not a specialist on this matter, and our resident DSpace developer is on leave this week.)

When your developer has time, I think that specific JIRA tickets on these observations would be appreciated. We need all the eyes we can borrow. It needn't be a rigorous analysis (though that would be wonderful). Significant inefficiencies noted in passing are important information.

-- 
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Re: [Dspace-tech] tomcat reporting memory leak?
On 29 Sep 2010, at 13:03, Graham Triggs wrote: Some of those repositories have 1000s of items, and get quite decent levels of access.

Thousands? I don't even want to have this discussion until you're talking hundreds of thousands, and how many hits per second. I know you like to talk down the problem, but that really isn't helping.

We run 5 DSpace instances; three of these are systems with hundreds of thousands of items, and it's dog slow and immensely resource-intensive. And yes, we want these to be single systems. Why shouldn't we? We have other systems here at the University that are much bigger, do similar things, and require far, far less in terms of resources.

-- 
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
Re: [Dspace-tech] tomcat reporting memory leak?
Hi all,

Interesting thread so far; keep up the good discussion. I think it'd be helpful to us all if we could each share more information about our DSpace setups (similar to Mark Wood's tip on his local JAVA_OPTS settings). The more we know about your DSpace/Java/Tomcat/Postgres (or Oracle) configurations, server setups, etc., the better chance we have of helping you out. There may be some immediate performance improvements you can achieve just by tweaking your setup/configuration slightly.

I had set up a basic template for this on the wiki at https://wiki.duraspace.org/display/DSPACE/ScalabilityIssues1.6 But feel free to just send info along in any format you wish. The template was mostly there to give everyone an idea of what type of information can be useful to us (so that we can hopefully provide you with some helpful suggestions and find longer-term fixes).

Obviously, we also want to track down and fix any memory leaks or larger problems as well. So if you've already discovered specific issues, let us know about those too, so we can add them to our Issue Tracker (http://jira.dspace.org/) and schedule them to be resolved.

Thanks,
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 9/29/2010 7:59 AM, Mark H. Wood wrote: [snip]
Re: [Dspace-tech] tomcat reporting memory leak?
Thanks - I was just curious.

On Wed, Sep 29, 2010 at 6:53 AM, Tom De Mulder td...@cam.ac.uk wrote: On 29 Sep 2010, at 11:47, Mark Ehle wrote: Why was tomcat chosen as a platform for DSpace?

It wasn't. You can use any Servlet engine. We used JBoss for a while but went back to Tomcat because it fitted into our infrastructure better. I believe DSpace was written in Java because Rob Tansley wanted to try writing a project in Java, but I could be wrong. :)

Best,

-- 
Tom De Mulder td...@cam.ac.uk
Re: [Dspace-tech] tomcat reporting memory leak?
Quick followup, in case it isn't clear (as I was asked about this off-list). The preference would be to share your DSpace setup/configuration information directly on this listserv (or you can post it up on the wiki if you prefer). That way we can get more eyes on it, and hopefully come up with better suggestions.

Also, this may be an area where sharing this information can help us to document some best practices, based on recommended setups and performance hints/tips that people have. So, I'm hoping that as this thread continues, we can pull out the main tips/hints and document them for future reference. At the same time, we can pull out the common memory/performance issues so that they can be investigated further, and hopefully resolved as soon as possible.

Committers -- it'd also be great if you can take a few moments to send your basic setup info and DSpace size to the listserv (especially noting anything that you may have tweaked above and beyond the normal DSpace install docs, like JAVA_OPTS or similar settings). This can hopefully encourage others to do the same.

- Tim

On 9/29/2010 9:46 AM, Tim Donohue wrote: [snip]
Re: [Dspace-tech] tomcat reporting memory leak?
- Tim Donohue tdono...@duraspace.org wrote:
| Quick followup, in case it isn't clear (as I was asked about this
| off-list). The preference would be to share your DSpace
| setup/configuration information directly on this listserv

Let me kick things off, then (questions truncated a bit for formatting reasons):

1) Contact Info
   a) Bill Anderson / Georgia Institute of Technology / bill.ander...@library.gatech.edu

2) DSpace Setup and Configuration details
   a) What DSpace version are you using?
      1. DSpace 1.6.2
      2. Currently using JSPUI, migrating to XMLUI
      3. 30,498 Items
      4. 610 Communities/Collections
   b) What Postgres/Oracle version are you using?
      1. PostgreSQL 8.1.4
   c) What Tomcat version are you using?
      1. Tomcat/6.0.26 + mod_jk/1.2.30 + Apache/2.0.52
   d) Is everything running on one server (DSpace/Tomcat/Postgres/etc.)?
      1. Everything is (currently) on the same server
      2. PowerEdge 2850: 2x Intel Xeon CPU 2.80GHz, 12GB memory, Red Hat AS 4 (Nahant Update 8), RAID5 disk array
   e) How much memory are you making available to Tomcat/Java?
      1. (lb worker) JAVA_OPTS=-server -Xmx462M -Xms462M -XX:+UseParallelGC -Dfile.encoding=UTF-8, webapps: jspui lni oai sword xmlui
      2. (lb worker) JAVA_OPTS=-server -Xmx462M -Xms462M -XX:+UseParallelGC -Dfile.encoding=UTF-8, webapps: jspui lni oai sword xmlui
      3. JAVA_OPTS=-server -Xmx600M -Xms600M -XX:+UseParallelGC -Dfile.encoding=UTF-8, webapps: solr
      4. lb worker method=request, socket_keepalive=True, socket_timeout=0, ping_mode=A
      5. Postgres max_connections=300

3) Performance / Scalability Issues noticed
   1. We've had intermittent performance problems since upgrading to 1.6 in May. At first, the problems seemed strictly SOLR-related; SOLR was grabbing hundreds of Postgres connections, and eventually generating these in dspace.log:
      org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error: Timeout waiting for idle object
      and these in catalina.out:
      SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
      exceeded limit of maxWarmingSearchers=2, try again later.
      ...followed by permgen errors and death.

   2. We heavily revised our solrconfig.xml, which alleviated the problem but didn't eliminate it. We also split our jspui between two load-balanced Tomcat instances, and moved the SOLR webapp to a third instance, which also helped. Following OR 2010, on a suggestion from Peter Dietz, we revised the SOLR JSP code to use the auto-commit functionality rather than manually committing every transaction. All of this got us to the point where we weren't crashing routinely, but we still have major problems during times of heavy traffic. Generally, these take the form of a gradual slowdown followed by a complete failure to respond; this sometimes ends in spontaneous recovery, and sometimes in permgen errors and a crash. At the end of last week, following a bad patch caused by a LOCKSS harvest, we implemented a restart schedule, with our two jspui Tomcat instances being automatically restarted every 6 hours, alternating between the two. We haven't had any crashes since, but we're not at all sure we've solved the problem.

   3. On restart, we sometimes get a bunch of these:
      Sep 28, 2010 9:00:06 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
      SEVERE: A web application appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak

   4.
Other errors that led to a service/application outage:

      Sep 23, 2010 3:47:14 PM org.apache.tomcat.util.threads.ThreadPool$ControlRunnable run
      SEVERE: Caught exception (java.lang.OutOfMemoryError: PermGen space) executing org.apache.jk.common.ChannelSocket$SocketConnect...@3aff776, terminating thread

      Sep 23, 2010 10:37:04 AM org.apache.catalina.connector.CoyoteAdapter service
      SEVERE: An exception or error occurred in the container during the request processing
      java.lang.OutOfMemoryError: PermGen space
          at java.lang.Throwable.getStackTraceElement(Native Method)
          at java.lang.Throwable.getOurStackTrace(Throwable.java:591)
          at java.lang.Throwable.getStackTrace(Throwable.java:582)
          at org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:155)
          at org.apache.juli.logging.DirectJDKLog.error(DirectJDKLog.java:135)
          at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:274)
          at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
          at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
          at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
          at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
          at
Re: [Dspace-tech] tomcat reporting memory leak?
I'd like to point out that the discussion is broadening considerably: a system can be slow for many reasons, not just memory starvation.

Step 1: what resource(s) are you short of? Something like LambdaProbe can peek inside Tomcat and show you how much of each of the various memory pools is being used. OS tools can show whether you are swapping heavily or spending a lot of time in I/O wait or are really CPU-bound (and what, besides Tomcat, may be eating CPU). DBMS tools can reveal places in the schema that don't scale well, queries that could be optimized, and additional indices that would be beneficial.

It would be really helpful for large, busy sites with performance problems to share any such detailed observations. Some of those problems can probably be tuned away, and some will point to specific things for coders to investigate. Scaling experience will be valuable both in documenting good ways to tune up for DSpace and in finding design hotspots for rework.

-- 
Mark H. Wood, Lead System Programmer mw...@iupui.edu
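[Editor's note: Mark's "step 1" can start with nothing fancier than the OS itself. A minimal sketch of the memory/swap part (Linux-specific assumption: /proc/meminfo exists; the jstat hint in the comment is the JDK-side counterpart):]

```shell
# Quick triage: is the box itself short of memory or dipping into swap?
# Reads /proc/meminfo directly, so it works even where vmstat/iostat
# aren't installed.
free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
swap_used_kb=$(awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {print t-f}' /proc/meminfo)
echo "free=${free_kb}kB swap_used=${swap_used_kb}kB"
# For the JVM side, jstat against the Tomcat pid shows pool utilisation,
# e.g.: jstat -gcutil <tomcat-pid> 5000   (the P column is PermGen % used)
```

If swap_used keeps growing while Tomcat is the biggest process, the heap/PermGen limits discussed earlier in the thread are the first place to look.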
Re: [Dspace-tech] tomcat reporting memory leak?
On 22 Sep 2010, at 20:22, Sands Alden Fish wrote: (2) We currently don't have a centralized server with enough test data to run many of these memory or scalability tests on our own. I think this is something we could look into improving upon (especially if anyone has test data to donate to the cause).

There is a lot of public domain data available online. I spent some time collecting some of this in a variety of formats (text, images, movies, sound, datasets) and then wrote something to use a word list (e.g. /usr/share/dict on most Linux systems) to create random metadata for them. After all, it doesn't matter that many bitstreams will be identical. That is how we populated our test environment here so we could replicate the problems we were seeing on the live system.

Best regards,

-- 
Tom De Mulder td...@cam.ac.uk
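[Editor's note: Tom's word-list trick is easy to reproduce. A rough sketch, my own reconstruction rather than his actual script: the dictionary path and three-word titles are assumptions, and it falls back to a tiny built-in list when no dictionary package is installed:]

```shell
# Fabricate random titles for test items from a system word list.
WORDS=/usr/share/dict/words
if [ ! -r "$WORDS" ]; then
  # Fallback so the sketch still runs where no dictionary is installed
  WORDS=$(mktemp)
  printf 'archive\nrepository\nthesis\ndataset\nmanuscript\n' > "$WORDS"
fi

random_title() {
  # Three random words, joined with spaces
  shuf -n 3 "$WORDS" | tr '\n' ' '
}

for i in 1 2 3 4 5; do
  echo "item $i: $(random_title)"
done
```

The same idea extends to authors, subjects, and abstracts; feeding the output into the batch import's metadata files then gives a large, varied test corpus from identical bitstreams.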
Re: [Dspace-tech] tomcat reporting memory leak?
Hello Graham,

this is an important point. Apart from the issues mentioned, a simpler architecture will help DSpace adapt to new requirements and technology changes, and stay flexible and easy to manage. Furthermore, too many clever tricks under the hood raise the risk that, with changes in the committer team (people do change jobs, or priorities change and with them the commitment), important knowledge will no longer be available and will have to be regained at some cost.

Maybe we need the old arch board back, or something similar. Best put this on the agenda for the committer and all meetings, or a special meeting. It needs a bit more space to talk about.

Have a sunny day

Claudia

On 21.09.2010 13:52, Graham Triggs wrote: ... I have repeatedly warned about the consequences of overly-complicated code and using 'clever tricks' under the hood. A lot of what I've mentioned above *can* be replaced with a much simpler architecture that's much easier to understand, easier to maintain, and does not have the same problems. If this matters to you, then it's going to take more than just me to stand up and say this. G

-- 
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/
Re: [Dspace-tech] tomcat reporting memory leak?
Hi Graham,

I don't have time at the moment to consider some of the bigger issues you raise, but I would like to echo Hardy's comments. Historically, many DSpace installations have had little content and been lightly used. I think this has allowed us to develop without much consideration for performance. I would like to see the sort of testing you have done become part of our procedures prior to release, rather than being left to the bigger sites, such as BioMed, to sort out after the event.

Cheers, Robin.

Robin Taylor
Main Library
University of Edinburgh
Tel. 0131 6513808

-----Original Message-----
From: Pottinger, Hardy J. [mailto:pottinge...@umsystem.edu]
Sent: 21 September 2010 18:27
To: Graham Triggs; Tom De Mulder
Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
Subject: Re: [Dspace-tech] tomcat reporting memory leak?

Hi, Graham, for what it's worth, I'll stand with you. :-) I think addressing the issues you've discovered is really important. Here's an idea: how about some new unit and/or performance tests that check whether a class and/or app is unloading cleanly? In other words, would it be possible to express the tests you have in such a way that they could be part of the new testing framework? Are there JIRA issues and/or patches for what you have already found/fixed? --Hardy

-----Original Message-----
From: Graham Triggs [mailto:grahamtri...@gmail.com]
Sent: Tuesday, September 21, 2010 6:52 AM
To: Tom De Mulder
Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
Subject: Re: [Dspace-tech] tomcat reporting memory leak?

On 20 September 2010 15:59, Tom De Mulder td...@cam.ac.uk wrote: On Mon, 20 Sep 2010, Damian Marinaccio wrote: I'm seeing the following log messages in catalina.out: [...] SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak. There are quite a few memory leaks in DSpace.
We have a cronjob to restart Tomcat nightly, because otherwise it'll break the next day.

Hi all,

Oh, welcome to my world!!

I'm going to start off by pointing out that the majority of DSpace code is actually quite well behaved. Going back to the codebase circa 1.4.2 / 1.5, and using the JSP user interface, I've got *thirty* separate DSpace repositories / applications running in a single Tomcat instance, which has operated without a restart in over 90 days. And all whilst being able to undeploy and redeploy any of those applications at will, or just reload them so that they pick up new configuration.

That does require a bit of careful setup / teardown in the context listeners (which wasn't always part of the DSpace code), and you need to get certain JARs - particularly the database/pooling drivers - out of the web applications entirely and into the shared level of Tomcat. Most of that is actually just good / recommended practice for systems administration of a Java application server anyway.

I was careful to point out that I have achieved that with pre-1.6 code and JSP only. Both 1.6 and the XML UI (of any age) change the landscape. The XML UI has always taken a large chunk of resources, although whilst it was still based on Cocoon 2.1, I managed to at least clean up its startup / shutdown behaviour by repairing its logging handler. This behaviour has changed with Cocoon 2.2, and I'll come back to that shortly.

So, 1.6 - I've been doing some work on the resource usage and clean loading/unloading of both the JSP and XML UIs using 1.6.2 recently, and neither is clean out of the box. The first issue you run into is the FinalizableReferenceQueue noted in the stack trace above. This is coming from a reference map in reflectutils, and was found to be a cleanup problem in the course of DSpace 2 development (the kernel / services framework was backported from that work).
I added a LifecycleManager to reflectutils, released as version 0.9.11, that allows the internal structures to be shut down cleanly, and implemented this as part of DSpace 2; however, this appears to have been ignored in the backport.

So, with the reflectutils/Lifecycle changes, careful placement of JARs, etc., I did get the JSP UI to unload cleanly last week. I would note that I didn't stress the application too heavily, so there may be some operations that trigger different code paths that are still a problem, but at the baseline it was working correctly.

The XML UI has proven to be a somewhat more challenging beast. I first ran into two problems that are inside Cocoon 2.2 itself - 1) in the sitemap processing, it's using a stack inside a ThreadLocal, but it never removes the stack when it empties it, and 2) in one class
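[Editor's note: the JAR placement Graham describes, drivers at Tomcat's shared level rather than inside each webapp, can be sketched on a throwaway directory tree. Every path and JAR name below is illustrative, not a prescription:]

```shell
# Demonstrate the layout: one shared copy of the JDBC driver in Tomcat's
# lib/, none left inside the individual webapps.
CATALINA=$(mktemp -d)
mkdir -p "$CATALINA/lib" \
         "$CATALINA/webapps/jspui/WEB-INF/lib" \
         "$CATALINA/webapps/xmlui/WEB-INF/lib"
touch "$CATALINA/webapps/jspui/WEB-INF/lib/postgresql.jar" \
      "$CATALINA/webapps/xmlui/WEB-INF/lib/postgresql.jar"

# Promote one copy to the shared level, then drop the per-webapp copies
mv "$CATALINA/webapps/jspui/WEB-INF/lib/postgresql.jar" "$CATALINA/lib/"
rm -f "$CATALINA"/webapps/*/WEB-INF/lib/postgresql.jar

ls "$CATALINA/lib"   # -> postgresql.jar
```

The point is that a driver loaded by a webapp's classloader pins that classloader (and its PermGen) across reload cycles; loading it once at the shared level avoids that whole class of leak.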
Re: [Dspace-tech] tomcat reporting memory leak?
I am very happy to see that this issue seems finally to be taken seriously. However, I find myself getting a bit frustrated that it was never taken seriously when I raised it in the past.

I think the DSpace source code carries with it a lot of historical baggage, and it could do with being addressed even without making fundamental changes to the basic architecture. My personal favourite would be a completely new architecture with more loosely coupled modules, but fixing memory leaks and the associated slow performance would be a good start.

I can add that, for example, deleting a collection with 1200 items on our rather powerful DSpace machines will take two hours, and uses most of the available memory. You can see why I would like that no longer to be the case.

Best regards,

-- 
Tom De Mulder td...@cam.ac.uk
Re: [Dspace-tech] tomcat reporting memory leak?
Hi all,

I'm sorry if any of you have felt that this issue has not been taken seriously in the past. The reality of the situation is that we (the DSpace Developers/Committers) currently depend on feedback/testing from larger DSpace instances around these sorts of scalability and memory issues. As DSpace is community-built and community-supported software, there are a couple of things to keep in mind:

(1) DSpace software has zero full-time developers. All Committers are volunteers and can only devote as much time as their individual institutions allow. Although I officially have DSpace in my title, I also wear several hats in DuraSpace. Therefore, even I don't have much time in a given week to devote towards actual DSpace development work.

(2) We currently don't have a centralized server with enough test data to run many of these memory or scalability tests on our own. I think this is something we could look into improving upon (especially if anyone has test data to donate to the cause). I agree with Robin T. that it is in everyone's interest to improve our performance testing prior to each release. I'd also encourage Graham (and others) to share their testing routines so that we can work to make this happen, and start to locate these performance issues *before* new releases, rather than after.

I'm also very happy to see these issues starting to gain some leverage. The reality of the situation is that we need one or more volunteers to step up and help to make these improvements, or suggest testing routines that can allow us to better investigate where memory leaks may be occurring (or point them out if you've already found where the leaks are). All of us want DSpace to scale well and avoid memory leaks; if it takes a new architecture to do so, that is one possible route forward. But the main thing to keep in mind is that DSpace is built and maintained by volunteer developers, so we need to find the volunteers (and convince their institutions) to help make this happen.
It sounds like we've already located a few interested parties in this discussion. So, I hope that we can move forward with this work soon and perhaps even make some quick improvements in time for the rapidly approaching 1.7.0 release. If you'd like to volunteer to help us out, please let us know how you'd like to help!

- Tim

-- 
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 9/22/2010 10:33 AM, Tom De Mulder wrote: [snip]
Re: [Dspace-tech] tomcat reporting memory leak?
On Sep 22, 2010, at 12:10 PM, Tim Donohue wrote: (2) We currently don't have a centralized server with enough test data to run many of these memory or scalability tests on our own. I think this is something we could look into improving upon (especially if anyone has test data to donate to the cause).

There's a lot of Creative Commons licensed content in the DSpace-sphere. Perhaps an effort to gather what various sites are willing to donate into a DuraSpace repository would give us the amount of data we need, as well as beneficial heterogeneity in said data? Perhaps beyond this (and certainly there would be other considerations here) it could be set up in such a way that the data could be (extremely) easily replicated into one's test environment to put an instance through its paces?

I agree with Robin T. that it is in everyone's interest to improve our performance testing prior to each release. I'd also encourage Graham (and others) to share their testing routes so that we can work to make this happen, and start to locate these performance issues *before* new releases, rather than after.

As a first step in this direction (and one that would help me personally), I'd like to ask if anyone out there has an Apache JMeter test plan file that is, or could be, generalized for stressing any DSpace application. I know that each instance has its own customizations, URL patterns, areas to stress, etc., but there is a lot that could be covered generally for any implementation. Does this exist out there? I have always just cobbled together a very simplistic setup that hits the front page, community-list, and some particular items and URLs. Perhaps we can collaboratively build one out with everyone's input.
-- 
sands fish
Software Engineer
MIT Libraries Technology Research Development
sa...@mit.edu
E25-131
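[Editor's note: pending a shared JMeter plan, even a shell list of the "usual suspect" URLs Sands mentions can serve as a crude smoke test. A sketch; the base URL and handle are invented placeholders, and the curl line is left commented out so the sketch makes no network calls:]

```shell
# Build the list of typical DSpace URLs to hammer in a stress run.
BASE=${BASE:-http://localhost:8080/xmlui}

urls() {
  for path in "" community-list browse?type=title handle/123456789/1; do
    echo "$BASE/$path"
  done
}

urls
# To actually drive them against a running instance:
# urls | xargs -n1 curl -s -o /dev/null -w '%{http_code} %{url_effective}\n'
```

A real JMeter plan adds concurrency, ramp-up, and timing percentiles on top of this, but a list like the above is the part that could be shared and generalized across sites.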
Re: [Dspace-tech] tomcat reporting memory leak?
A random collection of thoughts which occurred while reading this thread:

o Performance, scalability, complexity, and ruggedness are sometimes competing influences on the design of code. We can improve in all of these aspects. Sometimes all of those influences will conspire to suggest a particular design, and at other times we will have to trade them off against one another. And performance, in particular, is tricky to characterize, because a design that performs best at small scale may be worst at large scale, or vice versa. What I think I am getting at here is that we want many different kinds of goodness, and we need to pursue them together if we want to achieve any of them in a meaningful way.

o The testing work has also introduced some new automated reports that we should be reviewing. Have you seen how many FIXMEs there are, and what they are saying? Quite motivational. The Findbugs report is also interesting in spots.

o Where it seems that code must be complex, thorough documentation of the thought behind it will not only capture important knowledge for the next person who has to work there, but can also provide opportunities to realize: "good heavens, did I really write that? There must be a better way." When I find myself writing absurd comments, it is usually because I have been writing (or was about to write) absurd code.

o Best practice and commonest practice w.r.t. deployment of libraries seem to be antithetical in the Java universe. I was quite pleased to discover that I'm not the only one who thinks that Tomcat's /lib directory is on the app classpath for good reasons.

o The DSpace 2 architecture (which we are approaching by easy stages) attempts to address looser coupling and similar OO goals.

-- 
Mark H. Wood, Lead System Programmer mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_ pgp29I70MjuYL.pgp Description: PGP signature -- Start uncovering the many advantages of virtual appliances and start using them to simplify application deployment and accelerate your shift to cloud computing. http://p.sf.net/sfu/novell-sfdev2dev___ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] tomcat reporting memory leak?
And one point I forgot:

o Volunteers don't have to write code. If you aren't quite ready to step
  into the DSpace tarball with torch and machete, but can read Java, you
  can review the code and make suggestions. Many eyes make all bugs
  shallow. Bug reports (including performance problems) are always
  useful. And just asking, "why is this so slow?" can help to focus
  attention on design decisions which perhaps didn't get quite as much
  attention as they deserved. Keep asking until you get a sensible
  answer.

--
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Re: [Dspace-tech] tomcat reporting memory leak?
On Wed, Sep 22, 2010 at 4:51 PM, Mark H. Wood mw...@iupui.edu wrote:
 o Best practice and commonest practice w.r.t. deployment of libraries
   seem to be antithetical in the Java universe. I was quite pleased to
   discover that I'm not the only one who thinks that Tomcat's /lib
   directory is on the app. classpath for good reasons.

Actually, nowadays that is, AFAIK, universally accepted as bad practice.
It's no coincidence that Tomcat removed /common/lib in version 6. In the
early days of web development with Java, what you are defending was an
obvious choice to avoid wasting resources. But the problem is that you
then need to adapt the code of every application in the container at the
same time in order to move to new library versions - which is really
difficult for a company's internal code, and impossible if there is any
third-party or open-source code involved... unless you want to run one
Tomcat instance per application. Which brings us to the question: what
difference would it make to have the libs in Tomcat's lib in that
scenario?
Re: [Dspace-tech] tomcat reporting memory leak?
On 20 September 2010 15:59, Tom De Mulder td...@cam.ac.uk wrote:
 On Mon, 20 Sep 2010, Damian Marinaccio wrote:
 I'm seeing the following log messages in catalina.out:
 [...]
 SEVERE: The web application [] appears to have started a thread named
 [FinalizableReferenceQueue] but has failed to stop it. This is very
 likely to create a memory leak.

 There are quite a few memory leaks in DSpace. We have a cronjob to
 restart Tomcat nightly, because otherwise it'll break the next day.

Hi all,

Oh, welcome to my world!!

I'm going to start off by pointing out that the majority of DSpace code
is actually quite well behaved. Going back to the codebase circa 1.4.2 /
1.5, and using the JSP user interface, I've got *thirty* separate DSpace
repositories / applications running in a single Tomcat instance, which
has operated without a restart for over 90 days - all while being able to
undeploy and redeploy any of those applications at will, or just reload
them so that they pick up new configuration.

That does require a bit of careful setup / teardown in the context
listeners (which wasn't always part of the DSpace code), and you need to
get certain JARs - particularly the database / pooling drivers - out of
the web applications entirely and into the shared level of Tomcat. Most
of that is actually just good / recommended practice for systems
administration of a Java application server anyway.

I was careful to point out that I have achieved that with pre-1.6 code
and JSP only. Both 1.6 and the XML UI (of any age) change the landscape.
The XML UI has always taken a large chunk of resources, although while it
was still based on Cocoon 2.1, I managed to at least clean up its
startup / shutdown behaviour by repairing its logging handler. This
behaviour has changed with Cocoon 2.2, and I'll come back to that
shortly.

So, 1.6 - I've been doing some work on the resource usage and clean
loading / unloading of both the JSP and XML UIs using 1.6.2 recently, and
neither is clean out of the box.
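The kind of context-listener teardown described above can be sketched roughly as follows. This is a minimal, illustrative helper (the class name and method are hypothetical, not DSpace code): the sort of thing a ServletContextListener's contextDestroyed() would call to deregister any JDBC drivers that the webapp's own classloader registered, so the classloader can actually be collected on undeploy.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

// Hypothetical teardown helper: deregister JDBC drivers registered by
// this webapp's classloader. Drivers loaded from Tomcat's shared lib
// have a different classloader and are deliberately left alone.
public class DriverCleanup {
    public static int deregisterAll(ClassLoader webappLoader) {
        int count = 0;
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver d = drivers.nextElement();
            // Only touch drivers this webapp loaded, not shared ones.
            if (d.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(d);
                    count++;
                } catch (SQLException e) {
                    // Nothing more we can do at shutdown; log and continue.
                }
            }
        }
        return count;
    }
}
```

Moving the driver JARs to Tomcat's shared lib directory, as suggested above, sidesteps the problem entirely for the driver classes themselves.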
The first issue you run into is the FinalizableReferenceQueue noted in
the stack trace above. This is coming from a reference map in
reflectutils, and was found to be a cleanup problem in the course of
DSpace 2 development (the kernel / services framework was backported from
that work). I added a LifecycleManager to reflectutils, released as
version 0.9.11, that allows the internal structures to be shut down
cleanly, and implemented this as part of DSpace 2; however, this appears
to have been ignored in the backport.

So, with the reflectutils / Lifecycle changes, and careful placement of
JARs, etc., I did get the JSP UI to unload cleanly last week. I would
note that I didn't stress the application too heavily, so there may be
some operations that trigger different code paths that are still a
problem, but at the baseline it was working correctly.

The XML UI has proven to be a somewhat more challenging beast. I first
ran into two problems that are inside Cocoon 2.2 itself: 1) in the
sitemap processing, it uses a stack inside a ThreadLocal, but it never
removes the stack when it empties it; and 2) one class relating to
flowscript handling does not clean up the Mozilla Rhino engine correctly
when it has finished using it (curiously, Rhino is used in a number of
places, and everywhere else the code appears to be structured correctly
to clean up - just this one class is broken).

With locally patched versions of the sitemap and flowscript JARs from
Cocoon (the ThreadLocal patch isn't really guaranteed not to leak in
unexpected circumstances, but it was sufficient to remove the problem in
the scope of this testing - basically, ThreadLocal is really dangerous to
use), I then ran into another issue, this time with the CachingService
that was backported. With the XML UI, it's using the RequestScope
function of the caching service (it didn't appear to be exercising this
part with JSP - that may just be because I only ran through limited code
paths).
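The sitemap bug described in point 1) above, and its fix, can be sketched in a few lines. This is not Cocoon's actual EnvironmentStack code, just a minimal illustration of the pattern: a per-thread stack kept in a ThreadLocal, where the fix is to call remove() once the stack empties, so a pooled Tomcat thread no longer pins the webapp's classes after undeploy.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of a per-thread stack in a ThreadLocal.
// Without the remove() call in pop(), every worker thread that ever
// touched this class keeps a reference to the webapp's classloader.
public class PerThreadStack {
    private static final ThreadLocal<Deque<Object>> STACK =
        ThreadLocal.withInitial(ArrayDeque::new);

    public static void push(Object o) {
        STACK.get().push(o);
    }

    public static Object pop() {
        Deque<Object> s = STACK.get();
        Object o = s.pop();
        if (s.isEmpty()) {
            STACK.remove();  // the fix: drop the thread's entry entirely
        }
        return o;
    }
}
```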
For the RequestScope, it's tying the cache not to the request object...
but to a ThreadLocal. And that ThreadLocal isn't being cleaned up at the
end of the request. (The shutdown code is also incapable of doing the job
it's intended for, as it will only ever execute on a single thread, and
will not see all the other threads that may have processed requests.)
There is a high probability of this leaking memory all over the place,
and there is also the nasty potential of leaking information across
requests, which is undesirable.

I made another hacked version that removes the ThreadLocal but replicates
a lot of its thread-affinity behaviour (so it still has the nasty side
effects of the implementation, but at least it removed the hold the
system had over the application resources).

The XML UI was *still* not unloading correctly, and at this point the
profiler stopped giving me pointers to strong references that were being
held. So right now I'm not sure what else is up - but there is at least
one more troubling part of the
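The missing cleanup described above is the standard pattern for any request-scoped ThreadLocal on pooled threads. A minimal sketch (not the DSpace CachingService API - class and method names here are hypothetical): whatever wraps the request must clear the ThreadLocal in a finally block, so the pooled thread neither retains the memory nor carries one request's data into the next.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a request-scoped cache held in a ThreadLocal. The critical
// detail is the finally block: it runs whether or not the request
// throws, so the pooled worker thread never keeps stale entries.
public class RequestCache {
    private static final ThreadLocal<Map<String, Object>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    public static Map<String, Object> current() {
        return CACHE.get();
    }

    // Analogous to a servlet Filter wrapping request processing.
    public static void handleRequest(Runnable request) {
        try {
            request.run();
        } finally {
            CACHE.remove();  // always executed, even on exceptions
        }
    }
}
```

In a real webapp this would live in a Filter's doFilter(), which sees every request on its way in and out.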
Re: [Dspace-tech] tomcat reporting memory leak?
Hi, Graham, for what it's worth, I'll stand with you. :-) I think
addressing the issues you've discovered is really important.

Here's an idea: how about some new unit and/or performance tests that
check whether a class and/or app is unloading cleanly? In other words,
would it be possible to express the tests you have in such a way that
they could be part of the new testing framework? Are there JIRA issues,
and/or patches, for what you have already found/fixed?

--Hardy

-----Original Message-----
From: Graham Triggs [mailto:grahamtri...@gmail.com]
Sent: Tuesday, September 21, 2010 6:52 AM
To: Tom De Mulder
Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
Subject: Re: [Dspace-tech] tomcat reporting memory leak?

[...]
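One way Hardy's suggestion might be expressed as an automated check is a test helper that snapshots the JVM's live threads before and after exercising a component, and reports any non-daemon threads the component started but failed to stop. The sketch below is hypothetical - it is not part of any DSpace test framework - but it is the kind of assertion that would have caught the FinalizableReferenceQueue thread automatically.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical test helper: detect threads that a piece of code starts
// but does not stop. A leak-free component should leave this set empty.
public class ThreadLeakCheck {
    public static Set<Thread> leakedThreads(Runnable exercise) {
        // Thread.getAllStackTraces() returns a fresh snapshot each call.
        Set<Thread> before = Thread.getAllStackTraces().keySet();
        exercise.run();
        try {
            Thread.sleep(100);  // give short-lived threads a moment to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Set<Thread> after = new HashSet<>(Thread.getAllStackTraces().keySet());
        after.removeAll(before);
        // Daemon threads won't keep the JVM alive; only flag live
        // non-daemon survivors.
        after.removeIf(t -> t.isDaemon() || !t.isAlive());
        return after;
    }
}
```

A unit test could then assert that deploying and undeploying a service leaves leakedThreads(...) empty; a similar check could compare ThreadLocal contents, though that requires reflection over Thread internals.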
[Dspace-tech] tomcat reporting memory leak?
I'm seeing the following log messages in catalina.out:

INFO: Deploying web application directory ROOT
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [org.dspace.services.caching.ThreadLocalMap] (value [org.dspace.services.caching.threadlocal...@d32560]) and a value of type [java.util.HashMap] (value [{}]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.saxpar...@bfa709]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@9b9a36]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@1a95de6]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@53bd6e]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@1b9c086]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@ecd7c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@1d4afed]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6a081c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@13a9acb]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@88a3ce]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@ba2bb5]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.saxpar...@b481ba]) but
Re: [Dspace-tech] tomcat reporting memory leak?
On Mon, 20 Sep 2010, Damian Marinaccio wrote:
 I'm seeing the following log messages in catalina.out:
 [...]
 SEVERE: The web application [] appears to have started a thread named
 [FinalizableReferenceQueue] but has failed to stop it. This is very
 likely to create a memory leak.

There are quite a few memory leaks in DSpace. We have a cronjob to
restart Tomcat nightly, because otherwise it'll break the next day.

Best,

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
20/09/2010 : The Moon is Waxing Gibbous (80% of Full)