Re: [Dspace-tech] tomcat reporting memory leak?

2010-10-06 Thread Graham Triggs
On 5 October 2010 16:33, Simon Brown st...@cam.ac.uk wrote:

 Which nobody has requested, making this a massive red herring. I fail
 to see how cutting back on unnecessary and redundant database access
 constitutes overhead to cover up the problems of larger
 repositories.


One person's unnecessary and redundant database access is another's very
necessary database access - well, at least it can be.

I remember the patch for reducing the updating of browse / search indexes,
and I can see why it would be useful to not do those updates during a batch
import if you have an appropriate workflow.

That won't be the case for all of the repositories - quite a few will
welcome the ability to see those items as and when they are added. There is
also the issue of how long it takes to do the one very big update at the end
of the batch run vs. incremental changes as you go - it may be less work
overall, but having one big change can be more disruptive in some cases.


 Any repository, regardless of size, will see
 improvements with this kind of optimisation, at least one example of
 which I have already highlighted (and had my arguments shouted down -
 this is also, incidentally, why I haven't bothered to open any other
 JIRA tickets on other performance issues we've seen. What would be the
 point?)


No, you didn't get shouted down for raising a performance issue. The
argument arose because you assumed the change would clearly be of benefit
to any repository, while doing nothing to address the underlying
performance issues (which could have been helped quite dramatically with
some small SQL tweaks and some configuration work in Postgres), and instead
just bypassing them for one very specific use case.

It doesn't matter how large or small a repository is, if they don't perform
batch uploads using the ItemImporter, your change will do *nothing* for
them. But an alteration to the underlying SQL, and guidelines for getting
the best out of Postgres would benefit everyone - regardless of how large or
small the repository is, or the means by which they populate it.
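
To make that concrete - and this is only a rough sketch, with placeholder
values that assume Postgres 8.2 or later, a few GB of RAM and a dedicated
database host - the kind of configuration guidelines I have in mind look
something like:

# Illustrative starting points only - tune to your own hardware and workload.
# (The data directory path is an example; shared_buffers needs a full restart,
# the other settings take effect on reload.)
cat >> /var/lib/pgsql/data/postgresql.conf <<'EOF'
shared_buffers = 512MB          # the default is far too small for a busy repository
effective_cache_size = 2GB      # planner hint, roughly the size of the OS file cache
work_mem = 16MB                 # per-sort / per-hash memory
checkpoint_segments = 16        # smooths out write bursts during batch loads
EOF
pg_ctl reload -D /var/lib/pgsql/data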


 The pertinent question for me is why, whenever the issue of
 performance comes up, is one of these theoretical future of
 repositories screeds pulled out and slammed down in front of the
 conversation? People are reporting problems with the systems they have
 *right now*.


It's not meant to be a barrier to conversation, but a question as to what
you want to resolve. Do you want to address the *scalability* of DSpace, or
do you just want to avoid an immediate performance bottleneck? If we
conflate these, conversations are going to stall, and we're not going to
make any progress.


 Or rather, they were. And yes, it is true that there is a
 finite limit to what the hardware is capable of, but the quality of
 the software plays a significant role in how quickly that limit is
 reached. But we've had this conversation before. I don't really expect
 it to end any better this time than it did then.


I completely agree - but a solution that breaks the encapsulation of the
components in the system, and leaves important indexes in an inconsistent
state for an extended period of time is not an automatic win for the
majority of the community.

I offered a lot of suggestions as to how that code could be better
structured, improvements to both the SQL and the configuration of Postgres
to handle the load more efficiently, and further tweaks that would have
reduced still further the number of updates the code needed to make. All of
which would have been more beneficial to the community (not just improving
batch uploads, but interactive / singular deposits and edits) - and, not
only that, would have improved the performance of your systems beyond what
you had so far achieved.

Any method of increasing the processing capabilities of a system,
 either through more powerful hardware or improvements in the software,
 is postponing the inevitable for any repository with continued
 growth. The difference is in how much cost there is to any individual
 repository in each of those methods. Our system, with the changes
 we've made to it, struggles at around 300,000 items. People are
 reporting problems (presumably running stock 1.6.2) at around 50,000,
 from what I can gather.


This is where we need to be careful about what we are reporting. Quite a few
of the issues around 1.6.x appear to involve rampant memory usage, rather
than being a clear function of how many records there are in the database.
There are also different issues involved depending on whether we are talking
about adding / editing lots of records, or about repositories that are
simply highly accessed.

Even so, regardless of what we do to the code to make it efficient, it does
not and cannot absolve the system administrator of correctly maintaining
both DSpace itself and its dependencies. I wouldn't want to get drawn on
where that point is without any evidence, but there is a lot of scope for
altering and improving Postgres.
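
As one hedged example of the routine maintenance I mean - the database name
and the table below are assumptions, so adjust them for your own install:

# Keep the planner's statistics current, especially after heavy ingest activity.
vacuumdb --analyze --verbose dspace
# On 8.x it can also help to occasionally reindex heavily-updated tables;
# metadatavalue is just one example of such a table.
psql -d dspace -c 'REINDEX TABLE metadatavalue;'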

Re: [Dspace-tech] tomcat reporting memory leak?

2010-10-06 Thread Tom De Mulder
On 6 Oct 2010, at 15:15, Graham Triggs wrote:

[snip]

This is exactly the kind of pointless pontification that we got last time.

Any point that is raised is deflected or ignored, and you even manage to 
contradict yourself between paragraphs. What's it to be: should patches benefit 
ALL repositories, or is it fine if it's just some? Or the other way round, 
maybe?


I will be very happy to share our experiences with large-scale DSpace 
instances with the community, if that can be of any help. But not if it 
involves having to deal with Graham Triggs.


I really do not have time for this.


--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-10-06 Thread Tim Donohue
All,

I would really appreciate it if we could stop the negativity in this 
discussion thread. I'm sorry to have to post a message of this sort 
publicly, but I feel I'm unfortunately being forced to do so.

Insults and negativity on a public listserv do not help anyone. I also 
personally take offense to the insulting of anyone in our DSpace 
Committers group, as they are volunteering their own time (sometimes 
even outside of their workplace) to make DSpace software better.  Open 
source software does not build and maintain itself, and our group of 
Committers have made it their passion to improve DSpace for the benefit 
of us all.

Despite any arguments or differences we all may have, it is in our best 
interest to work together to resolve these issues in a friendly and 
timely manner. There is a place for arguments and disagreements on these 
DSpace mailing lists and I welcome them, provided they are kept 
constructive.

I'm in touch with Cambridge around their performance issues off-list, 
and hope that we can work towards a solution to these issues for 
everyone involved.

Thanks,

Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org



Re: [Dspace-tech] tomcat reporting memory leak?

2010-10-05 Thread Simon Brown

On 4 Oct 2010, at 15:00, Graham Triggs wrote:

 On 29 September 2010 14:17, Tom De Mulder td...@cam.ac.uk wrote:
 I know you like to talk down the problem, but that really isn't  
 helping.

 This isn't about talking down the problem - it's about finding where  
 the real problems are and not just patching the immediate concerns.  
 And considering the interests of nearly 1000 DSpace instances that  
 are registered on dspace.org - many of whom will probably be more  
 worried about rampant resource usage for small repositories from  
 adding overhead to cover up the problems of larger repositories.

Which nobody has requested, making this a massive red herring. I fail  
to see how cutting back on unnecessary and redundant database access  
constitutes overhead to cover up the problems of larger  
repositories. Any repository, regardless of size, will see  
improvements with this kind of optimisation, at least one example of  
which I have already highlighted (and had my arguments shouted down -  
this is also, incidentally, why I haven't bothered to open any other  
JIRA tickets on other performance issues we've seen. What would be the  
point?)

 We run 5 DSpace instances, three of these are systems with hundreds  
 of thousands of items, and it's dog slow and immensely resource- 
 intensive. And yes, we want these to be single systems. Why  
 shouldn't we?

 Surely the more pertinent question is why wouldn't you want to be  
 able to run a multi-node solution? I'm sure I don't need to tell you  
 that no matter how good a job you do of making the system perform  
 better with larger datasets, there will always be a finite limit to  
 how large the repository can be, how many users you can service, and  
 how quickly it will process requests for any given hardware  
 allocation.

The pertinent question for me is why, whenever the issue of  
performance comes up, is one of these theoretical future of  
repositories screeds pulled out and slammed down in front of the  
conversation? People are reporting problems with the systems they have  
*right now*. Or rather, they were. And yes, it is true that there is a  
finite limit to what the hardware is capable of, but the quality of  
the software plays a significant role in how quickly that limit is  
reached. But we've had this conversation before. I don't really expect  
it to end any better this time than it did then.

 Yes, DSpace can do a better job than it currently does, but it's  
 just postponing the inevitable. How much in technology relies on  
 just making things bigger/faster? Even our single system hardware is  
 generally made of multiple identical components - CPUs with multiple  
 cores, memory consisting of multiple 'sticks', each consisting of  
 multiple storage chips, storage combining multiple hard drives each  
 having multiple platters.

Any method of increasing the processing capabilities of a system,  
either through more powerful hardware or improvements in the software,  
is postponing the inevitable for any repository with continued  
growth. The difference is in how much cost there is to any individual  
repository in each of those methods. Our system, with the changes  
we've made to it, struggles at around 300,000 items. People are  
reporting problems (presumably running stock 1.6.2) at around 50,000,  
from what I can gather. That means that the workable size for a single 
repository running unmodified 1.6.2 is less than 50,000 items - or, for 
the number of items we hold, more than six separate DSpace instances.
That's at least a sixfold increase in hardware and operational costs.  
Even in a situation where higher education funding had not just been  
significantly cut, that amount of money would be rather difficult to  
come by. In a situation where people are able to point to  
significantly better performance from other systems on similar  
hardware, it would become substantially more difficult.

 And many of our dependencies are going the same way - Oracle  
 database clusters, Solr is designed to get scalability from running  
 over multiple shards, even Postgres has taken a major step towards  
 clustering / replication with its 9.0 release.

 Either way, you will always hit a hard limit with keeping things on  
 a single system - so at some point, something has to give, whether  
 it's separating out DSpace application, Solr and Postgres instances  
 to separate machines, or accepting this reality in the repository  
 and building it to scale across multiple nodes itself. This in turn  
 would bring benefits to how easily you can scale (in theory, a lot  
 easier to scale at the repository level than scaling each of it's  
 individual components), as well as potentially better preservation  
 and federation capabilities.

Leaving aside any theoretical ideal futures for the moment, it seems  
to me that the gist of this conversation is that DSpace does not support  
single-instance repositories over a certain size. That 

Re: [Dspace-tech] tomcat reporting memory leak?

2010-10-05 Thread Tim Donohue
Hi Simon & All,

On 10/5/2010 10:33 AM, Simon Brown wrote:

 On 4 Oct 2010, at 15:00, Graham Triggs wrote:

 On 29 September 2010 14:17, Tom De Mulder td...@cam.ac.uk wrote:
 I know you like to talk down the problem, but that really isn't
 helping.

 This isn't about talking down the problem - it's about finding where
 the real problems are and not just patching the immediate concerns.
 And considering the interests of nearly 1000 DSpace instances that
 are registered on dspace.org - many of whom will probably be more
 worried about rampant resource usage for small repositories from
 adding overhead to cover up the problems of larger repositories.

 Which nobody has requested, making this a massive red herring. I fail
 to see how cutting back on unnecessary and redundant database access
 constitutes overhead to cover up the problems of larger
 repositories. Any repository, regardless of size, will see
 improvements with this kind of optimisation, at least one example of
 which I have already highlighted (and had my arguments shouted down -
 this is also, incidentally, why I haven't bothered to open any other
 JIRA tickets on other performance issues we've seen. What would be the
 point?)

It's really unfortunate that you've experienced this and/or felt this 
way in the past.  Perhaps we haven't been able to tease out the problems 
at hand as well as we could have, and I hope we can improve upon that now.

However, I'd highly recommend freely adding specific issues to our JIRA 
-- it will *guarantee* that the DSpace committers will review & discuss 
them (each week, we set aside time in our weekly meeting to do so -- see 
https://wiki.duraspace.org/display/DSPACE/Developer+Meetings ).  When 
adding JIRA issues, specifics are best; that way we can narrow down 
where the problem may reside.

The longer these specific issues remain outside of JIRA, the more likely 
they will be accidentally overlooked in future versions of DSpace (as 
JIRA is our primary means of scheduling things to be fixed in new 
versions).  We really do mean well, and we'd like to work with you to 
resolve these issues.  We're not trying to continually throw up red 
herrings to avoid problems -- it's really a matter of attempting to 
better understand where the specific issue resides.

As volunteer developers, each of the DSpace Committers has only a 
limited amount of time to work on DSpace in a given week. Therefore, the 
more information you can provide us with, the better. If you know of 
specific areas where there are redundant database accesses, we'd 
appreciate it if you could point them out to us (or enter a JIRA issue 
and we'll fix it).  We want to resolve these issues, but sometimes we 
don't have enough time in our normal work week to dig in deep enough to 
locate them.  We highly encourage sites who have stumbled across 
problems in the code to report them -- that way we can look at that 
specific area of the code and fix it so that it is no longer an issue.

 Leaving aside any theoretical ideal futures for the moment, it seems
 to me that the gist of this conversation is DSpace does not support
 single-instance repositories over a certain size. That being the
 case, I think it would be only fair to make that lack of support
 explicit in the documentation and PR materials for the software, in
 order that all of the relevant information is readily available for
 anyone making decisions about the future of their repository.

I'd say we want to support single-instance repositories of larger sizes 
as well.  There will always be a size limit where it makes more sense to 
scale across multiple nodes, but we should be working to increase that 
size limit as much as we can (within reason, obviously).  Although it 
isn't yet explicit in our RoadMap, I think we also want to work towards 
allowing DSpace to scale across multiple nodes (where it makes sense to).

Again, the best way for us to improve your immediate DSpace performance 
is to better understand the exact problems you've already noticed.  We 
can only fix issues that we know about, and sometimes discovering where 
the issue resides can be the hardest part. If you've already discovered 
very specific issue(s), we'd appreciate it if you can share them.  If 
you haven't yet discovered the exact issue(s), we may be able to help 
narrow down the problem if you can share which parts of your DSpace seem 
'especially sluggish', etc.

The end result is that we really should be working together on a 
resolution for the present, rather than continually arguing over ideal 
futures or past discussions. Open source development works best if we 
can all share information/ideas/issues/resolutions freely and openly. 
Yes, that also means sometimes arguing openly -- which is perfectly OK 
by me, as sometimes arguments bring us all to a better solution or route 
forward. But, I do want to encourage us all to keep things constructive, 
so that we can move DSpace software forward to the 

Re: [Dspace-tech] tomcat reporting memory leak?

2010-10-04 Thread Graham Triggs
On 29 September 2010 14:17, Tom De Mulder td...@cam.ac.uk wrote:

 I know you like to talk down the problem, but that really isn't helping.


This isn't about talking down the problem - it's about finding where the
real problems are and not just patching the immediate concerns. And
considering the interests of nearly 1000 DSpace instances that are
registered on dspace.org - many of whom will probably be more worried about
rampant resource usage for small repositories from adding overhead to cover
up the problems of larger repositories.


 We run 5 DSpace instances, three of these are systems with hundreds of
 thousands of items, and it's dog slow and immensely resource-intensive. And
 yes, we want these to be single systems. Why shouldn't we?


Surely the more pertinent question is why wouldn't you want to be able to
run a multi-node solution? I'm sure I don't need to tell you that no matter
how good a job you do of making the system perform better with larger
datasets, there will always be a finite limit to how large the repository
can be, how many users you can service, and how quickly it will process
requests for any given hardware allocation.

Yes, DSpace can do a better job than it currently does, but it's just
postponing the inevitable. How much in technology relies on just making
things bigger/faster? Even our single system hardware is generally made of
multiple identical components - CPUs with multiple cores, memory consisting
of multiple 'sticks', each consisting of multiple storage chips, storage
combining multiple hard drives each having multiple platters.

And many of our dependencies are going the same way - Oracle database
clusters, Solr is designed to get scalability from running over multiple
shards, and even Postgres has taken a major step towards clustering /
replication with its 9.0 release.

Either way, you will always hit a hard limit with keeping things on a single
system - so at some point, something has to give, whether it's separating
out DSpace application, Solr and Postgres instances to separate machines, or
accepting this reality in the repository and building it to scale across
multiple nodes itself. This in turn would bring benefits to how easily you
can scale (in theory, a lot easier to scale at the repository level than
scaling each of its individual components), as well as potentially better
preservation and federation capabilities.

G


Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-30 Thread Pottinger, Hardy J.
Hi, first, I want to thank Mark Wood for recommending LambdaProbe; it is 
proving a very useful tool. I can see already that we need to increase our 
PermGen, and will probably borrow Mark's JAVA_OPTS settings for our production 
and development Tomcat instances.
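
For anyone curious, the change we have in mind is along these lines - the
values are illustrative rather than a recommendation, and assume a
$CATALINA_BASE/bin/setenv.sh that catalina.sh picks up if present:

JAVA_OPTS="-Xmx1024M -Xms768M"                                 # heap
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=64M -XX:MaxPermSize=256M"   # raise PermGen
export JAVA_OPTS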

In trying to further educate myself about these issues, I came across this 
excellent page on the Tomcat wiki, which at the end includes 
debugging/troubleshooting advice that is very close to the procedure Graham 
Triggs outlined at a recent committer's meeting. I'm forwarding this link to 
the list, as I think it might prove useful to others:

http://wiki.apache.org/tomcat/OutOfMemory

--Hardy 

 -Original Message-
 From: Mark H. Wood [mailto:mw...@iupui.edu]
 Sent: Wednesday, September 29, 2010 12:08 PM
 To: dspace-tech@lists.sourceforge.net
 Subject: Re: [Dspace-tech] tomcat reporting memory leak?
 
 I'd like to point out that the discussion is broadening considerably:
 a system can be slow for many reasons, not just memory starvation.
 
 Step 1: what resource(s) are you short of?  Something like LambdaProbe
 can peek inside Tomcat and show you how much of each of the various
 memory pools is being used.  OS tools can show whether you are
 swapping heavily or spending a lot of time in I/O wait or are really
 CPU-bound (and what, besides Tomcat, may be eating CPU).  DBMS tools
 can reveal places in the schema that don't scale well, queries that
 could be optimized, and additional indices that would be beneficial.
 
 It would be really helpful for large, busy sites with performance
 problems to share any such detailed observations.  Some of those
 problems can probably be tuned away, and some will point to specific
 things for coders to investigate.  Scaling experience will be valuable
 both in documenting good ways to tune up for DSpace and in finding
 design hotspots for rework.
 
 --
 Mark H. Wood, Lead System Programmer   mw...@iupui.edu
 Balance your desire for bells and whistles with the reality that only a
 little more than 2 percent of world population has broadband.
   -- Ledford and Tyler, _Google Analytics 2.0_



Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Tom De Mulder
On 24 Sep 2010, at 21:17, bill.ander...@library.gatech.edu wrote:

 We've been experiencing problems similar to some reported on this thread 
 since our
 upgrade to 1.6 several months ago.  We're still using the jspui, and we've 
 wondered 
 (among other things) if some of these problems might be alleviated by a 
 switch to
 the xmlui.  Has anybody had any experience comparing the memory footprint 
 and/or resource
 usage issues between the two interfaces?

We load-tested the XMLUI (on identical hardware) and it was even worse. It ran 
out of memory and crashed really quickly, so we never took it into production. 
But your mileage may vary.


Best regards,

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Tom De Mulder
On 29 Sep 2010, at 11:38, Hilton Gibson wrote:

 We started with a VM which had 2GB memory.
 Then added 2GB to the VM, no luck.
 Then luckily we had funds to buy a server.
 So now we have 12GB RAM and 12CPU's. No crashes so far.
 Using the XMLUI.
 Does DSpace really need this and what happens when we go to one million items 
 ??

A lot of the back-end code of DSpace, the very core of it, is inherently 
inefficient. Several tasks are executed more than once, and entire objects are 
created when only one attribute is needed, etc. (I'd be more specific, but I'm 
not a specialist on this matter, and our resident DSpace developer is on leave 
this week.)


I am really glad to hear from other people with problems similar to ours.


--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Tom De Mulder
On 29 Sep 2010, at 11:47, Mark Ehle wrote:

 Why was tomcat chosen as a platform for DSpace?

It wasn't. You can use any Servlet engine. We used JBoss for a while but went 
back to Tomcat because it fitted into our infrastructure better.

I believe DSpace was written in Java because Rob Tansley wanted to try writing 
a project in Java, but I could be wrong. :)


Best,

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Graham Triggs
On 29 September 2010 11:38, Hilton Gibson hilton.gib...@gmail.com wrote:

 Using the XMLUI.
 Does DSpace really need this and what happens when we go to one million
 items ??


Does DSpace really need that? No. As I have said, I'm running 30 separate
repositories - using JSPUI (circa 1.4.2 / 1.5 codebase) - all on a single
server / Tomcat instance.

Some of those repositories have 1000s of items, and get quite decent levels
of access.

The server has 8GB installed, 3GB heap turned over to Tomcat (plus 1GB for
non-heap).

The Tomcat instance has 2GB of *free* heap space, rarely runs above 5% cpu
usage, and has plenty of capacity to run more repositories (the rate at
which files are opened/closed is actually a bigger issue for Tomcat
startup).

It's worth pointing out, though, that the database is hosted on a separate
server - I can't say how many resources that is really using, as it's shared
with other services, but it is apparently 'tiny'.



What happens at one million items? Well, that's an interesting issue. But is
it really the right question to be asking? How far do you want/need to be
able to scale a 'monolithic' instance, before you spread it over multiple
servers?

As long as you can spread it over multiple servers, it gives you a much
higher ceiling than relying on a single box - and it is easier to scale for
increasing size/usage by adding more boxes (you don't have to migrate).

If you focus on scaling a single installation, then you end up increasing
the overall requirements (i.e. memory for caching), and making it harder to
scale over multiple boxes at all.

G


Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Graham Triggs
On 29 September 2010 11:48, Tom De Mulder td...@cam.ac.uk wrote:

 A lot of the back-end code of DSpace, the very core of it, is inherently
 inefficient


I don't entirely disagree with that statement - there are some things that
can definitely be improved, particularly where you have to deal with more
items in a single instance.

But take a look at my numbers - at its core, it really isn't that bad for
the vast majority of DSpace users (how many even have more than 50,000 items
currently?). And some of it depends on correct system setup (Postgres
version/options, etc.).

It's adding xmlui, solr, etc. that is putting a lot more demands on the
system.


G


Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Graham Triggs
That rather begs the question: do you think something else should be chosen /
recommended?

There really isn't anything preventing you from using Jetty, etc., but Tomcat
is actually a pretty solid server that does a lot of things quite well -
particularly, in recent versions, being defensive against bad application
behaviour.

And when you look at the grand scheme of things, the smaller footprint of
Jetty doesn't really make a whole lot of difference.

G

On 29 September 2010 11:47, Mark Ehle marke...@gmail.com wrote:

 Why was tomcat chosen as a platform for DSpace?




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Mark H. Wood
We're comfortably running *three* production DSpace instances in a
single Tomcat 6 with these limits:

JAVA_OPTS="-Xmx1024M -Xms768M"
JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128M"
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=32M"

That's on a box with 3GB of physical memory.  One DSpace instance is
1.6, and the other two are 1.5.

Now, I do have an old weekly reminder to check PermGen on that box,
but it is always around half filled these days.  We had problems in
the past, but newer versions of DSpace seem to do much better in that
regard.  I can't recall the last time we had to restart that Tomcat
just to clean up memory.

We have a development box with maybe two dozen DSpace instances, none
of them very busy at all, various versions and states of disrepair,
and we do have to restart Tomcat there from time to time if we are
doing a lot of webapp. reloading.  The limits there are:

JAVA_OPTS="-Xmx1024M -Xms128M"
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=192M"
JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=384M"

on a 4GB machine.

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a 
little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Mark H. Wood
On Wed, Sep 29, 2010 at 11:48:02AM +0100, Tom De Mulder wrote:
 A lot of the back-end code of DSpace, the very core of it, is inherently 
 inefficient. Several tasks are executed more than once, and entire objects 
 are created when only one attribute is needed, etc. (I'd be more specific, 
 but I'm not a specialist on this matter, and our resident DSpace developer is 
 on leave this week.)

When your developer has time, I think that specific JIRA tickets on
these observations would be appreciated.  We need all the eyes we can
borrow.  It needn't be a rigorous analysis (though that would be
wonderful).  Significant inefficiencies noted in passing are important
information.

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a 
little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Tom De Mulder
On 29 Sep 2010, at 13:03, Graham Triggs wrote:
 
 Some of those repositories have 1000s of items, and get quite decent levels
 of access.
 

Thousands?

I don't even want to have this discussion until you're talking hundreds of 
thousands, and how many hits per second. I know you like to talk down the 
problem, but that really isn't helping.

We run 5 DSpace instances, three of these are systems with hundreds of 
thousands of items, and it's dog slow and immensely resource-intensive. And 
yes, we want these to be single systems. Why shouldn't we?

We have other systems here at the University that are much bigger, do similar 
things and require far, far less in terms of resources.

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Tim Donohue
Hi all,

Interesting thread so far and keep up the good discussion.

I think it'd be helpful to us all if we could all share more information 
about our DSpace setups (similar to Mark Wood's tip on his local 
JAVA_OPTS settings).  The more we know about your 
DSpace/Java/Tomcat/Postgres (or Oracle) configurations, server setups, 
etc. the better chance we have at helping you out. There may be some 
immediate performance improvements you can achieve just by tweaking your 
setup/configurations slightly.

I had set up a basic template for this on the Wiki at 
https://wiki.duraspace.org/display/DSPACE/ScalabilityIssues1.6
But, feel free to just send info along in any format you wish.  The 
template was mostly there to give everyone an idea of what type of 
information can be useful to us (so that we can hopefully provide you 
with some helpful suggestions and find longer term fixes).

Obviously, we also want to track down and fix any memory leaks or larger 
problems as well.  So if you've already discovered specific issues, let 
us know about those as well, so we can add them to our Issue Tracker 
(http://jira.dspace.org/) and schedule them to be resolved.

Thanks,

Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 9/29/2010 7:59 AM, Mark H. Wood wrote:
 We're comfortably running *three* production DSpace instances in a
 single Tomcat 6 with these limits:

 JAVA_OPTS=-Xmx1024M -Xms768M
 JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=128M
 JAVA_OPTS=$JAVA_OPTS -XX:PermSize=32M

 That's on a box with 3GB of physical memory.  One DSpace instance is
 1.6, and the other two are 1.5.

 Now, I do have an old weekly reminder to check PermGen on that box,
 but it is always around half filled these days.  We had problems in
 the past, but newer versions of DSpace seem to do much better in that
 regard.  I can't recall the last time we had to restart that Tomcat
 just to clean up memory.

 We have a development box with maybe two dozen DSpace instances, none
 of them very busy at all, various versions and states of disrepair,
 and we do have to restart Tomcat there from time to time if we are
 doing a lot of webapp. reloading.  The limits there are:

 JAVA_OPTS=-Xmx1024M -Xms128M
 JAVA_OPTS=$JAVA_OPTS -XX:PermSize=192M
 JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=384M

 on a 4GB machine.







Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Mark Ehle
Thanks - I was just curious.

On Wed, Sep 29, 2010 at 6:53 AM, Tom De Mulder td...@cam.ac.uk wrote:
 On 29 Sep 2010, at 11:47, Mark Ehle wrote:

 Why was tomcat chosen as a platform for DSpace?

 It wasn't. You can use any Servlet engine. We used JBoss for a while but went 
 back to Tomcat because it fitted into our infrastructure better.

 I believe DSpace was written in Java because Rob Tansley wanted to try 
 writing a project in Java, but I could be wrong. :)


 Best,

 --
 Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
 +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH





Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Tim Donohue
Quick followup, in case it isn't clear (as I was asked about this 
off-list).  The preference would be to share your DSpace 
setup/configuration information directly on this listserv (or you can 
post up on the wiki if you prefer).  That way we can get more eyes on 
it, and hopefully come up with better suggestions.

Also, this may be an area where sharing this information can help us to 
document some best practices, based on recommended setups and 
performance hints/tips that people have.  So, I'm hoping that as this 
thread continues, we can pull out the main tips/hints and document them 
for future reference.  At the same time, we can pull out the common 
memory/performance issues so that they can be investigated further, and 
hopefully resolved as soon as possible.

Committers -- it'd also be great if you can take a few moments to send 
your basic setup info & DSpace size to the listserv (especially noting 
anything that you may have tweaked above & beyond the normal DSpace 
install docs, like JAVA_OPTS or similar settings). This can hopefully 
encourage others to do the same.

- Tim

On 9/29/2010 9:46 AM, Tim Donohue wrote:
 Hi all,

 Interesting thread so far and keep up the good discussion.

 I think it'd be helpful to us all if we could all share more information
 about our DSpace setups (similar to Mark Wood's tip on his local
 JAVA_OPTS settings). The more we know about your
 DSpace/Java/Tomcat/Postgres (or Oracle) configurations, server setups,
 etc. the better chance we have at helping you out. There may be some
 immediate performance improvements you can achieve just by tweaking your
 setup/configurations slightly.

 I had setup a basic template for this on the Wiki at
 https://wiki.duraspace.org/display/DSPACE/ScalabilityIssues1.6
 But, feel free to just send info along in any format you wish. The
 template was mostly there to give everyone an idea of what type of
 information can be useful to us (so that we can hopefully provide you
 with some helpful suggestions and find longer term fixes).

 Obviously, we also want to track down and fix any memory leaks or larger
 problems as well. So if you've already discovered specific issues, let
 us know about those as well, so we can add them to our Issue Tracker
 (http://jira.dspace.org/) and schedule them to be resolved.

 Thanks,

 Tim Donohue
 Technical Lead for DSpace Project
 DuraSpace.org

 On 9/29/2010 7:59 AM, Mark H. Wood wrote:
 We're comfortably running *three* production DSpace instances in a
 single Tomcat 6 with these limits:

 JAVA_OPTS=-Xmx1024M -Xms768M
 JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=128M
 JAVA_OPTS=$JAVA_OPTS -XX:PermSize=32M

 That's on a box with 3GB of physical memory. One DSpace instance is
 1.6, and the other two are 1.5.

 Now, I do have an old weekly reminder to check PermGen on that box,
 but it is always around half filled these days. We had problems in
 the past, but newer versions of DSpace seem to do much better in that
 regard. I can't recall the last time we had to restart that Tomcat
 just to clean up memory.

 We have a development box with maybe two dozen DSpace instances, none
 of them very busy at all, various versions and states of disrepair,
 and we do have to restart Tomcat there from time to time if we are
 doing a lot of webapp. reloading. The limits there are:

 JAVA_OPTS=-Xmx1024M -Xms128M
 JAVA_OPTS=$JAVA_OPTS -XX:PermSize=192M
 JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=384M

 on a 4GB machine.







Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Anderson, Charles W
- Tim Donohue tdono...@duraspace.org wrote:

| Quick followup, in case it isn't clear (as I was asked about this 
| off-list).  The preference would be to share your DSpace 
| setup/configuration information directly on this listserv 


Let me kick things off, then (questions truncated a bit for formatting reasons):

1)  Contact Info
a)  Bill Anderson / Georgia Institute of Technology / 
bill.ander...@library.gatech.edu


2)  DSpace Setup and Configuration details

a)  What DSpace version are you using? 
   1.   Dspace 1.6.2
   2.   Currently using JSPUI, migrating to XMLUI
   3.   30,498 Items
   4.   610 Communities/Collections

b)  What Postgres/Oracle version are you using?
   1.   PostgreSQL  8.1.4

c)  What Tomcat version are you using? 
   1.   Tomcat/6.0.26 + mod_jk/1.2.30 + Apache/2.0.52

d)  Is everything running on one server (DSpace/Tomcat/Posgres/etc)? 
   1.   Everything is (currently) on the same server
   2.   PowerEdge 2850: 2x Intel Xeon CPU 2.80Ghz, 12Gb Memory, Red Hat AS 4 
(Nahant Update 8), RAID5 Disk array

e)  How much memory are you making available to Tomcat/Java?
   1.   (lb worker) JAVA_OPTS="-server -Xmx462M -Xms462M -XX:+UseParallelGC -Dfile.encoding=UTF-8", webapps: jspui, lni, oai, sword, xmlui
   2.   (lb worker) JAVA_OPTS="-server -Xmx462M -Xms462M -XX:+UseParallelGC -Dfile.encoding=UTF-8", webapps: jspui, lni, oai, sword, xmlui
   3.   JAVA_OPTS="-server -Xmx600M -Xms600M -XX:+UseParallelGC -Dfile.encoding=UTF-8", webapps: solr
   4.   lb worker: method=request, socket_keepalive=True, socket_timeout=0, ping_mode=A
   5.   Postgres max_connections=300


3)  Performance / Scalability Issues noticed
   1. We've had intermittent performance problems since upgrading to 1.6 in 
May. At first, the problems seemed strictly SOLR-related; SOLR was grabbing 
hundreds of postgres connections, and eventually generating these in 
dspace.log: 

org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool 
error: Timeout waiting for idle object

and these in catalina.out:

SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. 
exceeded limit of maxWarmingSearchers=2, try again later.

...followed by permgen errors and death.

   2.  We heavily revised our solrconfig.xml, which alleviated the problem but 
didn't eliminate it.  We also split our jspui between two load-balanced tomcat 
instances, and moved the SOLR webapp to a third instance, which also 
helped.  Following OR 2010, on a suggestion from Peter Dietz, we revised the 
SOLR JSP code to use the auto-commit functionality rather than manually 
committing every transaction.  All of this got us to the point where we weren't 
crashing routinely, but we still have major problems during times of heavy 
traffic. Generally, these take the form of a gradual slowdown followed by a 
complete failure to respond; this sometimes ends in spontaneous recovery, and 
sometimes in permgen errors and a crash. At the end of last week, following a 
bad patch caused by a LOCKSS harvest, we implemented a restart schedule, with 
our two jspui tomcat instances automatically restarted every 6 hours, 
alternating between the two.  We haven't had any crashes since, but we're not 
at all sure we've solved the problem.

   3.  On restart, we sometimes get a bunch of these: 

Sep 28, 2010 9:00:06 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: A web application appears to have started a thread named 
[FinalizableReferenceQueue] but has failed to stop it. This is very likely 
to create a memory leak

   4.   Other errors that lead to a service/application outage: 

Sep 23, 2010 3:47:14 PM org.apache.tomcat.util.threads.ThreadPool$ControlRunnable run
SEVERE: Caught exception (java.lang.OutOfMemoryError: PermGen space) executing 
org.apache.jk.common.channelsocket$socketconnect...@3aff776, terminating thread

Sep 23, 2010 10:37:04 AM org.apache.catalina.connector.CoyoteAdapter service
SEVERE: An exception or error occurred in the container during the request 
processing
java.lang.OutOfMemoryError: PermGen space
at java.lang.Throwable.getStackTraceElement(Native Method)
at java.lang.Throwable.getOurStackTrace(Throwable.java:591)
at java.lang.Throwable.getStackTrace(Throwable.java:582)
at org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:155)
at org.apache.juli.logging.DirectJDKLog.error(DirectJDKLog.java:135)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:274)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 

Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-29 Thread Mark H. Wood
I'd like to point out that the discussion is broadening considerably:
a system can be slow for many reasons, not just memory starvation.

Step 1: what resource(s) are you short of?  Something like LambdaProbe
can peek inside Tomcat and show you how much of each of the various
memory pools is being used.  OS tools can show whether you are
swapping heavily or spending a lot of time in I/O wait or are really
CPU-bound (and what, besides Tomcat, may be eating CPU).  DBMS tools
can reveal places in the schema that don't scale well, queries that
could be optimized, and additional indices that would be beneficial.
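
For example (Linux tools - names and options vary by platform, and the
database name below is only an assumption):

vmstat 5          # si/so columns show swapping, wa shows time spent in I/O wait
iostat -x 5       # per-device utilisation and latency
top               # what, besides Tomcat, may be eating CPU
# inside Postgres, tables with high seq_scan counts hint at missing indices:
psql -d dspace -c 'SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables ORDER BY seq_scan DESC LIMIT 10;'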

It would be really helpful for large, busy sites with performance
problems to share any such detailed observations.  Some of those
problems can probably be tuned away, and some will point to specific
things for coders to investigate.  Scaling experience will be valuable
both in documenting good ways to tune up for DSpace and in finding
design hotspots for rework.

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a 
little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-23 Thread Tom De Mulder
On 22 Sep 2010, at 20:22, Sands Alden Fish wrote:

 (2) We currently don't have a centralized server with enough test data
 to run many of these memory or scalability tests on our own.  I think
 this is something we could look into improving upon (especially if
 anyone has test data to donate to the cause).

There is a lot of public domain data available online. I spent some time 
collecting some of this in a variety of formats (text, images, movies, sound, 
datasets) and then wrote something to use a word list (e.g. /usr/share/dict on 
most Linux systems) to create random metadata for them. 

After all, it doesn't matter that many bitstreams will be identical.
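
A rough sketch of the approach - the paths, the use of shuf, and the
single-field metadata are only for illustration - building Simple Archive
Format items that the batch importer can then ingest:

WORDS=/usr/share/dict/words
for i in $(seq 1 1000); do
  dir=testdata/item_$i
  mkdir -p "$dir"
  cp sample.pdf "$dir/file.pdf"          # any public domain bitstream will do
  echo file.pdf > "$dir/contents"
  title=$(shuf -n 3 "$WORDS" | tr '\n' ' ')
  cat > "$dir/dublin_core.xml" <<EOF
<dublin_core>
  <dcvalue element="title" qualifier="none">$title</dcvalue>
</dublin_core>
EOF
done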

That is how we populated our test environment here so we could replicate the 
problems we were seeing on the live system.


Best regards,

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Claudia Jürgen
Hello Graham,

this is an important point. Apart from the issues mentioned, a simpler 
architecture will help DSpace adapt to new requirements and technology 
changes, and stay flexible and easy to manage.

Furthermore, too many clever tricks under the hood raise the risk that, 
with changes in the committer team (people do change jobs, or priorities 
change and with them the commitment), important knowledge will no longer 
be available and will have to be regained at some cost.

Maybe we need the old arch board back or something similar.

Best to put this on the agenda for the committer and all-hands meetings, or 
a special meeting - it needs a bit more space for discussion.

Have a sunny day

Claudia



Am 21.09.2010 13:52, schrieb Graham Triggs:
...
 I have repeatedly warned about the consequences of overly-complicated code
 and using 'clever tricks' under the hood. A lot of what I've mentioned above
 *can* be replaced with a much simpler architecture, that's much easier to
 understand, easier to maintain, and does not have the same problems.

 If this matters to you, then it's going to take more than just me to stand
 up and say this.

 G





-- 
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/



Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread TAYLOR Robin
Hi Graham, 

I don't have time at the moment to consider some of the bigger issues you raise, 
but I would like to echo Hardy's comments. Historically, many DSpace 
installations have had little content and been lightly used. I think this has 
allowed us to develop without much consideration for performance. I would like 
to see the sort of testing you have done becoming part of our procedures prior 
to release, rather than being left to the bigger sites, such as BioMed, to sort 
out after the event.

Cheers, Robin.


Robin Taylor
Main Library
University of Edinburgh
Tel. 0131 6513808  

 -Original Message-
 From: Pottinger, Hardy J. [mailto:pottinge...@umsystem.edu] 
 Sent: 21 September 2010 18:27
 To: Graham Triggs; Tom De Mulder
 Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
 Subject: Re: [Dspace-tech] tomcat reporting memory leak?
 
 Hi, Graham, for what it's worth, I'll stand with you. :-) I 
 think addressing the issues you've discovered is really 
 important. Here's an idea: how about some new unit and/or 
 performance tests that check if a class and/or app is 
 unloading cleanly? In other words, would it be possible to 
 express the tests you have in such a way that they could be 
 part of the new testing framework? Are there JIRA issues, 
 and/or patches for what you have already found/fixed?
 
 --Hardy 
 
  -Original Message-
  From: Graham Triggs [mailto:grahamtri...@gmail.com]
  Sent: Tuesday, September 21, 2010 6:52 AM
  To: Tom De Mulder
  Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
  Subject: Re: [Dspace-tech] tomcat reporting memory leak?
  
  On 20 September 2010 15:59, Tom De Mulder td...@cam.ac.uk wrote:
  
  
  On Mon, 20 Sep 2010, Damian Marinaccio wrote:
  
   I'm seeing the following log messages in catalina.out:
  
   [...]
  
   SEVERE: The web application [] appears to have 
 started a thread 
  named [FinalizableReferenceQueue] but has failed to stop it.
   This is very likely to create a memory leak.
  
  
  There are quite a few memory leaks in DSpace. We have a 
 cronjob to 
  restart
  Tomcat nightly, because otherwise it'll break the next day.
  
  
  
  
  Hi all,
  
  Oh, welcome to my world!!
  
  I'm going to start off by pointing out that the majority of DSpace 
  code is actually quite well behaved. Going back to the 
 codebase circa 
  1.4.2 / 1.5, and using the JSP user interface - I've got *thirty* 
   separate DSpace repositories / applications running in a 
 single Tomcat 
  instance, which has operated without a restart in over 90 days. And 
   whilst being able to undeploy and redeploy any of those 
 applications at 
  will - or just reload them so that they pick up new configuration.
  
  That does require a bit of careful setup / teardown in the context 
  listeners (that wasn't always part of the DSpace code), and 
 you need 
  to get certain JARs - particularly the database/pooling 
 drivers - out 
  of the web applications entirely and into the shared level 
 of Tomcat. 
   Most of that is actually just good / recommended practice 
 for systems 
  administration of a Java application server anyway.
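   
   (Purely as an illustration - the exact paths and driver version depend 
   on your setup - getting the driver out of the webapps and into the 
   shared level of Tomcat 6 looks something like:
   
   cp postgresql-8.4-701.jdbc4.jar $CATALINA_HOME/lib/       # Tomcat's shared lib
   rm $CATALINA_HOME/webapps/*/WEB-INF/lib/postgresql-*.jar  # out of each webapp
   
   ...with the pooling JARs treated the same way.)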
  
  I was careful to point out that I have achieved that with 
 pre-1.6 code 
  and JSP only. Both 1.6 and XML ui (of any age) change the 
 landscape. 
  XML ui has always taken a large chunk of resources, 
 although whilst it 
  was still based on Cocoon 2.1, I managed to at least clean up it's 
  startup / shutdown behaviour by repairing it's logging 
 handler. This 
  behaviour has changed with Cocoon 2.2, and I'll come back 
 to that shortly.
  
  So, 1.6 - I've been doing some work on the resource usage and clean 
  loading/unloading of both JSP and XML using 1.6.2 recently, and 
  neither are clean out of the box.
  
  The first issue you run into is the 
 FinalizableReferenceQueue noted in 
  the stack trace above. This is coming from a reference map in 
  reflectutils - and was found to be a cleanup problem in course of 
  DSpace
  2 development (the kernel / services framework was backported from 
  that work). I added a LifecycleManager to reflectutils that was 
  released as version 0.9.11 that allows the internal 
 structures to be 
  shutdown cleanly, and implemented this as part of DSpace 2, however 
  this appears to have been ignored in the backport.
  
  So, with the reflectutils/Lifecycle changes, and careful 
 placement of 
  JARs, etc. I did get the JSP ui to unload cleanly last 
 week. I would 
  note that I didn't stress the application too heavily, so 
 there may be 
  some operations that might trigger different code paths 
 that are still 
  a problem, but at the baseline it was working correctly.
  
  XML ui has proven to be a somewhat more challenging beast. 
 I first ran 
  into two problems that are inside Cocoon 2.2 itself - 1) in the 
  sitemap processing, it's using a stack inside a ThreadLocal, but it 
  never removes the stack when it empties it, and 2) in one class

Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Tom De Mulder

I am very happy to see that this issue seems finally to be taken seriously. 
However, I find myself getting a bit frustrated that it was never taken 
seriously when I raised it in the past.

I think the DSpace source code carries a lot of historical baggage, and it 
could do with being addressed even without making fundamental changes to the 
basic architecture. My personal favourite would be a completely new 
architecture with more loosely coupled modules, but fixing the memory leaks and 
the associated slow performance would be a good start.

I can add that, for example, deleting a collection with 1200 items on our 
rather powerful DSpace machines takes two hours and uses most of the 
available memory. You can see why I would like that to no longer be the case.


Best regards,

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Tim Donohue
Hi all,

I'm sorry if any of you have felt that this issue has not been taken 
seriously in the past.  The reality of the situation is that we (the 
DSpace Developers/Committers) currently depend on feedback/testing from 
larger DSpace instances around these sorts of scalability and memory 
issues.  As DSpace is Community Built & Supported Software, there are a 
couple of things to keep in mind:

(1) DSpace software has zero full-time developers.  All Committers are 
volunteers and can only devote as much time as their individual 
institutions allow.  Although I officially have DSpace in my title, I 
also wear several hats in DuraSpace. Therefore, even I don't have much 
time in a given week to devote towards actual DSpace development work.

(2) We currently don't have a centralized server with enough test data 
to run many of these memory or scalability tests on our own.  I think 
this is something we could look into improving upon (especially if 
anyone has test data to donate to the cause).  I agree with Robin T. 
that it is in everyone's interest to improve our performance testing 
prior to each release.  I'd also encourage Graham (and others) to share 
their testing routes so that we can work to make this happen, and start 
to locate these performance issues *before* new releases, rather than after.

I'm also very happy to see these issues starting to gain some leverage. 
  The reality of the situation is that we need one or more volunteers to 
step up and help to make these improvements or suggest testing routes 
that can allow us to better investigate where memory leaks may be 
occurring (or point them out if you've already found where the leaks 
are).  All of us want DSpace to scale well and avoid memory leaks -- if 
it takes a new architecture to do so, that is one possible route forward. 
But the main thing to keep in mind is that DSpace is built & maintained 
by volunteer developers -- so we need to find the volunteers (and 
convince their institutions) to help make this happen.

It sounds like we've already located a few interested parties in this 
discussion.  So, I hope that we can move forward with this work soon and 
perhaps even make some quick improvements in time for the rapidly 
approaching 1.7.0 release.

If you'd like to volunteer to help us out, please let us know how you'd 
like to help!

- Tim

-- 
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org





Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Sands Alden Fish



On Sep 22, 2010, at 12:10 PM, Tim Donohue wrote:

(2) We currently don't have a centralized server with enough test data
to run many of these memory or scalability tests on our own.  I think
this is something we could look into improving upon (especially if
anyone has test data to donate to the cause).

There's a lot of Creative Commons licensed content in the DSpace-sphere.  
Perhaps an effort to gather what various sites are willing to donate into a 
DuraSpace repository would give us the amount of data we need, as well as 
beneficial heterogeneity in said data?  Perhaps beyond this (and certainly 
there would be other considerations here) it could be set up in such a way that 
the data could be (extremely) easily replicated into one's test environment to 
put an instance through its paces?

I agree with Robin T.
that it is in everyone's interest to improve our performance testing
prior to each release.  I'd also encourage Graham (and others) to share
their testing routes so that we can work to make this happen, and start
to locate these performance issues *before* new releases, rather than after.

As a first step in this direction (and one that would help me 
personally), I'd like to ask if anyone out there has an Apache JMeter test plan 
file that is/could be generalized for use stressing any DSpace application.  I 
know that each instance has its own customizations, URL patterns, areas to 
stress, etc. but there is a lot that could be covered generally for any 
implementation.  Does this exist out there?  I have always just cobbled 
together a very simplistic setup that hits the front page, community-list, some 
particular items and URLs.  Perhaps we can collaboratively build one out with 
everyone's input.
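
For illustration, here is a rough sketch of the kind of simplistic setup described above, written as a small Java program rather than a JMeter plan (the base URL and handle paths are hypothetical placeholders): it just hits a handful of pages concurrently while you watch heap usage in a profiler.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Crude concurrent smoke/stress run: repeatedly fetch a few DSpace pages. */
public class SimpleDSpaceStress {

    // Hypothetical base URL and paths - adjust to your own instance.
    private static final String BASE = "http://localhost:8080/xmlui";
    private static final List<String> PATHS = Arrays.asList(
            "/", "/community-list", "/handle/123456789/1", "/handle/123456789/2");

    public static void main(String[] args) throws Exception {
        int threads = 10;              // concurrent "users"
        int requestsPerThread = 100;   // requests each user issues

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    String path = PATHS.get(i % PATHS.size());
                    long start = System.currentTimeMillis();
                    try {
                        HttpURLConnection conn =
                                (HttpURLConnection) new URL(BASE + path).openConnection();
                        int code = conn.getResponseCode();
                        // Drain the body so the connection can be reused.
                        try (InputStream in = conn.getInputStream()) {
                            byte[] buf = new byte[8192];
                            while (in.read(buf) != -1) { /* discard */ }
                        }
                        System.out.printf("%s -> %d (%d ms)%n",
                                path, code, System.currentTimeMillis() - start);
                    } catch (Exception e) {
                        System.out.printf("%s -> error: %s%n", path, e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.MINUTES);
    }
}

Run it against a test instance rather than production; it is only a stand-in until a proper shared JMeter plan exists.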



--
sands fish
Software Engineer
MIT Libraries
Technology Research & Development
sa...@mit.edu
E25-131



Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Mark H. Wood
A random collection of thoughts which occurred while reading this
thread:

o  Performance, scalability, complexity, and ruggedness are sometimes
   competing influences on the design of code.  We can improve in all
   of these aspects.  Sometimes all of those influences will conspire
   to suggest a particular design, and at other times we will have to
   trade them off against one another.  And performance, in
   particular, is tricky to characterize, because a design that
   performs best at small scale may be worst at large scale or vice
   versa.

   What I think I am getting at here is that we want many different
   kinds of goodness and we need to pursue them together if we want to
   achieve any of them in a meaningful way.

o  The testing work has also introduced some new automated reports
   that we should be reviewing.  Have you seen how many FIXMEs there
   are, and what they are saying?  Quite motivational.  The Findbugs
   report is also interesting in spots.

o  Where it seems that code must be complex, thorough documentation of
   the thought behind it will not only capture important knowledge for
   the next person who has to work there, but can also provide
   opportunities to realize:  "good heavens, did I really write that?
   there must be a better way..."  When I find myself writing
   absurd comments, it is usually because I have been writing (or was
   about to write) absurd code.

o  Best practice and commonest practice w.r.t. deployment of libraries
   seem to be antithetical in the Java universe.  I was quite pleased
   to discover that I'm not the only one who thinks that Tomcat's /lib
   directory is on the app. classpath for good reasons.

o  The DSpace 2 architecture (which we are approaching by easy stages)
   attempts to address looser coupling and similar OO goals.

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a 
little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Mark H. Wood
And one point I forgot:

o  Volunteers don't have to write code.  If you aren't quite ready to
   step into the DSpace tarball with torch and machete, but can read
   Java, you can review the code and make suggestions.  Many eyes
   make all bugs shallow.

   Bug reports (including performance problems) are always useful.

   And just asking "why is this so slow?" can help to focus attention on
   design decisions which perhaps didn't get quite as much attention
   as they deserved.  Keep asking until you get a sensible answer.

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Balance your desire for bells and whistles with the reality that only a 
little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_




Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-22 Thread Flavio Botelho
On Wed, Sep 22, 2010 at 4:51 PM, Mark H. Wood mw...@iupui.edu wrote:

 o  Best practice and commonest practice w.r.t. deployment of libraries
   seem to be antithetical in the Java universe.  I was quite pleased
   to discover that I'm not the only one who thinks that Tomcat's /lib
   directory is on the app. classpath for good reasons.

Actually, nowadays that is AFAIK universally accepted as bad practice.
It's no coincidence that Tomcat removed /common/lib in version 6.

In the early days of web development with Java, what you are defending
was an obvious choice to avoid wasting resources. But the problem is
that you need to adapt the code in all the applications in the
container at the same time in order to move library versions, which is
really difficult for a company's internal code, and impossible if there
is any third-party or open-source code involved.
Unless you want to run one Tomcat instance per application... which
brings us to the question: what difference would it make to have the
libs in Tomcat's lib directory in that scenario?



 o  The DSpace 2 architecture (which we are approaching by easy stages)
   attempts to address looser coupling and similar OO goals.






Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-21 Thread Graham Triggs
On 20 September 2010 15:59, Tom De Mulder td...@cam.ac.uk wrote:

 On Mon, 20 Sep 2010, Damian Marinaccio wrote:

  I'm seeing the following log messages in catalina.out:
  [...]
  SEVERE: The web application [] appears to have started a thread named
 [FinalizableReferenceQueue] but has failed to stop it.
  This is very likely to create a memory leak.

 There are quite a few memory leaks in DSpace. We have a cronjob to restart
 Tomcat nightly, because otherwise it'll break the next day.



Hi all,

Oh, welcome to my world!!

I'm going to start off by pointing out that the majority of DSpace code is
actually quite well behaved. Going back to the codebase circa 1.4.2 / 1.5,
and using the JSP user interface - I've got *thirty* separate DSpace
repositories / applications running in a single Tomcat instance, which has
operated without a restart in over 90 days. And all whilst being able to
undeploy and redeploy any of those applications at will - or just reload them
so that they pick up new configuration.

That does require a bit of careful setup / teardown in the context listeners
(that wasn't always part of the DSpace code), and you need to get certain
JARs - particularly the database/pooling drivers - out of the web
applications entirely and into the shared level of Tomcat. Most of that is
actually just good / recommended practise for systems administration of a
Java application server anyway.
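
To illustrate the sort of teardown a context listener can do (a generic sketch, not DSpace's actual listener), the snippet below deregisters any JDBC drivers that the web application itself loaded, so the driver class and everything it pins can be collected on undeploy. Moving the driver JAR up to Tomcat's shared lib, as described above, avoids the problem at the source; this covers the case where a driver still ends up inside the webapp.

import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

/**
 * Sketch of the kind of teardown a context listener can do so a webapp
 * unloads cleanly: deregister any JDBC drivers this webapp's classloader
 * loaded, so the driver class (and everything it pins) can be garbage
 * collected when the application is stopped.
 */
public class CleanShutdownListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent sce) {
        // Normal application startup would go here.
    }

    public void contextDestroyed(ServletContextEvent sce) {
        ClassLoader webappLoader = Thread.currentThread().getContextClassLoader();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            // Only touch drivers that were loaded by this web application.
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                } catch (Exception e) {
                    sce.getServletContext().log("Failed to deregister " + driver, e);
                }
            }
        }
    }
}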

I was careful to point out that I have achieved that with pre-1.6 code and
JSP only. Both 1.6 and XML ui (of any age) change the landscape. XML ui has
always taken a large chunk of resources, although whilst it was still based
on Cocoon 2.1, I managed to at least clean up its startup / shutdown
behaviour by repairing its logging handler. This behaviour has changed with
Cocoon 2.2, and I'll come back to that shortly.

So, 1.6 - I've been doing some work on the resource usage and clean
loading/unloading of both JSP and XML using 1.6.2 recently, and neither are
clean out of the box.

The first issue you run into is the FinalizableReferenceQueue noted in the
stack trace above. This is coming from a reference map in reflectutils - and
was found to be a cleanup problem in the course of DSpace 2 development (the
kernel / services framework was backported from that work). I added a
LifecycleManager to reflectutils, released as version 0.9.11, that allows
the internal structures to be shut down cleanly, and implemented this as
part of DSpace 2; however, this appears to have been ignored in the
backport.

So, with the reflectutils/Lifecycle changes, and careful placement of JARs,
etc. I did get the JSP ui to unload cleanly last week. I would note that I
didn't stress the application too heavily, so there may be some operations
that might trigger different code paths that are still a problem, but at the
baseline it was working correctly.

XML ui has proven to be a somewhat more challenging beast. I first ran into
two problems that are inside Cocoon 2.2 itself - 1) in the sitemap
processing, it's using a stack inside a ThreadLocal, but it never removes
the stack when it empties it, and 2) in one class relating to flowscript
handling, it does not clean up the Mozilla Rhino engine correctly when it's
finished using it (curiously, it's used in a number of places, and
everywhere else it appears to be structured correctly to clean up - just
this one class is screwed up).
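
For anyone unfamiliar with the pattern, here is a generic sketch of the ThreadLocal-stack problem (this is not Cocoon's actual code): unless the thread-local entry is removed once the stack empties, every pooled worker thread that ever processed a request keeps a reference to the webapp's classes.

import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of the leak pattern described above (not Cocoon's actual code):
 * a per-thread stack kept in a ThreadLocal. If the entry is never removed,
 * the (empty) stack - and the classloader that owns these classes - stays
 * referenced by every container worker thread that ever touched it.
 */
public class EnvironmentStackHolder {

    private static final ThreadLocal<Deque<Object>> STACK = new ThreadLocal<Deque<Object>>() {
        @Override
        protected Deque<Object> initialValue() {
            return new ArrayDeque<Object>();
        }
    };

    public static void push(Object environment) {
        STACK.get().push(environment);
    }

    public static Object pop() {
        Deque<Object> stack = STACK.get();
        Object top = stack.pop();
        if (stack.isEmpty()) {
            // The fix: drop the thread-local entry once nothing is on the
            // stack, so pooled worker threads don't keep a reference to
            // webapp classes after the request completes.
            STACK.remove();
        }
        return top;
    }
}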

With locally patched versions of the sitemap and flowscript JARs from Cocoon
(the ThreadLocal patch isn't really guaranteed to not leak in unexpected
circumstances - but it was sufficient to remove the problem in the scope of
this testing. Basically, ThreadLocal is really dangerous to use), I then ran
into another issue, this time with the CachingService that was backported.

With XML ui, it's using the RequestScope function of the caching service (it
didn't appear to be exercising this part with JSP - that may just be because
I only ran through limited code paths). For the RequestScope, it's tying the
cache not to the request object... but to a ThreadLocal. And that
ThreadLocal isn't being cleaned up at the end of the request. (The shutdown
code is also incapable of doing the job it's intended for, as it will only
ever execute on a single thread, and not see all the other threads that may
have processed requests).

There is a high probability of this leaking memory all over the place, and
there is also the nasty potential of leaking information across requests,
which is undesirable.
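
For comparison, here is a generic sketch of the shape of the fix (this is not the backported CachingService): if the per-request cache lives in a ThreadLocal, a servlet filter can clear it in a finally block on the same thread that served the request, so nothing survives the request or outlives the webapp.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

/**
 * Generic sketch (not the backported CachingService): a per-request cache
 * held in a ThreadLocal, with a filter that guarantees it is cleared on the
 * same thread that served the request.
 */
public class RequestCacheFilter implements Filter {

    // Hypothetical per-request cache holder.
    private static final ThreadLocal<Map<String, Object>> REQUEST_CACHE =
            new ThreadLocal<Map<String, Object>>();

    public static Map<String, Object> getCache() {
        Map<String, Object> cache = REQUEST_CACHE.get();
        if (cache == null) {
            cache = new HashMap<String, Object>();
            REQUEST_CACHE.set(cache);
        }
        return cache;
    }

    public void init(FilterConfig filterConfig) {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        try {
            chain.doFilter(request, response);
        } finally {
            // Always clear the thread-local, even if the request failed.
            REQUEST_CACHE.remove();
        }
    }

    public void destroy() {
    }
}

Registering such a filter (or an equivalent request listener) ties the lifetime of the cache to the request rather than to the thread.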

I made another hacked version that removes the ThreadLocal, but replicates a
lot of its thread affinity behaviour (so, it still has the nasty side
effects of the implementation, but at least removed the hold the system had
over the application resources). XML ui was *still* not unloading correctly,
and at this point the profiler stopped giving me pointers to strong
references that were being held. So right now I'm not sure what else is up -
but there is at least one more troubling part of the

Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-21 Thread Pottinger, Hardy J.
Hi, Graham, for what it's worth, I'll stand with you. :-) I think addressing 
the issues you've discovered is really important. Here's an idea: how about 
some new unit and/or performance tests that check if a class and/or app is 
unloading cleanly? In other words, would it be possible to express the tests 
you have in such a way that they could be part of the new testing framework? 
Are there JIRA issues, and/or patches for what you have already found/fixed?

--Hardy 
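
As a rough sketch of the kind of check being asked for (assuming JUnit 4 and nothing DSpace-specific), a test can snapshot the live threads, run a start/stop cycle of whatever is under test, and fail if new non-daemon threads are left running:

import java.util.HashSet;
import java.util.Set;

import org.junit.Assert;
import org.junit.Test;

/**
 * Rough sketch of a "does it shut down cleanly?" check: snapshot the live
 * threads, run some start/stop cycle, and fail if new non-daemon threads are
 * still alive afterwards. A real test would start and stop the DSpace kernel
 * or a webapp context in place of the placeholder below.
 */
public class CleanUnloadTest {

    @Test
    public void noThreadsLeakedAcrossStartStop() throws Exception {
        Set<Thread> before = Thread.getAllStackTraces().keySet();

        // Placeholder for the code under test, e.g. starting and stopping
        // a service, kernel, or webapp context.
        Runnable startAndStop = () -> { };
        startAndStop.run();

        // Give background threads a moment to terminate.
        Thread.sleep(500);

        Set<Thread> leaked = new HashSet<>(Thread.getAllStackTraces().keySet());
        leaked.removeAll(before);
        leaked.removeIf(t -> t.isDaemon() || !t.isAlive());

        Assert.assertTrue("Threads left running: " + leaked, leaked.isEmpty());
    }
}

A fuller version would also look at lingering ThreadLocal entries or lean on Tomcat's own leak detection, but even this simple check should flag the FinalizableReferenceQueue problem above, since that leak shows up as an unstopped thread.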


[Dspace-tech] tomcat reporting memory leak?

2010-09-20 Thread Damian Marinaccio
I'm seeing the following log messages in catalina.out:

INFO: Deploying web application directory ROOT
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [org.dspace.services.caching.ThreadLocalMap] (value [org.dspace.services.caching.threadlocal...@d32560]) and a value of type [java.util.HashMap] (value [{}]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.saxpar...@bfa709]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@9b9a36]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@1a95de6]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@53bd6e]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@1b9c086]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@ecd7c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@1d4afed]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6a081c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@13a9acb]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@88a3ce]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.segmentterme...@ba2bb5]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.saxpar...@b481ba]) but

Re: [Dspace-tech] tomcat reporting memory leak?

2010-09-20 Thread Tom De Mulder
On Mon, 20 Sep 2010, Damian Marinaccio wrote:

 I'm seeing the following log messages in catalina.out:
 [...]
 SEVERE: The web application [] appears to have started a thread named 
 [FinalizableReferenceQueue] but has failed to stop it.
 This is very likely to create a memory leak.

There are quite a few memory leaks in DSpace. We have a cronjob to restart 
Tomcat nightly, because otherwise it'll break the next day.


Best,

--
Tom De Mulder td...@cam.ac.uk - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
- 20/09/2010 : The Moon is Waxing Gibbous (80% of Full)
