Re: [Dspace-tech] DSpace a memory hog?
Hi Rob,

On Thu, Apr 19, 2007 at 08:27:32PM -0400, Robert Tansley wrote:
> batch export (classic): needs fixing
> batch import (classic): needs fixing
> browse indexer: needs fixing
> search (lucene indexer): needs fixing
> media filter: OK
> history system: problems recording collection state (loads all
>   items into memory)
> Sitemap generator: OK
> checksum checker: fine but only because it has its own DB access
>   routines and doesn't use the APIs (!)
>
> The new-style packager (with plug-ins) only appears to be able to
> operate on one Item at a time.
>
> The above could probably be fixed for 1.4.2, with the potential
> exception of the checksum checker which needs to be changed to use
> the correct APIs.

I think these are a bit late for 1.4.2. I was hoping to get a beta out on Monday, and make the full release a week after. If we have something solid to aim for though, I don't see why we can't start work on 1.4.3 immediately (we can just keep committing to the branch after all).

Jim

--
James Rutherford  |  Hewlett-Packard Limited registered Office:
Research Engineer |  Cain Road,
HP Labs           |  Bracknell,
Bristol, UK       |  Berks
+44 117 312 7066  |  RG12 1HN.
[EMAIL PROTECTED] |  Registered No: 690597 England

The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error, you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated you should consider this message and attachments as HP CONFIDENTIAL.

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] DSpace a memory hog?
Cory,

Comments below:

On 04/18/2007 01:54 PM, Cory Snavely wrote:
> Well, as I said at first, it all depends on your definition of what a
> memory hog is. Today's hog fits in tomorrow's pocket. We had all
> better be used to that already.

Thank you for proving my point on memory bloat pervasiveness in the IT industry. This type of thinking allows vendors (whether open source or proprietary) to drive up the base system requirements without greatly improving functionality, because it is treated as predestined.

> Also, I don't think for a *minute* that the original developers of
> DSpace made a casual choice about their development environment--in
> fact, I think they made a responsible choice given the alternatives.
> Let's give our colleagues credit that's due. Their choice permits
> scaling and fits well for an open-source project. Putting the general
> problem of memory bloat in their laps seems pretty angsty to me.
>
> Lastly, dedicating a server to DSpace is a choice, not a necessity.
> We as implementors have complete freedom to separate out the database
> and storage tiers, and mechanisms exist for scaling Tomcat
> horizontally as well. In the other direction, I suspect people are
> running DSpace on VMware or xen virtual machines, too.

I didn't say they made a casual choice about their development environment. I said the functional requirements of the application didn't justify the memory footprint required to run this application. Whether or not they made a choice that fits well for an open-source project depends on your definition of Open Source. However, I don't think that debate is relevant to this discussion.

As far as scaling requirements go, it depends on where you want scalability. As you pointed out, web applications can naturally be scaled vertically through hardware or horizontally through Tomcat's now-native approach. Since either approach needs hardware, the memory footprint of an application needs to be taken into account. The higher the base system requirements, the lower the likelihood of someone having a scalable system, because of total cost of ownership (TCO). While virtual machine technology can help lower some TCO issues, it brings in a whole new batch of problems which are out of scope for this discussion.

The general problem of memory bloat rests in all developers' laps (mine included). As an industry, we need to constantly weigh our use of memory against the functionality we are providing. The functionality provided by DSpace isn't rocket science, and shouldn't require memory footprints greater than those of most of the systems that get people into space.

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]
Re: [Dspace-tech] DSpace a memory hog?
Generally what's going on is that Tomcat, the servlet container that hosts the web application, runs inside a large Java virtual machine with a substantial amount of memory allocated to caching programs and data for performance. Depending on your database configuration, Postgres can also allocate a substantial amount of memory to cache.

The indexer is a periodic process that does not run constantly, but you must still account for the memory it consumes while it runs. Memory requirements for recent versions of the indexing routine are of constant order, meaning they do not vary appreciably with repository size.

On Wed, 2007-04-18 at 18:09 -0700, Pan Family wrote:
> Thank you all for giving your opinion!
>
> Technically, is it the web application or the indexer that requires
> most of the memory? What data is kept in memory all the time (even
> when nobody is searching)? Is the memory usage proportional to the
> number of concurrent sessions?
>
> Thanks again,
> Pan
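For anyone wanting to check the web-application side of this on their own server, here is a minimal sketch of how the heap figures can be read from inside a running JVM. It is not DSpace or Tomcat code: the class name `HeapReport` and its helper methods are made up for illustration, and only the `java.lang.Runtime` calls are standard Java.

```java
public class HeapReport {
    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of the current JVM heap state. */
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();               // most heap the JVM will try to use
        long committed = rt.totalMemory();       // heap currently reserved by the JVM
        long used = committed - rt.freeMemory(); // reserved heap that is not free
        return String.format("max=%d MiB committed=%d MiB used=%d MiB",
                toMiB(max), toMiB(committed), toMiB(used));
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

The figures only describe the JVM in which the code runs, so to learn anything about the web application it would have to be called from inside the servlet container's JVM rather than from a separate command-line process.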
Re: [Dspace-tech] DSpace a memory hog?
Hi Pan,

The Web server aspect (i.e. Tomcat) should have fairly constant memory use -- the vast majority of operations are very short and work on a very small number of objects, and as soon as the request is over, any memory used is returned to the heap. How much memory you need to give it largely depends on the load, i.e. how many of these requests the server will be servicing at a given instant.

The areas where I think folks have run into memory use issues are batch importing, indexing and the media filters (thumbnail generation, text extraction for indexing) -- these operate on a large number of objects at once, and some of the DSpace code isn't so great at freeing up objects in these operations. But we're finding the problems and fixing them, as Cory mentions.

Getting technical below:

Developers: a quick scan of the code shows that:

batch export (classic): needs fixing
batch import (classic): needs fixing
browse indexer: needs fixing
search (lucene indexer): needs fixing
media filter: OK
history system: problems recording collection state (loads all items into memory)
Sitemap generator: OK
checksum checker: fine but only because it has its own DB access routines and doesn't use the APIs (!)

The new-style packager (with plug-ins) only appears to be able to operate on one Item at a time.

Also found: BitstreamStorageManager appears to reach up into the business logic layer and use the checker API; this needs fixing. This is probably because the checksum checker includes its own DB access API :-O

The above could probably be fixed for 1.4.2, with the potential exception of the checksum checker which needs to be changed to use the correct APIs.

Rob

On 18/04/07, Pan Family [EMAIL PROTECTED] wrote:
> Thank you all for giving your opinion!
>
> Technically, is it the web application or the indexer that requires
> most of the memory? What data is kept in memory all the time (even
> when nobody is searching)? Is the memory usage proportional to the
> number of concurrent sessions?
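The difference between a batch job that "loads all items into memory" and one that frees objects as it goes can be sketched roughly as follows. This is an illustration only, not the DSpace code discussed above: `BatchSketch`, `Item` and `itemIterator` are hypothetical stand-ins, and a real repository would read items from the database rather than generate them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

public class BatchSketch {
    /** Hypothetical stand-in for a repository item; not a DSpace class. */
    static final class Item {
        final int id;
        Item(int id) { this.id = id; }
    }

    /** Yields items lazily, creating each one only when it is asked for. */
    static Iterator<Item> itemIterator(int count) {
        return new Iterator<Item>() {
            private int next = 0;
            @Override public boolean hasNext() { return next < count; }
            @Override public Item next() {
                if (!hasNext()) throw new NoSuchElementException();
                return new Item(next++);
            }
        };
    }

    /** Linear-memory pattern: collects every item in a list before working. */
    static int processAllAtOnce(int count, Consumer<Item> work) {
        List<Item> all = new ArrayList<>();
        Iterator<Item> it = itemIterator(count);
        while (it.hasNext()) all.add(it.next()); // every item stays reachable
        for (Item item : all) work.accept(item);
        return all.size();
    }

    /** Constant-memory pattern: keeps no reference to an item after using it. */
    static int processStreaming(int count, Consumer<Item> work) {
        int processed = 0;
        Iterator<Item> it = itemIterator(count);
        while (it.hasNext()) {
            work.accept(it.next()); // item can be collected after this call
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processAllAtOnce(1000, item -> { }));
        System.out.println(processStreaming(1000, item -> { }));
    }
}
```

Both methods do the same work; the first needs memory proportional to the number of items, while the second only ever holds the item it is currently handling, provided the `work` callback does not keep references of its own.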
[Dspace-tech] DSpace a memory hog?
Hi,

There is a rumor that says DSpace is a memory hog. I don't know where this is from, but that may not be important. What is important is that it makes my management nervous. So I'd like to hear from those who know anything about this issue. Is it really a memory hog? Under what circumstances might it become a memory hog? Or should there be no worry about memory usage at all?

Thanks a lot in advance!

-Pan
Re: [Dspace-tech] DSpace a memory hog?
This depends on your definition of a memory hog.

We run a relatively large instance of DSpace, and we allocate 512MB to Tomcat, about 100MB to Postgres, and 256MB for daily indexing runs (via the dsrun script).

In earlier versions of DSpace the indexing routine needed to be patched to work around a poor implementation that caused memory allocation to grow linearly with repository size. Without that patch, we were running out of memory during indexing. I believe the patch is now part of the base.

We run comfortably inside 2G of physical memory. I may have considered that a memory hog 5 years ago, but today I consider it light.

Cory Snavely
University of Michigan Library IT Core Services

On Wed, 2007-04-18 at 01:01 -0700, Pan Family wrote:
> Hi,
>
> There is a rumor that says DSpace is a memory hog. I don't know where
> this is from, but that may not be important. What is important is
> that it makes my management nervous. So I'd like to hear from those
> who know anything about this issue. Is it really a memory hog? Under
> what circumstances might it become a memory hog? Or should there be
> no worry about memory usage at all?
>
> Thanks a lot in advance!
>
> -Pan
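As a back-of-the-envelope check on the figures in this message, the allocations can simply be added up and compared with the 2G of physical memory. The sketch below is plain arithmetic, with assumptions: `MemoryBudget` and its methods are hypothetical names, "about 100MB" for Postgres is taken as exactly 100, 2G is taken as 2048MB, and operating-system overhead is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryBudget {
    /** The allocations quoted in the message, in MB (Postgres is approximate). */
    static Map<String, Long> figuresFromPost() {
        Map<String, Long> mb = new LinkedHashMap<>();
        mb.put("Tomcat", 512L);
        mb.put("Postgres", 100L);          // "about 100MB" in the message
        mb.put("Daily indexing run", 256L);
        return mb;
    }

    /** Sums all allocations, in MB. */
    static long totalMb(Map<String, Long> allocations) {
        long sum = 0;
        for (long value : allocations.values()) sum += value;
        return sum;
    }

    /** Physical memory left over once every allocation is in use, in MB. */
    static long headroomMb(long physicalMb, Map<String, Long> allocations) {
        return physicalMb - totalMb(allocations);
    }

    public static void main(String[] args) {
        Map<String, Long> figures = figuresFromPost();
        System.out.println("total MB: " + totalMb(figures));              // 868
        System.out.println("headroom MB: " + headroomMb(2048L, figures)); // 1180
    }
}
```

On these assumptions the three allocations come to 868MB even when the indexing run overlaps with normal service, which leaves roughly 1180MB of the 2G for the operating system and everything else.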
Re: [Dspace-tech] DSpace a memory hog?
Pan,

In comparison to applications which run on inferior operating systems such as Microsoft Windows (2000 - Vista), this is not abnormal memory usage. Generally I dedicate a server to DSpace or any of the archival software which I use. Such rumors are started either by people with insufficient technical knowledge or by purveyors of proprietary software who are trying to scare people away from adopting software from which they cannot extort exorbitant fees.

Gerry

Gerry Arthus
Systems Administrator: Long Island Library Resources Council
SUNY at Stony Brook
Stony Brook, New York US 11794-3399
Phone: 1-631-632-6652
FAX: 631-632-6662
Home: 631-289-7565
Email: [EMAIL PROTECTED]

Professor: Departments of: Graduate Computer Engineering, Earth and Environmental Science, and Engineering Management
C.W. Post Campus of Long Island University
720 Northern Boulevard
Brookville, New York US 11548-1300
Phone: 516-299-2293
Re: [Dspace-tech] DSpace a memory hog?
Pan,

DSpace is a memory hog considering the functionality the application provides. This is mainly due to the technological choices made by the founders of the DSpace project, and not the functional requirements the DSpace project fulfills.

Application and memory bloat are pervasive in the IT industry. Each individual organization should look at its requirements, whether they are hardware, software or both. Having to dedicate a machine to an application, especially a relatively simple application like DSpace, is wasteful of both hardware and people resources.

Web applications should _not_ need 2G of memory to run comfortably.

--
Brad Teale
Web Application Developer
Digital Library Development Lab
University of Minnesota Libraries
[EMAIL PROTECTED]
Re: [Dspace-tech] DSpace a memory hog?
Well, as I said at first, it all depends on your definition of what a memory hog is. Today's hog fits in tomorrow's pocket. We had all better be used to that already.

Also, I don't think for a *minute* that the original developers of DSpace made a casual choice about their development environment--in fact, I think they made a responsible choice given the alternatives. Let's give our colleagues credit that's due. Their choice permits scaling and fits well for an open-source project. Putting the general problem of memory bloat in their laps seems pretty angsty to me.

Lastly, dedicating a server to DSpace is a choice, not a necessity. We as implementors have complete freedom to separate out the database and storage tiers, and mechanisms exist for scaling Tomcat horizontally as well. In the other direction, I suspect people are running DSpace on VMware or xen virtual machines, too.

Cory Snavely
University of Michigan Library IT Core Services

On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote:
> Pan,
>
> DSpace is a memory hog considering the functionality the application
> provides. This is mainly due to the technological choices made by
> the founders of the DSpace project, and not the functional
> requirements the DSpace project fulfills.
>
> Application and memory bloat are pervasive in the IT industry. Each
> individual organization should look at its requirements, whether
> they are hardware, software or both. Having to dedicate a machine to
> an application, especially a relatively simple application like
> DSpace, is wasteful of both hardware and people resources.
>
> Web applications should _not_ need 2G of memory to run comfortably.
Re: [Dspace-tech] DSpace a memory hog?
Thank you all for giving your opinion!

Technically, is it the web application or the indexer that requires most of the memory? What data is kept in memory all the time (even when nobody is searching)? Is the memory usage proportional to the number of concurrent sessions?

Thanks again,
Pan