Re: [Dspace-tech] DSpace a memory hog?

2007-04-20 Thread James Rutherford
Hi Rob,

On Thu, Apr 19, 2007 at 08:27:32PM -0400, Robert Tansley wrote:
 batch export (classic): needs fixing
 batch import (classic): needs fixing
 browse indexer: needs fixing
 search (lucene indexer): needs fixing
 media filter: OK
 history system:  problems recording collection state (loads all items
 into memory)
 Sitemap generator: OK
 checksum checker: fine but only because it has its own DB access
 routines and doesn't use the APIs (!)
 
 The new-style packager (with plug-ins) only appears to be able to
 operate on one Item at a time.
 
 The above could probably be fixed for 1.4.2, with the potential
 exception of the checksum checker which needs to be changed to use the
 correct APIs.

I think these are a bit late for 1.4.2. I was hoping to get a beta out
on Monday, and make the full release a week after. If we have something
solid to aim for though, I don't see why we can't start work on 1.4.3
immediately (we can just keep committing to the branch after all).

Jim

-- 
James Rutherford
Research Engineer, HP Labs
Bristol, UK
+44 117 312 7066
[EMAIL PROTECTED]

Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN.
Registered No: 690597 England

The contents of this message and any attachments to it are confidential and
may be legally privileged. If you have received this message in error, you
should delete it from your system immediately and advise the sender. To any
recipient of this message within HP, unless otherwise stated you should
consider this message and attachments as HP CONFIDENTIAL.

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] DSpace a memory hog?

2007-04-20 Thread Brad Teale
Cory,

Comments below:

On 04/18/2007 01:54 PM, Cory Snavely wrote:
 Well, as I said at first, it all depends on your definition of what a
 memory hog is. Today's hog fits in tomorrow's pocket. We better all
 already be used to that.

Thank you for proving my point about how pervasive memory bloat is in the IT
industry.  This type of thinking allows vendors (whether open source or
proprietary) to drive up base system requirements without greatly
improving functionality, because the growth is treated as inevitable.

 Also, I don't think for a *minute* that the original developers of
 DSpace made a casual choice about their development environment--in
 fact, I think they made a responsible choice given the alternatives.
 Let's give our colleagues credit that's due. Their choice permits
 scaling and fits well for an open-source project. Putting the general
 problem of memory bloat in their laps seems pretty angsty to me.
 
 Lastly, dedicating a server to DSpace is a choice, not a necessity. We
 as implementors have complete freedom to separate out the database and
 storage tiers, and mechanisms exist for scaling Tomcat horizontally as
 well. In the other direction, I suspect people are running DSpace on
 VMware or Xen virtual machines, too.

I didn't say they made a casual choice about their development
environment.  I said the functional requirements of the application
didn't justify the memory footprint required to run this application.
Whether or not they made a choice that fits well for an open-source
project depends on your definition of Open Source.  However, I don't
think that debate is relevant to this discussion.

As far as scaling requirements go, it depends on where you want
scalability.  As you pointed out, web applications can naturally be
scaled vertically through hardware, or horizontally using the approach
now native to Tomcat.  Since either approach needs hardware, the
memory footprint of an application needs to be taken into account.  The
higher the base system requirements, the lower the likelihood of someone
having a scalable system, due to total cost of ownership (TCO).
While virtual machine technology can help lower some TCO issues, it
brings in a whole new batch of problems which are out of scope for this
discussion.

The general problem of memory bloat rests in all developers' laps (mine
included).  As an industry, we need to constantly weigh our use of
memory against the functionality we are providing.  The functionality
provided by DSpace isn't rocket science, and shouldn't require memory
footprints greater than most of the systems that get people into space.

-- 
Brad Teale, Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]




Re: [Dspace-tech] DSpace a memory hog?

2007-04-19 Thread Cory Snavely
Generally what's going on is that Tomcat, the web application server, runs
in a large Java virtual machine with a substantial amount of memory
allocated to the caching of programs and data for performance.

Depending on your database configuration, there can also be a
substantial amount of memory allocated to cache in Postgres.

The indexer is a periodic process that does not run constantly. You
still must account for the amount of memory it consumes while running.
Memory requirements for recent versions of the indexing routine are of
constant order, meaning they do not vary appreciably with repository
size.
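A minimal Java sketch of the difference described above, assuming a hypothetical item source: collecting every item into a list before indexing makes memory grow linearly with repository size, while handing items to the indexer one at a time keeps memory roughly constant. The Item record, streamItems and indexOne are illustrative stand-ins, not DSpace classes or methods.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: Item, streamItems and indexOne are illustrative
// stand-ins, not DSpace classes or methods.
public class IndexingMemorySketch {

    /** Stand-in for a repository item. */
    record Item(int id, String title) {}

    /** Yields items lazily, one at a time, without building a collection. */
    static Iterator<Item> streamItems(int count) {
        return new Iterator<>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < count;
            }

            @Override
            public Item next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int id = cursor++;
                return new Item(id, "Item " + id);
            }
        };
    }

    /** Linear memory: every item stays reachable in the list until the run ends. */
    static int indexAllAtOnce(int count) {
        List<Item> all = new ArrayList<>();
        Iterator<Item> it = streamItems(count);
        while (it.hasNext()) {
            all.add(it.next());
        }
        for (Item item : all) {
            indexOne(item);
        }
        return all.size();
    }

    /** Roughly constant memory: each item is unreachable once it has been indexed. */
    static int indexStreaming(int count) {
        int indexed = 0;
        Iterator<Item> it = streamItems(count);
        while (it.hasNext()) {
            indexOne(it.next());
            indexed++;
        }
        return indexed;
    }

    /** Placeholder for adding one item to a search or browse index. */
    static void indexOne(Item item) {
        // A real indexer would write a document for this item here.
    }

    public static void main(String[] args) {
        System.out.println(indexAllAtOnce(10_000)); // prints 10000
        System.out.println(indexStreaming(10_000)); // prints 10000
    }
}
```

Both methods index the same items; the only difference is whether the whole set is held in memory at once, which is what decides whether the job's heap needs grow with the repository.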

On Wed, 2007-04-18 at 18:09 -0700, Pan Family wrote:
 Thank you all for giving your opinion!
 
 Technically, is it the web application or the indexer that requires 
 most of the memory?  What data is kept in memory all the time
 (even when nobody is searching)?  Is the memory usage proportional
 to the number of concurrent sessions?
 
 Thanks again,
 
 Pan
 
 
 
 




Re: [Dspace-tech] DSpace a memory hog?

2007-04-19 Thread Robert Tansley
Hi Pan,

The Web server aspect (i.e. Tomcat) should have fairly constant memory
use -- the vast majority of operations are very short and work on a
very small number of objects, and as soon as the request is over any
memory used is returned to the heap.  How much memory you need to give
it largely depends on the load, i.e. how many of these the server will
be servicing at a given instant.
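The point that web-tier memory tracks concurrent load rather than repository size can be turned into a rough sizing rule. This is a back-of-envelope sketch only: the baseline, per-request and headroom figures are hypothetical inputs you would have to measure on your own installation, not DSpace defaults.

```java
// Hypothetical back-of-envelope heap sizing for the web tier.
// All inputs are measurements you would take yourself; none are DSpace defaults.
public class WebHeapEstimate {

    /**
     * @param baselineMb     heap used by Tomcat and the idle web application
     * @param perRequestMb   transient heap a typical request allocates
     * @param concurrent     peak number of requests in flight at one instant
     * @param headroomFactor multiplier (>= 1.0) leaving room for the garbage collector
     */
    static long estimateMb(long baselineMb, long perRequestMb, int concurrent,
                           double headroomFactor) {
        if (concurrent < 0 || headroomFactor < 1.0) {
            throw new IllegalArgumentException("need concurrent >= 0 and headroom >= 1.0");
        }
        // Request memory is returned to the heap when each request ends,
        // so only the requests in flight at the same time add up.
        long working = baselineMb + perRequestMb * concurrent;
        return (long) Math.ceil(working * headroomFactor);
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 200 MB idle, 2 MB per request, 50 at once.
        System.out.println(estimateMb(200, 2, 50, 1.5) + " MB");
    }
}
```

With the illustrative inputs in main it prints 450 MB; the useful part is the shape of the formula, not the numbers.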

The areas where I think folks have run into memory use issues are batch
importing, indexing and the media filters (thumbnail generation, text
extraction for indexing) -- these operate on a large number of objects
at once, and some of the DSpace code isn't so great at freeing up
objects in these operations.  But we're finding the problems and
fixing them as Cory mentions.
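A sketch of the batch-operation failure mode described above, assuming a per-session cache that holds a strong reference to every object it loads. Session, load and uncache are hypothetical names for illustration, not the DSpace API; the point is that a long batch run must release each object once it is done with it, or the cache grows with the size of the batch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Session, load and uncache are illustrative names,
// not the DSpace API.
public class BatchCacheSketch {

    /** Stand-in for a session that caches every object it loads. */
    static class Session {
        private final Map<Integer, Object> cache = new HashMap<>();

        /** Loads an object and keeps a strong reference to it in the cache. */
        Object load(int id) {
            return cache.computeIfAbsent(id, key -> new byte[1024]);
        }

        /** Drops one object once the caller has finished with it. */
        void uncache(int id) {
            cache.remove(id);
        }

        int cachedCount() {
            return cache.size();
        }
    }

    /** Leaky pattern: by the end, the cache holds every object in the batch. */
    static int importLeaky(Session s, int items) {
        for (int id = 0; id < items; id++) {
            s.load(id); // ... process the item ...
        }
        return s.cachedCount();
    }

    /** Fixed pattern: each object is released right after it is processed. */
    static int importReleasing(Session s, int items) {
        for (int id = 0; id < items; id++) {
            s.load(id); // ... process the item ...
            s.uncache(id);
        }
        return s.cachedCount();
    }

    public static void main(String[] args) {
        System.out.println(importLeaky(new Session(), 1000));     // prints 1000
        System.out.println(importReleasing(new Session(), 1000)); // prints 0
    }
}
```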

Getting technical below:  Developers: a quick scan of the code shows that:

batch export (classic): needs fixing
batch import (classic): needs fixing
browse indexer: needs fixing
search (lucene indexer): needs fixing
media filter: OK
history system:  problems recording collection state (loads all items
into memory)
Sitemap generator: OK
checksum checker: fine but only because it has its own DB access
routines and doesn't use the APIs (!)

The new-style packager (with plug-ins) only appears to be able to
operate on one Item at a time.

Also found: BitstreamStorageManager appears to reach up into the business
logic layer and use the checker API; this needs fixing.  This is
probably because the checksum checker includes its own DB access API
:-O

The above could probably be fixed for 1.4.2, with the potential
exception of the checksum checker which needs to be changed to use the
correct APIs.

Rob

On 18/04/07, Pan Family [EMAIL PROTECTED] wrote:
 Thank you all for giving your opinion!

 Technically, is it the web application or the indexer that requires
 most of the memory?  What data is kept in memory all the time
 (even when nobody is searching)?  Is the memory usage proportional
 to the number of concurrent sessions?

 Thanks again,

 Pan










[Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Pan Family

Hi,

There is a rumor that says DSpace is a memory hog.
I don't know where this is from but it may not be that
important.  What is important is that it makes my
management nervous.  So I'd like to hear from those
who know anything about this issue.  Is it really
a memory hog?  Under what circumstances might it
become a memory hog?  Or is there no need to
worry about memory usage at all?

Thanks a lot in advance!

-Pan


Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Cory Snavely
This depends on your definition of a memory hog.

We run a relatively large instance of DSpace and we allocate 512MB to
Tomcat, about 100MB to Postgres, and 256MB for daily indexing runs (via
the dsrun script).
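To confirm that heap limits like the ones above actually take effect, a small Java check can report the JVM's ceiling. The standard -Xmx flag sets the maximum heap (e.g. -Xmx512m); Tomcat's startup scripts typically pick JVM options up from CATALINA_OPTS or JAVA_OPTS, while how dsrun passes options through is not shown in this thread and is an assumption to verify against the script itself. Run the class with the same options, e.g. java -Xmx256m HeapCheck.

```java
// Reports the heap limits of the JVM it runs in, to check what an -Xmx
// setting actually produced. Note maxMemory() may come out slightly below
// the -Xmx value on some JVMs.
public class HeapCheck {

    /** Converts bytes to whole mebibytes (truncating). */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to.
        System.out.println("max heap:  " + toMb(rt.maxMemory()) + " MB");
        // Heap currently reserved by the JVM.
        System.out.println("committed: " + toMb(rt.totalMemory()) + " MB");
        // Unused part of the committed heap.
        System.out.println("free:      " + toMb(rt.freeMemory()) + " MB");
    }
}
```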

In earlier versions of DSpace the indexing routine needed to be patched
to work around a poor implementation that caused memory allocation to be
linear with repository size. Without that, we were running out of memory
during indexing. I believe that patch is now part of the base.

We run comfortably inside 2G of physical memory. I may have considered
that a memory hog 5 years ago, but today I consider it light.

Cory Snavely
University of Michigan Library IT Core Services





Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread mnora
Pan

In comparison to applications which run on inferior operating systems such
as Microsoft Windows (2000 through Vista), this is not abnormal memory usage.
Generally I dedicate a server to DSpace or any of the archival software which
I use. Such rumors are started either by people with insufficient technical
knowledge or by the purveyors of proprietary software who are trying to scare
people away from adopting software from which they cannot extort exorbitant
fees.

Gerry

Gerry Arthus
Systems Administrator:
Long Island Library Resources Council
SUNY at Stony Brook
Stony Brook, New York
US  11794-3399
Phone: 1-631-632-6652
FAX: 631-632-6662
Home: 631-289-7565
Email: [EMAIL PROTECTED]
Professor: Departments of: Graduate Computer Engineering, Earth and
Environmental Science, and Engineering Management
C.W. Post Campus of Long Island University
720 Northern Boulevard
Brookville, New York US 11548-1300
Phone: 516-299-2293











Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Brad Teale
Pan,

Dspace is a memory hog considering the functionality the application
provides.  This is mainly due to the technological choices made by the
founders of the Dspace project, and not the functional requirements the
Dspace project fulfills.

Application and memory bloat are pervasive in the IT industry.  Each
individual organization should look at their requirements whether they
are hardware, software or both.  Having to dedicate a machine to an
application, especially a relatively simple application like Dspace, is
wasteful for hardware resources and people resources.

Web applications should _not_ need 2G of memory to run comfortably.

-- 
Brad Teale, Web Application Developer
Digital Library Development Lab   University of Minnesota Libraries
[EMAIL PROTECTED]



Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Cory Snavely
Well, as I said at first, it all depends on your definition of what a
memory hog is. Today's hog fits in tomorrow's pocket. We better all
already be used to that.

Also, I don't think for a *minute* that the original developers of
DSpace made a casual choice about their development environment--in
fact, I think they made a responsible choice given the alternatives.
Let's give our colleagues credit that's due. Their choice permits
scaling and fits well for an open-source project. Putting the general
problem of memory bloat in their laps seems pretty angsty to me.

Lastly, dedicating a server to DSpace is a choice, not a necessity. We
as implementors have complete freedom to separate out the database and
storage tiers, and mechanisms exist for scaling Tomcat horizontally as
well. In the other direction, I suspect people are running DSpace on
VMware or Xen virtual machines, too.

Cory Snavely
University of Michigan Library IT Core Services





Re: [Dspace-tech] DSpace a memory hog?

2007-04-18 Thread Pan Family

Thank you all for giving your opinion!

Technically, is it the web application or the indexer that requires
most of the memory?  What data is kept in memory all the time
(even when nobody is searching)?  Is the memory usage proportional
to the number of concurrent sessions?

Thanks again,

Pan



