This is good news for us -- the batch importing and indexing
are cron jobs kicked off at midnight, so at least the domestic
users won't feel them.
-Pan
On 4/19/07, Robert Tansley <[EMAIL PROTECTED]> wrote:
Hi Pan,
The Web server aspect (i.e. Tomcat) should have fairly constant memory
use -- the vast majority of operations are very short and work on a
very small number of objects, and as soon as the request is over any
memory used is returned to the heap. How much memory you need to give
it largely depends on the load, i.e. how many of these the server will
be servicing at a given instant.
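
[Editor's sketch] One way to see how heap use tracks load is to sample it from inside the JVM. The class below is a minimal illustration, not DSpace code: the name HeapSnapshot and its methods are made up for this example, and it uses only the standard java.lang.Runtime calls. Run standalone it reports on its own JVM only; to observe Tomcat, the same two methods would have to be called from code running inside the web application.

```java
/** Hypothetical helper, not part of DSpace: reports heap use of the current JVM. */
public class HeapSnapshot {

    /** Bytes of heap currently in use: heap allocated so far minus its free part. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Upper bound the JVM will attempt to use for the heap (what -Xmx controls). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("used heap: " + usedHeapBytes() / mb + " MB");
        System.out.println("max heap:  " + maxHeapBytes() / mb + " MB");
    }
}
```

Sampling these values at quiet and busy times would give a rough picture of how much headroom a given load needs.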
The areas I think folks have run into memory use issues are batch
importing, indexing and the media filters (thumbnail generation, text
extraction for indexing) -- these operate on a large number of objects
at once, and some of the DSpace code isn't so great at freeing up
objects in these operations. But we're finding the problems and
fixing them as Cory mentions.
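
[Editor's sketch] The usual remedy for this kind of problem is to walk the objects in fixed-size chunks and drop each chunk before reading the next, so memory use is bounded by the chunk size rather than by the size of the repository. The code below is a generic illustration under that assumption; BoundedBatch and processInBatches are hypothetical names, it does not use the DSpace APIs, and it is written in present-day Java rather than what was available in 2007.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

/** Hypothetical sketch, not DSpace code: chunked processing with bounded memory. */
public class BoundedBatch {

    /**
     * Reads items from the iterator in chunks of at most batchSize, hands each
     * chunk to the handler, then starts a fresh list so the previous chunk
     * becomes unreachable and can be garbage collected.
     * Returns the total number of items processed.
     */
    public static <T> int processInBatches(Iterator<T> source, int batchSize,
                                           Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                total += batch.size();
                batch = new ArrayList<>(batchSize); // let go of the old chunk
            }
        }
        if (!batch.isEmpty()) { // final, possibly shorter, chunk
            handler.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for a database cursor over item IDs.
        Iterator<Integer> ids = IntStream.range(0, 10).boxed().iterator();
        int n = processInBatches(ids, 4,
                chunk -> System.out.println("indexing " + chunk));
        System.out.println("processed " + n + " items");
    }
}
```

The point of the design is that the caller never holds more than one chunk at a time, provided the handler does not keep its own references to the chunks it is given.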
Getting technical below. Developers: a quick scan of the code shows that:
batch export (classic): needs fixing
batch import (classic): needs fixing
browse indexer: needs fixing
search (lucene indexer): needs fixing
media filter: OK
history system: problems recording collection state (loads all items
into memory)
Sitemap generator: OK
checksum checker: fine but only because it has its own DB access
routines and doesn't use the APIs (!)
The new-style packager (with plug-ins) only appears to be able to
operate on one Item at a time.
Also found: BitstreamStorageManager appears to reach up into the
business logic layer and use the checker API (!!!!); this needs fixing.
This is probably because the checksum checker includes its own DB
access API :-O
The above could probably be fixed for 1.4.2, with the potential
exception of the checksum checker which needs to be changed to use the
correct APIs.
Rob
On 18/04/07, Pan Family <[EMAIL PROTECTED]> wrote:
> Thank you all for giving your opinion!
>
> Technically, is it the web application or the indexer that requires
> most of the memory? What data is kept in memory all the time
> (even when nobody is searching)? Is the memory usage proportional
> to the number of concurrent sessions?
>
> Thanks again,
>
> Pan
>
>
>
>
>
> On 4/18/07, Cory Snavely <[EMAIL PROTECTED]> wrote:
> > Well, as I said at first, it all depends on your definition of what a
> > memory hog is. Today's hog fits in tomorrow's pocket. We better all
> > already be used to that.
> >
> > Also, I don't think for a *minute* that the original developers of
> > DSpace made a casual choice about their development environment--in
> > fact, I think they made a responsible choice given the alternatives.
> > Let's give our colleagues credit that's due. Their choice permits
> > scaling and fits well for an open-source project. Putting the general
> > problem of memory bloat in their laps seems pretty angsty to me.
> >
> > Lastly, dedicating a server to DSpace is a choice, not a necessity. We
> > as implementors have complete freedom to separate out the database and
> > storage tiers, and mechanisms exist for scaling Tomcat horizontally as
> > well. In the other direction, I suspect people are running DSpace on
> > VMware or xen virtual machines, too.
> >
> > Cory Snavely
> > University of Michigan Library IT Core Services
> >
> > On Wed, 2007-04-18 at 13:40 -0500, Brad Teale wrote:
> > > Pan,
> > >
> > > Dspace is a memory hog considering the functionality the application
> > > provides. This is mainly due to the technological choices made by the
> > > founders of the Dspace project, and not the functional requirements the
> > > Dspace project fulfills.
> > >
> > > Application and memory bloat are pervasive in the IT industry. Each
> > > individual organization should look at their requirements whether they
> > > are hardware, software or both. Having to dedicate a machine to an
> > > application, especially a relatively simple application like Dspace, is
> > > wasteful for hardware resources and people resources.
> > >
> > > Web applications should _not_ need 2G of memory to "run comfortably".
> > >
> >
> >
>
>
>
-------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
>