On Mon, 18 Dec 2006, Robert Tansley wrote:

> First, I specified the number 200,000 items because that's the order
> of size of the biggest known DSpace instance, Cambridge's
> (www.dspace.cam.ac.uk). That's not a magic size; I'm sure more is
> possible. We are lacking in concrete performance data for DSpace
> (volunteers?), which has led to some speculative bad press.

When we did a recent big code update (to 1.4 with some minor tweaks), we
disabled the dreadful object cache by commenting out the two routines
that implement it - only a quick hack. This fixed our main problem,
i.e. that DSpace would run out of memory because the object cache would
just keep growing. It also gave us a noticeable performance boost
(tested with Siege), probably because the garbage collector doesn't
have to kick in nearly as often. The change amounted to something like
the sketch below.
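
For illustration only - a minimal sketch of the kind of change, assuming
the cache lives behind one store routine and one lookup routine on the
per-request context object (the class, method and field names here are
illustrative, not necessarily DSpace's actual API):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: turn the per-context object cache into a no-op.
    public class Context
    {
        // The unbounded map that grew until the JVM ran out of memory.
        private Map objectCache = new HashMap();

        // Lookup: always report a miss, so callers reload from the database.
        public Object fromCache(Class objectClass, int id)
        {
            // return objectCache.get(objectClass.getName() + id);  // disabled
            return null;
        }

        // Store: do nothing, so the map never grows.
        public void cache(Object o, int id)
        {
            // objectCache.put(o.getClass().getName() + id, o);  // disabled
        }
    }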
> That said, I think no. of items is the significant factor, because
> the main culprits in slowdown are a) the browse code (for which only
> items are relevant), as opposed to the Lucene search engine, which
> deals with the full-text indexing and scales far more handsomely; and
> b) the in-memory object cache growing out of control during
> import/re-indexing, which as of 1.4 should be able to use constant
> memory regardless of repo size. (May need a couple of code tweaks to
> fix this -- ping the list tomorrow and I'll check.)

I wish you'd either get rid of the object cache or use an open source
cache implementation. However, given the nature of DSpace (and the fact
that most of the time you won't get the same item being accessed twice
in quick succession), I don't think it needs an object cache at all.
And, as I said above, disabling it actually makes things *faster*.

Currently, I see no problems with the pure speed of the web
application. However, both the importer and indexer still get slower
and slower over the course of big imports. To mitigate that, we run the
indexer in batches of a few hundred items at a time, then flush the
index; this doesn't get rid of the slowdown, but it does lessen it, and
it also means the indexer doesn't run out of memory. The loop looks
roughly like the sketch below.
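
Roughly the shape of our batching wrapper, sketched against a stand-in
Indexer interface (the interface and the batch size of 500 are
illustrative; the real thing drives the DSpace indexing classes
directly):

    import java.util.Iterator;
    import java.util.List;

    // Stand-in for whatever actually writes items to the Lucene index.
    interface Indexer
    {
        void add(Object item);  // queue one item for indexing
        void flush();           // write queued work out and release memory
    }

    public class BatchIndexRunner
    {
        private static final int BATCH_SIZE = 500;  // "a few hundred items"

        public static void run(Indexer indexer, List items)
        {
            int count = 0;
            for (Iterator i = items.iterator(); i.hasNext();)
            {
                indexer.add(i.next());

                // Flush every BATCH_SIZE items so memory use stays
                // bounded instead of growing for the whole import.
                if (++count % BATCH_SIZE == 0)
                {
                    indexer.flush();
                }
            }
            indexer.flush();  // flush the final, possibly partial, batch
        }
    }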

Another issue is backups: when you have as many files as we do, it gets
hard to find out what has changed in the assetstore when making
backups. We use rsync, so we can back up only the changes; copying the
entire assetstore across each time would be too much of a hit, even on
our dedicated network link to our offsite backup servers.

This is just a quick braindump, because I happened to see Rob's post
scroll past, and it's by no means exhaustive, but I think it covers our
current main performance-related issues, such as they are. My current
concerns lie far more with the authentication/authorization system...

For reference, our webapp runs on a machine with two dual-core CPUs and
8GB of memory, with the database on a separate (similar, but with very
fast disks) machine. The assetstores sit on a 4Gb/s Fibre Channel SAN.

Regards,

--
Tom De Mulder <[EMAIL PROTECTED]> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 17/01/2007 : The Moon is Waning Crescent (14% of Full)
