On 7 Oct 2010, at 21:56, Stuart Lewis wrote:

>> with 16GB of memory and fast local storage
>> Java memory: -Xmx2048M -Xms2048M
> Is there a reason why you only allocate 1/8th of the system memory to the
> application? Have you found that adding extra doesn't help?

In our experience, it merely delays when the error occurs, and we'd still need to restart. Whether we do this nightly or every other night doesn't make much difference. I'm not sure it would actually make it go faster.

Additionally, we need to keep memory free for file caching and thumbnail generation; we found that if we assign too much memory to Java, the system needs to read from disk more for these other tasks and we get a slow-down there.

>> - Assetstore: random structure causes large overhead on filesystem for no
>> real gain
> Are you able to expand on the overhead that is caused, and from your
> profiling, explain how the structure could be improved? My gut (and
> uninformed) instinct would be that since asset store reads are completely
> random depending on the items being viewed at the time, the layout of
> directories would be irrelevant. Writes may be slightly less efficient, but
> since writes only tend to occur once, they are of less consequence.

Apologies for sounding cryptic; I was trying not to be too verbose in the template. :-)

This has mostly to do with back-ups. With about 600,000 files in random directories, it can be hard to find out which files have changed. We implemented a simple asset store structure that stores files by year/month/day. This means we can mirror new files very quickly, and only traverse the entire assetstore every other day to check if files have changed.

Maybe I should expand a bit on our storage set-up:

- our live system has about 90TB capacity, with an EMC SAN connected to a pair of Sun servers. These present the storage to our private network at about 4Gbps, as well as running the checksums (I wrote some Perl to do this job locally, rather than add to the I/O of the live server.)

- we have two sets of back-up servers (ZFS-based) off-site for the live system, which use rsync to mirror all this data. (Two systems because otherwise, if we lose one, it'd be vulnerable for too long while the data is re-synced.)

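As an illustration only of the year/month/day layout mentioned above: the sketch below is not the actual implementation, and the function names, the `/assetstore` root and the file ID are all hypothetical. It just shows why a date-based layout makes "what is new today" a single directory rather than a walk over every file.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_asset_path(root: str, stored_on: date, internal_id: str) -> PurePosixPath:
    """Build <root>/<YYYY>/<MM>/<DD>/<internal_id> for a newly stored file.

    All names here are assumptions made for the example.
    """
    return (
        PurePosixPath(root)
        / f"{stored_on:%Y}"
        / f"{stored_on:%m}"
        / f"{stored_on:%d}"
        / internal_id
    )


def day_directory(root: str, day: date) -> PurePosixPath:
    """Directory holding everything stored on `day`: the only tree an
    hourly mirror job would need to look at."""
    return PurePosixPath(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


if __name__ == "__main__":
    print(dated_asset_path("/assetstore", date(2010, 10, 7), "example-id"))
    print(day_directory("/assetstore", date(2010, 10, 7)))
```

A mirror job could then be pointed at the output of `day_directory` for the current date, leaving the full traversal for the less frequent complete run.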
A small script makes copies of the day's assetstore every hour; a complete rsync runs across assetstores (the original one as well as the new one with our own datestamp format) every other day, and at weekends we run rsync with checksums.

Essentially this system is copy-on-write: if a file changes on disk, the old back-up copy is moved into a holding area to be deleted when necessary, and the new file is copied in its place.

Finally, the date structure for the directory/file names helps locate problem files quickly if necessary. Not a huge thing, but it makes my life easier.

>> - Search indexer: fails on large repositories, slowing down and eventually
>> running out of memory.
> Do you have any percentages on the amount of page views that relate to
> browse, and how many relate to other views? I'm curious if browse from the
> front end is causing an issue too? The reason I'm asking, is that with the
> potential inclusion of the dspace-discovery layer in a future version, this
> could replace the database-driven browse system with solr. Not only will
> this provide a richer faceted search, but it could likely offer a good
> performance boost for browse-related functions. It also offers another way
> of scaling-out, by putting solr on a different server.

This question I'll have to leave to Simon to answer, so I don't make a hash of it.


Best,

-- 
Tom De Mulder <[email protected]> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

