Hi Tom,

Thanks again for your answers - apologies for following these up with
more questions...
>>> with 16GB of memory and fast local storage
>>> Java memory: -Xmx2048M -Xms2048M
>>
>> Is there a reason why you only allocate 1/8th of the system memory to
>> the application? Have you found that adding extra doesn't help?
>
> In our experience, it merely delays when the error occurs, and we'd
> still need to restart. Whether we do this nightly or every other night
> doesn't make much difference. I'm not sure it would actually make it go
> faster. Additionally, we need to keep memory free for file caching and
> thumbnail generation; we found that if we assign too much memory to
> Java then the system needs to read from disk more for these other tasks
> and we get a slow-down there.

Is the relationship between the memory allocated and the time before the
error occurs roughly linear, or does it start to flatten out as you add
more?

>>> - Assetstore: random structure causes large overhead on filesystem
>>>   for no real gain
>>
>> Are you able to expand on the overhead that is caused, and from your
>> profiling, explain how the structure could be improved? My gut (and
>> uninformed) instinct would be that since asset store reads are
>> completely random depending on the items being viewed at the time, the
>> layout of directories would be irrelevant. Writes may be slightly less
>> efficient, but since writes only tend to occur once, they are of less
>> consequence.
>
> Apologies for sounding cryptic; I was trying not to be too verbose in
> the template. :-)
>
> This has mostly to do with back-ups. With about 600,000 files in random
> directories, it can be hard to find out which files have changed. We
> implemented a simple asset store structure that stores files by
> year/month/day. This means we can mirror new files very quickly, and
> only traverse the entire assetstore every other day to check whether
> files have changed.
>
> Maybe I should expand a bit on our storage set-up:
>
> - our live system has about 90TB capacity, with an EMC SAN connected to
>   a pair of Sun servers. These present the storage to our private
>   network at about 4Gbps, as well as running the checksums (I wrote
>   some Perl to do this job locally, rather than add to the I/O of the
>   live server).
>
> - we have two sets of back-up servers (ZFS-based) off-site for the live
>   system, which use rsync to mirror all this data. (Two systems
>   because otherwise, if we lost one, we would be vulnerable for too
>   long while the data was re-synced.)
>
> A small script makes copies of the day's assetstore every hour; a
> complete rsync runs across the assetstores (the original one as well as
> the new one in our own datestamp format) every other day, and at
> weekends we run rsync with checksums. Essentially this system is
> copy-on-write: if a file changes on disk, the old back-up copy is moved
> into a holding area to be deleted when necessary, and the new file is
> copied in its place.

My initial concern with that set-up would be the use of rsync over such a
large amount of storage: rsync is notoriously heavy on CPU and I/O, and
you have a lot of disk for it to chew through just to detect changed
files. Is there a reason you don't use the in-built ZFS replication
facility (snapshots plus zfs send/receive)? That would presumably be much
more efficient, as the filesystem itself already knows exactly which
blocks have changed, and it would be quicker and more up-to-date than
hourly syncs.
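To illustrate the cost I'm worried about: the weekend checksum pass
presumably amounts to something like this (paths and host names are
hypothetical):

    # --checksum forces a full read of every file on *both* ends, even
    # if nothing has changed - roughly 90TB of reads per run in your case
    rsync -a --checksum /dspace/assetstore/ backup1:/dspace/assetstore/

Even the plain daily runs still have to stat each of the ~600,000 files
just to work out which ones changed.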
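ZFS replication, by contrast, is driven by snapshots, so only the blocks
that have actually changed since the last snapshot ever cross the wire.
A minimal sketch of an hourly cycle, with purely illustrative pool,
dataset, and host names:

    # Snapshot the assetstore dataset (cheap, since ZFS is copy-on-write)
    zfs snapshot tank/assetstore@2010-10-26-1400

    # Send only the blocks changed since the previous snapshot; -F rolls
    # the target back to its latest snapshot before receiving
    zfs send -i tank/assetstore@2010-10-26-1300 \
        tank/assetstore@2010-10-26-1400 \
      | ssh backup1 zfs receive -F tank/assetstore

As a bonus, the snapshots retained on the back-up servers would give you
your "holding area" behaviour for free: old versions of changed files
stay readable until the snapshots are destroyed.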
Cheers,


Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928

