On 08/10/2010 11:13, Tom De Mulder wrote:
> On 7 Oct 2010, at 21:56, Stuart Lewis wrote:
>
>>> with 16GB of memory and fast local storage
>>> Java memory: -Xmx2048M -Xms2048M
>> Is there a reason why you only allocate 1/8th of the system memory to the
>> application? Have you found that adding extra doesn't help?
> In our experience, it merely delays when the error occurs, and we'd still
> need to restart. Whether we do this nightly or every other night doesn't make
> much difference. I'm not sure it would actually make it go faster.
> Additionally, we need to keep memory free for file caching and thumbnail
> generation; we found that if we assign too much memory to Java then the
> system needs to read from disk more for these other tasks and we get a
> slow-down there.
>
>>> - Assetstore: random structure causes large overhead on filesystem for no
>>> real gain
>> Are you able to expand on the overhead that is caused, and from your
>> profiling, explain how the structure could be improved? My gut (and
>> uninformed) instinct would be that since asset store reads are completely
>> random depending on the items being viewed at the time, the layout of
>> directories would be irrelevant. Writes may be slightly less efficient, but
>> since writes only tend to occur once, they are of less consequence.
> Apologies for sounding cryptic; I was trying not to be too verbose in the
> template. :-)
>
> This has mostly to do with back-ups. With about 600,000 files in random
> directories, it can be hard to find out what files have changed. We
> implemented a simple asset store structure that stores files by
> year/month/day. This means we can mirror new files very quickly, and only
> traverse the entire assetstore every other day to check if files have changed.
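
For anyone following along, here is a minimal sketch of what a year/month/day layout like the one described above might look like. It is not Tom's implementation (the thread does not show it), and the function names, the `/assetstore` root and the `abc123` identifier are all hypothetical. The point is only that every file stored on a given day ends up under one directory, so a mirror job can copy that directory alone instead of walking the whole store.

```python
from datetime import date
from pathlib import PurePosixPath


def day_directory(root: str, day: date) -> PurePosixPath:
    """Return <root>/YYYY/MM/DD, the one directory holding that day's files."""
    return PurePosixPath(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def dated_asset_path(root: str, stored_on: date, asset_id: str) -> PurePosixPath:
    """Return the full path for an asset stored on the given day.

    asset_id is a hypothetical identifier; the thread does not say how
    files are named inside each day's directory.
    """
    return day_directory(root, stored_on) / asset_id


if __name__ == "__main__":
    # A file stored on the day of this message.
    print(dated_asset_path("/assetstore", date(2010, 10, 8), "abc123"))
```

An hourly mirror would then only need to look at `day_directory(root, date.today())`, leaving the full traversal for the less frequent runs.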
See: http://hdl.handle.net/10019.1/3161

How strange, I also proposed such a thing!!

> Maybe I should expand a bit on our storage set-up:
>
> - our live system has about 90TB capacity, with an EMC SAN connected to a
> pair of Sun servers. These present them to our private network at about
> 4Gbps, as well as running the checksums (I wrote some Perl to do this job
> locally, rather than add to the I/O of the live server.)
>
> - we have two sets of back-up servers (ZFS-based) off-site for the live
> system, which use rsync to mirror all this data. (Two systems because
> otherwise, if we lose one, it'd be vulnerable too long while the data is
> re-sync'ed).
>
> A small script makes copies of the day's assetstore every hour; a complete
> rsync runs across assetstores (the original one as well as the new one with
> our own datestamp format) every alternating day, and at week-ends we run
> rsync with checksums. Essentially this system is copy-on-write: if a file
> changes on disk, the old back-up copy is moved into a holding area to be
> deleted when necessary, and the new file copied in its place.
>
> Finally, the date structure for the directory/file names helps locate problem
> files quickly if necessary. Not a huge thing, but it makes my life easier.
>
>>> - Search indexer: fails on large repositories, slowing down and eventually
>>> running out of memory.
>> Do you have any percentages on the amount of page views that relate to
>> browse, and how many relate to other views? I'm curious if browse from the
>> front end is causing an issue too? The reason I'm asking, is that with the
>> potential inclusion of the dspace-discovery layer in a future version, this
>> could replace the database-driven browse system with solr. Not only will
>> this provide a richer faceted search, but it could likely offer a good
>> performance boost for browse-related functions. It also offers another way
>> of scaling-out, by putting solr on a different server.
> This question I'll have to leave to Simon to answer, so I don't make a hash
> of it.
>
>
> Best,
>
> --
> Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH

--
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1053
Stellenbosch University
Private Bag X5036
Stellenbosch 7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
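
P.S. The back-up rotation quoted above (an hourly copy of the day's directory, a complete rsync on alternating days, and an rsync with checksums at week-ends) can be sketched roughly as follows. This is an illustration under assumptions, not the Cambridge scripts: the thread does not say how the alternating days are chosen, so even day ordinals are used here, and it does not say which rsync options are used, so the sketch assumes the standard `--backup`/`--backup-dir` options for the "holding area" behaviour and `--checksum` for the week-end run. All paths and function names are hypothetical.

```python
from datetime import date
from typing import List


def backup_mode(day: date) -> str:
    """Pick the kind of run for a given day.

    Assumption: "alternating days" means days with an even ordinal;
    the thread does not specify how they are picked.
    """
    if day.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return "checksum"
    if day.toordinal() % 2 == 0:
        return "full"
    return "daily-only"  # only the hourly copy of the day's directory runs


def rsync_command(mode: str, source: str, dest: str, holding: str) -> List[str]:
    """Build (but do not run) an rsync command line for the given mode.

    --backup/--backup-dir make rsync move a replaced destination file into
    the holding directory instead of overwriting it.
    """
    cmd = ["rsync", "-a", "--backup", f"--backup-dir={holding}"]
    if mode == "checksum":
        cmd.append("--checksum")  # compare file contents, not just size/mtime
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    day = date(2010, 10, 9)  # a Saturday
    print(backup_mode(day))
    print(rsync_command(backup_mode(day), "/assetstore/", "backup:/assetstore/", "/holding"))
```

On "daily-only" days the source would be just that day's dated directory rather than the whole assetstore.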