On 08/10/2010 11:13, Tom De Mulder wrote:
> On 7 Oct 2010, at 21:56, Stuart Lewis wrote:
>
>>>      with 16GB of memory and fast local storage
>>>      Java memory: -Xmx2048M -Xms2048M
>> Is there a reason why you only allocate 1/8th of the system memory to the 
>> application?  Have you found that adding extra doesn't help?
> In our experience, it merely delays when the error occurs, and we'd still 
> need to restart. Whether we do this nightly or every other night doesn't make 
> much difference. I'm not sure it would actually make it go faster. 
> Additionally, we need to keep memory free for file caching and thumbnail 
> generation; we found that if we assign too much memory to Java then the 
> system needs to read from disk more for these other tasks and we get a 
> slow-down there.
>
>>> - Assetstore: random structure causes large overhead on filesystem for no 
>>> real gain
>> Are you able to expand on the overhead that is caused, and from your 
>> profiling, explain how the structure could be improved?  My gut (and 
>> uninformed) instinct would be that since asset store reads are completely 
>> random depending on the items being viewed at the time, the layout of 
>> directories would be irrelevant.  Writes may be slightly less efficient, but 
>> since writes only tend to occur once, they are of less consequence.
> Apologies for sounding cryptic; I was trying not to be too verbose in the 
> template. :-)
>
> This has mostly to do with back-ups. With about 600,000 files in random 
> directories, it can be hard to find out what files have changed. We 
> implemented a simple asset store structure that stores files by 
> year/month/day. This means we can mirror new files very quickly, and only 
> traverse the entire assetstore every other day to check if files have changed.

See: http://hdl.handle.net/10019.1/3161
How strange, I also proposed such a thing!
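
To make the year/month/day idea concrete, here is a minimal sketch in Python. It is not the code Tom describes and not anything DSpace ships: the function names dated_dir and recent_dirs, and the zero-padded directory names, are my own assumptions for illustration. The point is only that a mirror job can visit the last day or two of directories instead of walking the whole assetstore.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List


def dated_dir(root: Path, day: date) -> Path:
    """Return the year/month/day directory for `day` under `root`.

    Zero-padded names (2010/10/08) are an assumption of this sketch.
    """
    return root / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"


def recent_dirs(root: Path, today: date, days_back: int = 1) -> List[Path]:
    """List the dated directories a quick mirror run needs to visit.

    Covers `today` plus the previous `days_back` days, newest first,
    and skips any day for which no directory exists.
    """
    candidates = (
        dated_dir(root, today - timedelta(days=n)) for n in range(days_back + 1)
    )
    return [d for d in candidates if d.is_dir()]
```

A frequent job would pass only the directories from recent_dirs() to the copy tool; the full traversal of the assetstore is still needed now and then to catch files that changed after the day they were written.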

> Maybe I should expand a bit on our storage set-up:
>
> - our live system has about 90TB capacity, with an EMC SAN connected to a 
> pair of Sun servers. These present them to our private network at about 
> 4Gbps, as well as running the checksums (I wrote some Perl to do this job 
> locally, rather than add to the I/O of the live server.)
>
> - we have two sets of back-up servers (ZFS-based) off-site for the live 
> system, which use rsync to mirror all this data. (Two systems because 
> otherwise, if we lose one, it'd be vulnerable too long while the data is 
> re-sync'ed).
>
> A small script makes copies of the day's assetstore every hour; a complete 
> rsync runs across assetstores (the original one as well as the new one with 
> our own datestamp format) every alternating day, and at week-ends we run 
> rsync with checksums. Essentially this system is copy-on-write: if a file 
> changes on disk, the old back-up copy is moved into a holding area to be 
> deleted when necessary, and the new file copied in its place.
>
> Finally, the date structure for the directory/file names helps locate problem 
> files quickly if necessary. Not a huge thing, but it makes my life easier.
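
The "old copy goes to a holding area, new copy takes its place" step described above can be sketched as follows. This is only an illustration under my own assumptions: Tom's set-up uses rsync and some Perl, whereas this is plain Python, and the names sha256_of, backup_file and the holding directory layout are hypothetical. SHA-256 is also my choice; the message does not say which checksum is used.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large bitstreams fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(src: Path, dest: Path, holding: Path) -> str:
    """Back up `src` to `dest`, keeping any superseded copy in `holding`.

    Returns "copied", "unchanged" or "replaced".
    """
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        return "copied"
    if sha256_of(src) == sha256_of(dest):
        return "unchanged"
    # The file changed: park the old back-up copy before overwriting it.
    # This sketch does not handle name clashes inside the holding area.
    holding.mkdir(parents=True, exist_ok=True)
    shutil.move(str(dest), str(holding / dest.name))
    shutil.copy2(src, dest)
    return "replaced"
```

Comparing checksums on every file is the expensive variant, which matches running it only at week-ends; a cheaper daily pass could compare size and modification time instead.
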
>
>>> - Search indexer: fails on large repositories, slowing down and eventually 
>>> running out of memory.
>> Do you have any percentages on the amount of page views that relate to 
>> browse, and how many relate to other views?  I'm curious if browse from the 
>> front end is causing an issue too?  The reason I'm asking, is that with the 
>> potential inclusion of the dspace-discovery layer in a future version, this 
>> could replace the database-driven browse system with solr.  Not only will 
>> this provide a richer faceted search, but it could likely offer a good 
>> performance boost for browse-related functions.  It also offers another way 
>> of scaling-out, by putting solr on a different server.
> This question I'll have to leave to Simon to answer, so I don't make a hash 
> of it.
>
>
> Best,
>
> --
> Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
> +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
>
>
> ------------------------------------------------------------------------------
> Beautiful is writing same markup. Internet Explorer 9 supports
> standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2&  L3.
> Spend less time writing and  rewriting code and more time creating great
> experiences on the web. Be a part of the beta today.
> http://p.sf.net/sfu/beautyoftheweb
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech

-- 
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1053
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

