Hi Hilton,

>>>> - Assetstore: random structure causes large overhead on filesystem for no
>>>> real gain
>>> Are you able to expand on the overhead that is caused, and from your
>>> profiling, explain how the structure could be improved? My gut (and
>>> uninformed) instinct would be that since asset store reads are completely
>>> random depending on the items being viewed at the time, the layout of
>>> directories would be irrelevant. Writes may be slightly less efficient,
>>> but since writes only tend to occur once, they are of less consequence.
>> Apologies for sounding cryptic; I was trying not to be too verbose in the
>> template. :-)
>>
>> This has mostly to do with back-ups. With about 600,000 files in random
>> directories, it can be hard to find out what files have changed. We
>> implemented a simple asset store structure that stores files by
>> year/month/day. This means we can mirror new files very quickly, and only
>> traverse the entire assetstore every other day to check if files have
>> changed.
>
> See: http://hdl.handle.net/10019.1/3161
> How strange, I also proposed such a thing !!
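
For anyone skimming the thread, the year/month/day layout described in the quoted message could look roughly like the sketch below. It is only an illustration under my own assumptions: the function names and the root path are hypothetical, and this is not how DSpace or the linked paper actually implement it. The point is simply that a mirror job only needs to visit the day directories written since its last run, rather than walking the whole assetstore.

```python
from datetime import date, timedelta
from pathlib import Path


def dated_dir(root: Path, day: date) -> Path:
    """Return root/YYYY/MM/DD for the given day (hypothetical layout)."""
    return root / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"


def dirs_to_mirror(root: Path, last_mirrored: date, today: date) -> list[Path]:
    """List the day directories from the last mirror run up to today, inclusive.

    The last mirrored day is included again because files may have been
    added to it after that mirror run finished.
    """
    days = (today - last_mirrored).days
    return [dated_dir(root, last_mirrored + timedelta(days=n)) for n in range(days + 1)]


if __name__ == "__main__":
    # Example: a mirror last ran on 30 Aug 2010 and runs again on 1 Sep 2010.
    for d in dirs_to_mirror(Path("/assetstore"), date(2010, 8, 30), date(2010, 9, 1)):
        print(d)
```

A periodic full traversal would still be needed to catch files that changed in older directories, as the quoted message notes.
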

I've just read this paper and have a question. You state the following:

----
At the moment, December 2009, the following two are the most widely used
software packages for building and maintaining institutional repositories
according the opendoar website.

• http://www.dspace.org with 502 installations.
• http://www.eprints.org with 261 installations.

The digital objects and store are located as follows for the above:

• DSpace => $DSPACE_HOME/assetstore
• EPrints => $EPRINTS_HOME/disk0

None of the above use a time/date based file system for storing digital
objects. None of them use UUID's to create unique digital objects and
stores. In one hundred years time how can any of the above satisfy a future
researcher that the digital object is unique and has remained persistently
so during the years to 2109.
----

Are you able to expand for us on your reasoning that repositories that do
not use datestamped directories and filenames containing UUIDs will not
satisfy future researchers? Storing a file in a dated location under a UUID
makes it no more or less likely that it has remained unique and persistent.
Filenames alone cannot guarantee this - it is up to the repository to manage
the integrity of the stored items, and to the wider system to ensure that it
does so. This is where the notion of a 'trusted repository' comes into play:
the repository platform, and the system as a whole, is trusted to have
maintained the integrity of the contents.

[A side note: You'll find a lot of the work that Tim has been leading
recently regarding AIPs is of interest in this area.
https://wiki.duraspace.org/display/DSPACE/AipBackupRestore ]

Cheers,

Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928
_______________________________________________
DSpace-tech mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-tech

