Hi Stuart

Also please see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Signing

Cheers

hg

On 08/10/2010 20:31, Stuart Lewis wrote:
Hi Hilton,

- Assetstore: random structure causes large overhead on the filesystem for no real 
gain
Are you able to expand on the overhead that is caused, and from your profiling, 
explain how the structure could be improved?  My gut (and uninformed) instinct 
would be that since asset store reads are completely random depending on the 
items being viewed at the time, the layout of directories would be irrelevant.  
Writes may be slightly less efficient, but since writes only tend to occur 
once, they are of less consequence.
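To make the "random structure" concrete, here is a minimal sketch of a hash-style layout of the kind being discussed. It assumes an opaque hex ID whose first three pairs of characters become nested directories. The function names and the three-level, two-character split are illustrative assumptions, not DSpace's actual code or configuration. It shows that consecutive ingests scatter across unrelated directories.

```python
import os
import secrets

# Illustrative parameters (assumptions): three directory levels,
# each named by two characters of the item's opaque ID.
LEVELS = 3
CHARS_PER_LEVEL = 2


def random_layout_path(root: str, internal_id: str) -> str:
    """Nest a file under directories taken from the leading characters of its ID."""
    parts = [
        internal_id[i * CHARS_PER_LEVEL:(i + 1) * CHARS_PER_LEVEL]
        for i in range(LEVELS)
    ]
    return os.path.join(root, *parts, internal_id)


def new_internal_id() -> str:
    """Hypothetical ID generator: 32 random hex characters."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    # Two consecutive ingests land in unrelated subtrees, so nothing in the
    # layout tells a backup job where the newest files are.
    for _ in range(2):
        print(random_layout_path("/assetstore", new_internal_id()))
```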
Apologies for sounding cryptic; I was trying not to be too verbose in the 
template. :-)

This has mostly to do with back-ups. With about 600,000 files in random 
directories, it can be hard to find out what files have changed. We implemented 
a simple asset store structure that stores files by year/month/day. This means 
we can mirror new files very quickly, and only traverse the entire assetstore 
every other day to check if files have changed.
See: http://hdl.handle.net/10019.1/3161
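A minimal sketch of the year/month/day layout and the quick mirroring it enables follows. It assumes that the store root is a plain directory tree and that a routine run only needs today's and yesterday's directories. `dated_dir` and `dirs_to_mirror` are hypothetical names used for illustration. The periodic full traversal described above is still needed to catch changes to older files.

```python
import os
from datetime import date, timedelta
from typing import List


def dated_dir(root: str, day: date) -> str:
    """Directory holding files ingested on `day`: root/YYYY/MM/DD."""
    return os.path.join(root, f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}")


def dirs_to_mirror(root: str, today: date, days_back: int = 1) -> List[str]:
    """Daily directories a routine mirror run visits, newest first.

    Under this layout only these directories can hold newly ingested files,
    so a routine run never has to walk the rest of the store.
    """
    return [dated_dir(root, today - timedelta(days=n)) for n in range(days_back + 1)]


if __name__ == "__main__":
    for d in dirs_to_mirror("/assetstore", date(2010, 10, 8)):
        print(d)
```

On a POSIX system, `dirs_to_mirror("/assetstore", date(2010, 10, 8))` yields `/assetstore/2010/10/08` and `/assetstore/2010/10/07`.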
How strange, I also proposed such a thing!
I've just read this paper and have a question.  You state the following:

----
At the moment, December 2009, the following two are the most widely used 
software packages for building and maintaining institutional repositories 
according to the OpenDOAR website.

•       http://www.dspace.org with 502 installations.
•       http://www.eprints.org with 261 installations.

The digital objects and stores are located as follows for the above:

•       DSpace =>  $DSPACE_HOME/assetstore
•       EPrints =>  $EPRINTS_HOME/disk0

None of the above use a time/date based file system for storing digital 
objects. None of them use UUIDs to create unique digital
objects and stores.

In one hundred years' time, how can any of the above satisfy a future researcher 
that the digital object is unique and has remained persistently so during the 
years to 2109?
----
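The naming scheme the paper describes could be sketched as follows. The sketch assumes that each file is named by a random (version 4) UUID inside a dated directory and keeps its original extension. The function name and the extension-keeping rule are assumptions for illustration. The UUID makes the name unique with overwhelming probability. It records nothing about whether the file's bytes have changed since ingest.

```python
import os
import uuid
from datetime import datetime


def store_path(root: str, original_name: str, ingested: datetime) -> str:
    """Hypothetical naming scheme: root/YYYY/MM/DD/<uuid4><extension>."""
    _, ext = os.path.splitext(original_name)  # e.g. ".pdf"; "" if there is none
    return os.path.join(
        root,
        f"{ingested.year:04d}",
        f"{ingested.month:02d}",
        f"{ingested.day:02d}",
        f"{uuid.uuid4()}{ext}",  # random version-4 UUID as the file name
    )


if __name__ == "__main__":
    print(store_path("/assetstore", "thesis.pdf", datetime(2010, 10, 8, 20, 31)))
```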

Are you able to expand for us your reasoning that repositories that do not use 
datestamped directories and filenames containing UUIDs will not satisfy future 
researchers?

Just because a file is stored in that location with a UUID makes it no more or 
less likely that it has remained unique and persistent.  Filenames alone cannot 
guarantee this - it is up to the repository to manage the integrity of the stored 
items, and to the wider system to ensure that this is the case. This is where the 
notion of a 'trusted repository' comes into play - the fact that the repository 
platform and the system as a whole are trusted to have maintained the integrity 
of the contents.
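In practice, integrity is usually demonstrated by fixity checking rather than by file names. A checksum is recorded when the file is ingested and recomputed periodically. The following is a minimal sketch. It assumes the recorded value is an MD5 hex digest kept somewhere outside the file itself (DSpace, for example, records a checksum for each bitstream). The function names are hypothetical.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large bitstreams need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def still_intact(path: str, recorded_md5: str) -> bool:
    """True if the file's current checksum matches the one recorded at ingest."""
    return file_md5(path) == recorded_md5.strip().lower()
```

MD5 is adequate for spotting accidental corruption. If deliberate tampering is a concern, `hashlib.sha256` can replace `hashlib.md5` without other changes.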

[A side note: You'll find that a lot of the work Tim has been leading recently 
on AIPs is of interest in this area. 
https://wiki.duraspace.org/display/DSPACE/AipBackupRestore ]

Cheers,


Stuart Lewis
IT Innovations Analyst and Developer
Te Tumu Herenga The University of Auckland Library
Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
Ph: +64 (0)9 373 7599 x81928


--
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1053
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

"Simplicity is the ultimate sophistication"
        Leonardo da Vinci

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
