Hi guys,

Here are some comments on the storage ideas from my point of view:


> So here's a provocative question to start: Assuming for a moment that
> the core Fedora object model (versioning warts and all) stays the same
> for 4.0, would something like this interface actually be compatible
> with the major objectives we've talked about with respect to High
> Level Storage?
I think the interface is fine and, just like Akubra, it should be
straightforward to implement on e.g. HDFS/HBase. But shouldn't there be
a locking mechanism at the digital object level? If this were used in a
distributed environment, Fedora instance A could then lock object X
while, e.g., it is updating the object's datastreams.
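
To make the locking idea a bit more concrete, here is a rough sketch of
what such a contract could look like. Everything in it is hypothetical:
ObjectLockManager, InMemoryObjectLockManager and the instance ids are
names I made up, and nothing here is part of Fedora or Akubra. The
in-memory map only works inside a single JVM; a distributed setup would
need a shared coordination service behind the same interface, plus
leases or timeouts so that a crashed instance cannot hold a lock forever.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of per-object locking keyed by PID; not a Fedora API. */
final class ObjectLockSketch {

    /** Assumed contract; a distributed implementation would sit behind it. */
    interface ObjectLockManager {
        /** Returns true if ownerId holds the lock on pid after the call. */
        boolean tryLock(String pid, String ownerId);

        /** Releases the lock only if ownerId currently holds it. */
        boolean unlock(String pid, String ownerId);
    }

    /** Single-JVM stand-in, only meant to show the contract. */
    static final class InMemoryObjectLockManager implements ObjectLockManager {
        private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

        @Override
        public boolean tryLock(String pid, String ownerId) {
            // putIfAbsent is atomic and returns null when nobody held the lock.
            String current = owners.putIfAbsent(pid, ownerId);
            return current == null || current.equals(ownerId);
        }

        @Override
        public boolean unlock(String pid, String ownerId) {
            // remove(key, value) only removes the entry if ownerId is the holder.
            return owners.remove(pid, ownerId);
        }
    }

    public static void main(String[] args) {
        ObjectLockManager locks = new InMemoryObjectLockManager();
        System.out.println(locks.tryLock("demo:X", "instance-A")); // true
        System.out.println(locks.tryLock("demo:X", "instance-B")); // false
        System.out.println(locks.unlock("demo:X", "instance-A"));  // true
        System.out.println(locks.tryLock("demo:X", "instance-B")); // true
    }
}
```

The owner id is what lets one instance update several datastreams of the
same object under one lock while other instances are kept out.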

It would be nice if the setContent() method could supply the size of the
datastream, so that implementations could choose a storage layer based
on the size of the stream. That would mitigate e.g. Hadoop's "small
files problem": you could choose to write small files to a SequenceFile
or even an HBase table. Maybe some kind of hints, as they are present in
the Akubra API, would make sense, so that arbitrary information can be
passed down into the storage layer.
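
As a rough illustration of the size-based choice, here is a small
sketch that picks a backend from a size hint. All of it is assumed for
the sake of the example: the hint key "content.size", the Backend enum
and both thresholds are invented, and the hints are modelled as a plain
string-to-string map rather than any concrete API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: choose a storage backend from a size hint. */
final class SizeHintSketch {

    enum Backend { HBASE_TABLE, SEQUENCE_FILE, HDFS_FILE }

    /** Invented hint key; not defined by Akubra or Fedora. */
    static final String SIZE_HINT = "content.size";

    /** Assumed thresholds in bytes; real values would need measuring. */
    static final long TABLE_LIMIT = 64L * 1024;
    static final long SEQUENCE_FILE_LIMIT = 16L * 1024 * 1024;

    static Backend choose(Map<String, String> hints) {
        String raw = hints.get(SIZE_HINT);
        if (raw == null) {
            return Backend.HDFS_FILE; // size unknown: fall back to a plain file
        }
        long size;
        try {
            size = Long.parseLong(raw);
        } catch (NumberFormatException e) {
            return Backend.HDFS_FILE; // unusable hint: treat as unknown
        }
        if (size < 0) {
            return Backend.HDFS_FILE; // negative sizes are treated as unknown
        }
        if (size <= TABLE_LIMIT) {
            return Backend.HBASE_TABLE;
        }
        if (size <= SEQUENCE_FILE_LIMIT) {
            return Backend.SEQUENCE_FILE;
        }
        return Backend.HDFS_FILE;
    }

    public static void main(String[] args) {
        Map<String, String> hints = new HashMap<>();
        hints.put(SIZE_HINT, "2048");
        System.out.println(choose(hints)); // HBASE_TABLE
    }
}
```

Because the hint is optional, callers that cannot know the size up
front (e.g. when streaming) still work and simply get the default.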

> c) Transactions.
> Unsure. But I think it's worth stepping back and considering the
> cost/benefit of implementing true ACID transactions across Fedora's
> API for Fedora 4. I know the discussion of HLStorage has touched on
> the possibility of doing this in the past, but it's been very short on
> detail.
IMHO transactions are a feature that users would very much like to see.
Transactions seem to invoke a feeling of trust in users.
I also recently played a bit with implementing a custom
PlatformTransactionManager from Spring, which lets you use those
beautiful @Transactional annotations instead of handling each
transaction programmatically. It's quite easy to implement, although
there is still the hard part of rolling back unsuccessful transactions.
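
To show why rollback is the hard part, here is a minimal undo-log
sketch. It deliberately does not use Spring, and it is not the code I
experimented with: the Tx class and the in-memory map standing in for
the storage layer are made up for this example. The idea is that every
write first records a compensating action, and rollback replays those
actions in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: rollback via an undo log of compensating actions. */
final class UndoLogSketch {

    /** A transaction over an in-memory map that stands in for real storage. */
    static final class Tx {
        private final Map<String, String> store;
        private final Deque<Runnable> undo = new ArrayDeque<>();

        Tx(Map<String, String> store) {
            this.store = store;
        }

        void put(String key, String value) {
            final boolean existed = store.containsKey(key);
            final String old = store.get(key);
            // Record how to restore the previous state before changing anything.
            undo.push(() -> {
                if (existed) {
                    store.put(key, old);
                } else {
                    store.remove(key);
                }
            });
            store.put(key, value);
        }

        /** Keeps all changes and forgets the undo log. */
        void commit() {
            undo.clear();
        }

        /** Undoes the changes in reverse order (last write first). */
        void rollback() {
            while (!undo.isEmpty()) {
                undo.pop().run();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("demo:1/DC", "v1");

        Tx tx = new Tx(store);
        tx.put("demo:1/DC", "v2");
        tx.put("demo:1/RELS-EXT", "new");
        tx.rollback();

        System.out.println(store); // {demo:1/DC=v1}
    }
}
```

With real storage the compensating actions can fail themselves, and the
undo log has to survive a crash, which is where the actual work is.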

> e) Lock-free concurrent updates
> No. I think some way of declaring the previously seen state would be
> necessary to achieve this. But again, I'm not sure that
> whole-Fedora-object-locking at a higher level is such a bad thing if
> it's done correctly and doesn't make the single-node-Fedora assumption
> that the locking in DOManager does today.
Yes, as you might have guessed from my locking comment above, I think
object locking would be hugely beneficial in the context of asynchronous
writes or a federation of Fedora instances.

> f) Storing entire object in self-contained file archives
> Yes. Although fcrepo-store does split the storage of FedoraObjects and
> managed content, having them stored together (e.g. in AtomZIP) at the
> low level is still possible. It's a question of efficiency.
Hmm, I think that's quite an interesting idea, having each AIP on the
filesystem in one file, especially when thinking about integrating
Fedora with some kind of execution service which requests/updates a lot
of objects from/in Fedora. You could dramatically decrease load if the
whole intellectual entity could be fetched from the repo in one request,
with all its representations, instead of requesting an object first
and then making one subsequent request per datastream.
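
Just to sketch the "one request, whole object" idea: the snippet below
bundles all datastreams of an object into a single ZIP archive and
unpacks it again on the other side. It uses plain java.util.zip and is
not the AtomZIP layout; the class name, the datastream ids and the
in-memory byte arrays are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: ship an object and its datastreams as one archive. */
final class ObjectArchiveSketch {

    /** Packs datastream id -> content into one ZIP archive. */
    static byte[] pack(Map<String, byte[]> datastreams) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : datastreams.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every entry of the archive back into memory. */
    static Map<String, byte[]> unpack(byte[] archive) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                result.put(entry.getName(), out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> datastreams = new LinkedHashMap<>();
        datastreams.put("DC", "<dc/>".getBytes(StandardCharsets.UTF_8));
        datastreams.put("IMAGE", new byte[] {1, 2, 3});

        Map<String, byte[]> fetched = unpack(pack(datastreams));
        System.out.println(fetched.keySet()); // [DC, IMAGE]
    }
}
```

A client such as an execution service would then make one request per
object and pick the representations it needs out of the archive.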

-- 
*frank asseg*
softwareentwicklung
feichtmayrstr. 37
76646 bruchsal
tel.: ++49-7251-322-6073
fax.: ++49-7251-322-6078
mail: frank.as...@congrace.de
web: http://www.congrace.de/


_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
