Hi all,

While considering the design of a fairly low-level generic batch
utility for Fedora, I started to put together a new interface called
"FedoraStore" that looked similar to what we've been talking about for
High Level Storage[1] with 4.0.

http://cwilper.github.com/fcrepo-store/apidocs/com/github/cwilper/fcrepo/store/core/FedoraStore.html
(the AkubraFedoraStore impl is compatible with Fedora 3.2+'s
AkubraLowlevelStorage)

The main purpose of this is to aid in the writing of a generic batch
modify/migrate utility that works with current versions of Fedora. But
being Actual Working Code (tm), I thought it could also serve as a
good subject for discussion to get a better understanding of what we
really want 4.0's storage abstraction to look like.

So here's a provocative question to start: Assuming for a moment that
the core Fedora object model (versioning warts and all) stays the same
for 4.0, would something like this interface actually be compatible
with the major objectives we've talked about with respect to High
Level Storage?

a) Clustered Fedora instances.
Yes. As with the old LLStorage interface, this puts all of the storage
of Fedora objects and managed datastreams behind a single interface.
In either case, the actual underlying storage can be clustered itself
(GlusterFS, etc) -- it's really higher level code (caching,
locking…any kind of state sharing) that will have the final say as to
whether clustering is doable. Side note: Hazelcast looks like it could
be really nice for this.

b) Asynchronous Reads & Writes.
Potentially. In previous discussions we've talked about HLStorage
having a Result return type for each storage method as a way to easily
pass back some sort of token or other information to the caller so it
can check on the status of (or associate a future message with) a
particular async read or write request. It seems likely to me that
some other form of association could be done, but I haven't thought it
through much.
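
To make the token idea a bit more concrete, here's a rough Java sketch.
Everything in it (AsyncStoreSketch, Result, addObject, status) is
invented for illustration and is not part of fcrepo-store or any
HLStorage proposal. It assumes objects are just strings kept in an
in-memory map.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: these names are assumptions, not a real API.
class AsyncStoreSketch {

    /** What a storage method hands back: a token plus a completion handle. */
    static final class Result {
        final String token;
        final CompletableFuture<Void> completion;

        Result(String token, CompletableFuture<Void> completion) {
            this.token = token;
            this.completion = completion;
        }

        boolean isDone() {
            return completion.isDone();
        }
    }

    // Stand-in for the real underlying storage.
    private final Map<String, String> objects = new ConcurrentHashMap<>();
    // Pending and finished requests, keyed by token.
    private final Map<String, Result> requests = new ConcurrentHashMap<>();

    /** Queues the write and returns immediately with a Result. */
    Result addObject(String pid, String foxml) {
        String token = UUID.randomUUID().toString();
        CompletableFuture<Void> completion =
                CompletableFuture.runAsync(() -> objects.put(pid, foxml));
        Result result = new Result(token, completion);
        requests.put(token, result);
        return result;
    }

    /** Lets a caller look a request up again later using only the token. */
    Result status(String token) {
        return requests.get(token);
    }

    String getObject(String pid) {
        return objects.get(pid);
    }
}
```

A caller could either poll status(token) or hang a callback off the
completion handle. The "some other form of association" would be
anything that lets the caller correlate a later message with the
request without changing the return type.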

c) Transactions.
Unsure. But I think it's worth stepping back and considering the
cost/benefit of implementing true ACID transactions across Fedora's
API for Fedora 4. I know the discussion of HLStorage has touched on
the possibility of doing this in the past, but it's been very short on
detail. Now, if we could assume that all Fedora state was persisted in
a relational database, this would be a non-issue, but we have managed
content. (I'm assuming for the moment, as previously discussed, that
RISearch and FieldSearch are outside the "core" for 4.0 and therefore
would not be updated as part of the transaction.) What's more, there
continues to be demand for a Fedora that can cope with asynchronous
reads and writes. As in, "the tape robot is going to take a minute to
spin up for that content, please stand by". Or "okay, I'll write that
to the storage cluster in a few minutes; it's super busy right now". It
seems to me that the absolute easiest way to get transactions with 4.0
would be to discontinue support of managed content and require a
relational database for FOXML (Hey, it was worth mentioning). In any
case, I'm not sure whether transaction semantics would actually need
to be exposed in the storage API at all…I hope not.

d) Storage Multiplexing.
Yes. As discussed in the original HLStorage paper, having the object
in context at the time that managed datastreams are being persisted
would make it easier to provide the necessary info (e.g. Akubra
"hints") to the underlying impl.
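
As a rough illustration of why having the object in context helps,
here's a Java sketch. All names (MultiplexSketch, ObjectInfo, the
routing function, the "disk" and "tape" backends) are hypothetical and
not taken from fcrepo-store or Akubra. Backends are plain in-memory
maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: these names are assumptions, not a real API.
class MultiplexSketch {

    /** The bits of the owning object that a routing rule might look at. */
    static final class ObjectInfo {
        final String pid;
        final String contentModel;

        ObjectInfo(String pid, String contentModel) {
            this.pid = pid;
            this.contentModel = contentModel;
        }
    }

    // Each named backend is a stand-in for a separate underlying store.
    private final Map<String, Map<String, byte[]>> backends = new HashMap<>();
    // Decides which backend to use, given the object in context.
    private final Function<ObjectInfo, String> router;

    MultiplexSketch(Function<ObjectInfo, String> router, String... backendNames) {
        this.router = router;
        for (String name : backendNames) {
            backends.put(name, new HashMap<>());
        }
    }

    /** Stores the content and returns the name of the backend chosen. */
    String setContent(ObjectInfo object, String datastreamId, byte[] content) {
        String name = router.apply(object);
        Map<String, byte[]> backend = backends.get(name);
        if (backend == null) {
            throw new IllegalArgumentException("Unknown backend: " + name);
        }
        backend.put(object.pid + "/" + datastreamId, content);
        return name;
    }

    /** Reads the content back from whichever backend the router selects. */
    byte[] getContent(ObjectInfo object, String datastreamId) {
        Map<String, byte[]> backend = backends.get(router.apply(object));
        return backend == null ? null : backend.get(object.pid + "/" + datastreamId);
    }
}
```

Because the object is passed alongside the datastream id, the routing
rule can look at anything on the object rather than guessing from the
blob id alone.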

e) Lock-free concurrent updates.
No. I think some way of declaring the previously seen state would be
necessary to achieve this. But again, I'm not sure that
whole-Fedora-object-locking at a higher level is such a bad thing if
it's done correctly and doesn't make the single-node-Fedora assumption
that the locking in DOManager does today.
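
For what it's worth, "declaring the previously seen state" could look
something like the following Java sketch of an optimistic
compare-and-replace. The names (OptimisticStoreSketch, Versioned, add,
update) are made up for illustration and are not part of fcrepo-store.
Objects are reduced to a version number plus a string.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch only: these names are assumptions, not a real API.
class OptimisticStoreSketch {

    /** Immutable snapshot of an object as a caller last saw it. */
    static final class Versioned {
        final long version;
        final String foxml;

        Versioned(long version, String foxml) {
            this.version = version;
            this.foxml = foxml;
        }
    }

    private final ConcurrentMap<String, Versioned> objects = new ConcurrentHashMap<>();

    /** Returns false if an object with this pid already exists. */
    boolean add(String pid, String foxml) {
        return objects.putIfAbsent(pid, new Versioned(1, foxml)) == null;
    }

    Versioned get(String pid) {
        return objects.get(pid);
    }

    /**
     * Succeeds only if the stored snapshot is still the one the caller saw.
     * Versioned does not override equals, so this is an identity check.
     */
    boolean update(String pid, Versioned seen, String newFoxml) {
        Versioned next = new Versioned(seen.version + 1, newFoxml);
        return objects.replace(pid, seen, next);
    }
}
```

A caller whose update returns false would re-read the object, reapply
its change and try again, so no lock is held between the read and the
write.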

f) Storing entire objects in self-contained file archives.
Yes. Although fcrepo-store does split the storage of FedoraObjects and
managed content, having them stored together (e.g. in AtomZIP) at the
low level is still possible. It's a question of efficiency.

- Chris

[1] https://wiki.duraspace.org/display/FCREPO/High+Level+Storage

_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
