Hi all,

While considering the design of a fairly low-level generic batch utility for Fedora, I started to put together a new interface called "FedoraStore" that looked similar to what we've been talking about for High Level Storage[1] with 4.0.
http://cwilper.github.com/fcrepo-store/apidocs/com/github/cwilper/fcrepo/store/core/FedoraStore.html

(The AkubraFedoraStore impl is compatible with Fedora 3.2+'s AkubraLowlevelStorage.)

The main purpose of this is to aid in writing a generic batch modify/migrate utility that works with current versions of Fedora. But being Actual Working Code (tm), I thought it could also serve as a good subject for discussion to get a better understanding of what we really want 4.0's storage abstraction to look like.

So here's a provocative question to start: Assuming for a moment that the core Fedora object model (versioning warts and all) stays the same for 4.0, would something like this interface actually be compatible with the major objectives we've talked about with respect to High Level Storage?

a) Clustered Fedora instances.
Yes. As with the old LLStorage interface, this puts all of the storage of Fedora objects and managed datastreams behind a single interface. In either case, the actual underlying storage can be clustered itself (GlusterFS, etc) -- it's really the higher-level code (caching, locking…any kind of state sharing) that will have the final say as to whether clustering is doable. Side note: Hazelcast looks like it could be really nice for this.

b) Asynchronous Reads & Writes.
Potentially. In previous discussions we've talked about HLStorage having a Result return type for each storage method as a way to easily pass back some sort of token or other information to the caller, so it can check on the status of (or associate a future message with) a particular async read or write request. It seems likely to me that some other form of association could be done, but I haven't thought it through much.

c) Transactions.
Unsure. But I think it's worth stepping back and considering the cost/benefit of implementing true ACID transactions across Fedora's API for Fedora 4.
I know the discussion of HLStorage has touched on the possibility of doing this in the past, but it's been very short on detail. Now, if we could assume that all Fedora state was persisted in a relational database, this would be a non-issue, but we have managed content. (I'm assuming for the moment, as previously discussed, that RISearch and FieldSearch are outside the "core" for 4.0 and therefore would not be updated as part of the transaction.)

What's more, there continues to be demand for a Fedora that can cope with asynchronous reads and writes. As in, "the tape robot is going to take a minute to spin up for that content, please stand by". Or "okay, I'll write that to the storage cluster in a few minutes; it's super busy right now".

It seems to me that the absolute easiest way to get transactions with 4.0 would be to discontinue support of managed content and require a relational database for FOXML. (Hey, it was worth mentioning.) In any case, I'm not sure whether transaction semantics would actually need to be exposed in the storage API at all…I hope not.

d) Storage Multiplexing.
Yes. As discussed in the original HLStorage paper, having the object in context at the time that managed datastreams are being persisted would make it easier to provide the necessary info (e.g. akubra "hints") to the underlying impl.

e) Lock-free concurrent updates.
No. I think some way of declaring the previously seen state would be necessary to achieve this. But again, I'm not sure that whole-Fedora-object locking at a higher level is such a bad thing, as long as it's done correctly and doesn't make the single-node-Fedora assumption that the locking in DOManager does today.

f) Storing entire object in self-contained file archives.
Yes. Although fcrepo-store does split the storage of FedoraObjects and managed content, having them stored together (e.g. in AtomZIP) at the low level is still possible. It's a question of efficiency.
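
To make point (e) a bit more concrete, here is a minimal sketch of what "declaring the previously seen state" could look like. Everything in it is hypothetical: the class ConditionalUpdateSketch, the Versioned snapshot type, and the updateIfUnchanged method are illustration-only names and are not part of fcrepo-store or Fedora. It keeps objects in memory and accepts an update only when the caller hands back the exact snapshot it last read, so no lock is held between the read and the write.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory sketch; NOT the fcrepo-store or Fedora API. */
class ConditionalUpdateSketch {

    /** Immutable snapshot: serialized object content plus a version counter. */
    static final class Versioned {
        final long version;
        final String content;

        Versioned(long version, String content) {
            this.version = version;
            this.content = content;
        }
    }

    private final ConcurrentMap<String, Versioned> objects = new ConcurrentHashMap<>();

    /** Adds a new object at version 1; returns false if the pid already exists. */
    boolean add(String pid, String content) {
        return objects.putIfAbsent(pid, new Versioned(1L, content)) == null;
    }

    /** Returns the current snapshot, or null if the pid is unknown. */
    Versioned get(String pid) {
        return objects.get(pid);
    }

    /**
     * Replaces the object only if the stored snapshot is still the one the
     * caller saw. Versioned does not override equals(), so the comparison
     * is by identity: any intervening update makes this call return false.
     */
    boolean updateIfUnchanged(String pid, Versioned seen, String newContent) {
        Versioned next = new Versioned(seen.version + 1, newContent);
        return objects.replace(pid, seen, next);
    }

    public static void main(String[] args) {
        ConditionalUpdateSketch store = new ConditionalUpdateSketch();
        store.add("demo:1", "first");

        // Two writers read the same state before either one writes.
        Versioned seenByA = store.get("demo:1");
        Versioned seenByB = store.get("demo:1");

        System.out.println(store.updateIfUnchanged("demo:1", seenByA, "from A"));
        // B's snapshot is now stale, so its update is rejected.
        System.out.println(store.updateIfUnchanged("demo:1", seenByB, "from B"));
    }
}
```

A caller whose update is rejected would re-read the object, reapply its change, and try again. Whether something like this belongs in the storage interface itself or in a layer above it is exactly the open question in (e).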
- Chris

[1] https://wiki.duraspace.org/display/FCREPO/High+Level+Storage