-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Some of what I'm about to write is in response to Aaron's message below, and some is in response to the discussion we just had on the committers' call this morning about these issues. I hope I can manage to be coherent. {grin}

It seems to me that the discussion we're now having about atomicity and transactionality includes the older discussion around High Level Storage as a key use case. The general questions are about where and how to introduce these notions and how to expose them from Fedora's code. High Level Storage (object-level atomicity) is an obvious package of answers to those general questions, but not the only one of interest.

Dan Davis made a point on today's call that he has made before: we should be aware of the two ends of a spectrum of use in which these notions are interesting. At one end is intensive authoring, in which transactionality is very important (you need a clear and coherent idea of the resources you're manipulating). At the other is intensive publishing, in which transactionality is less interesting and weighs against the more important value of a low marginal cost of access. (Those aren't precisely the terms he used, but I hope they capture his meaning.) Ideally, we want our approach to support the whole spectrum, and to do so in a way that lets individual sites select their own basket of costs and values.

Building on Aaron's discussion below of the motivation for HLS (and if I take his meaning correctly), I'll suggest that the package of answers it provides to the general questions puts the eventual user towards the "intensive publishing" end of Dan's spectrum. By pushing the notion of atomicity up the code hierarchy (and thus up the hierarchy of notions of content that Fedora contemplates), we discard very fine-grained guarantees about consistent access (except insofar as they are reaffirmed by the underlying storage provider outside of Fedora's remit) and acquire greater availability of higher-level forms of information. Perhaps we can define a few other packages of answers that meet the spectrum at other points.
I think that if we can do that, we will start to see some of the implicit architectural constraints on future development become apparent. For example, what if we were to push the atomic unit farther down into the system, to the level of the datastream and object metadata? Fedora itself would be able to offer very fine-grained guarantees about consistency, but we would incur some real expenses that would have to be passed on, eventually, to users.

As a thought experiment, are there any intermediate levels? Is there any way we could find a generally applicable atomic size between the object and its components? (For example, a case in which content datastreams should be altered together but metadata datastreams needn't be.)

What kinds of tools can we employ to choose different sizes of atom? On today's call, Frank Asseg discussed some of the work he's described on the mailing list and was very complimentary towards Spring's ability to describe transaction boundaries as part of bean wiring. That would be a more-or-less repository-wide way to adjust the size of an atom, and maybe (along with the corresponding changes to the code to respond to the Spring wiring) that's all we need: the ability for a given site to decide whether there are transaction boundaries, and if so, whether they sit at the object level or at the level of datastreams and object metadata.

As another thought experiment, should we consider the possibility that some content in a repository might deserve one notion of atomicity and other content might deserve a different notion? For example, could we use RDF relationships to declare which parts of an object must be handled together as an atom? (Sounds crazy, I know, but perhaps no crazier than using RDF to express authorization context, which we now happily do, thanks to FESL.)

- ---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library

On Mar 29, 2012, at 9:16 AM, Aaron Birkland wrote:

>
>> So here's a provocative question to start: Assuming for a moment that
>> the core Fedora object model (versioning warts and all) stays the same
>> for 4.0, would something like this interface actually be compatible
>> with the major objectives we've talked about with respect to High
>> Level Storage?
>
> Here's my perspective:
>
> HighlevelStorage was designed as a data-oriented interface that
> explicitly made the fedora object a fundamental and atomic unit of work
> with respect to storage and associated "data-oriented" services that
> might be plugged in. This was a key simplification with clear
> boundaries that would enable storage implementations the flexibility to
> adopt a variety of locking, optimization, and/or communication
> strategies within each unit of work - as it is guaranteed that each unit
> of work is "complete" and fully defined with respect to a single fedora
> object. Transactions could later be laid on top of that, but would not
> change the fact that each individual operation within a transaction
> would be a complete-object-version unit of work.
>
> setContent() could possibly be problematic in that light, I'm not sure.
> For example, one potential use case of HighLevelStorage is that the
> storage impl might decide a managed datastream's physical storage
> location based upon some property of the object (content model, for
> example). Do the semantics of setContent() allow a FedoraStore impl to
> "make note that some content is available, hold onto a reference to the
> InputStreams, but only act upon it in response to update(), possibly
> making storage decisions based upon the content of the FedoraObject"?
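
To make the question in the paragraph above concrete, here is a minimal Java sketch of the deferred behavior it describes: the store only remembers the stream when content is set, and reads and places it later, once the object's content model is known. Everything here is an assumption for illustration. `DeferringStore`, its method signatures, the "video" content model, and the `/bulk` and `/fast` roots are hypothetical names, not the actual FedoraStore interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DeferredContentSketch {

    /** Hypothetical store: setContent() only remembers the stream; update() acts on it. */
    static final class DeferringStore {
        // Streams handed over via setContent() but not yet read.
        private final Map<String, InputStream> pending = new LinkedHashMap<>();
        // Chosen location for each datastream, keyed by datastream id.
        private final Map<String, String> locations = new LinkedHashMap<>();

        /** Holds a reference to the stream without reading from it. */
        void setContent(String datastreamId, InputStream content) {
            pending.put(datastreamId, content);
        }

        /** Picks a location from the (hypothetical) content model, then consumes each stream. */
        void update(String pid, String contentModel) {
            String root = "video".equals(contentModel) ? "/bulk" : "/fast";
            for (Map.Entry<String, InputStream> entry : pending.entrySet()) {
                try (InputStream in = entry.getValue()) {
                    byte[] buffer = new byte[8192];
                    // Drain the stream; a real implementation would write these bytes out.
                    while (in.read(buffer) != -1) {
                        // nothing to do in this sketch
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                locations.put(entry.getKey(), root + "/" + pid + "/" + entry.getKey());
            }
            pending.clear();
        }

        /** Returns the chosen location, or null if the content has not been placed yet. */
        String locationOf(String datastreamId) {
            return locations.get(datastreamId);
        }
    }

    public static void main(String[] args) {
        DeferringStore store = new DeferringStore();
        InputStream body = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        store.setContent("DS1", body);
        System.out.println("before update: " + store.locationOf("DS1"));
        store.update("demo:1", "video");
        System.out.println("after update:  " + store.locationOf("DS1"));
    }
}
```

The point of the sketch is only that the placement decision moves from the moment content is supplied to the moment the whole object is updated. Whether the real setContent() contract permits holding the stream that long is exactly the open question.
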
>
> While I don't consider lock-free concurrent updates to be fundamental to
> HighLevelStorage per se, the interface was designed to explicitly
> declare a handle to prior state in order to provide flexibility and
> avoid the need for explicit locking and shared-state. Forcing the use
> of internal or external locks and/or transactions limits the opportunity
> to leverage certain kinds of horizontal scalability. Indeed, the
> initial motivation for HighLevelStorage for me was to horizontally-scale
> fedora itself by eliminating shared state and locking between instances,
> utilizing only the native capabilities of the storage impl (in this case
> HBase). With the FedoraStore interface as it stands right now, locking
> (or single-object transactions) *must* be used in order to create a fairly
> lengthy critical section, making such horizontal scaling more
> complicated and less effective.
>
> Used in the same place as ILowlevelStorage, providing a reference to the
> "to-be-replaced" version upon update is a fairly natural thing to do.
> DOManager would need to retrieve the old version of an object anyway in
> order to correctly populate the updated version, so there really is no
> additional overhead in supplying a reference to it to the storage impl.
> In fact, having a reference to both versions of the object may even make
> certain implementations of HighLevelStorage plugins more efficient.
> Consider a plugin that calculates the diff of triples to send off for
> indexing. It would be handy to have the metadata of the old version
> right there in order to be able to dereference the proper datastream for
> comparison, especially if that datastream is not versionable.
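
As an illustration of the "handle to prior state" idea in the paragraphs above, here is a minimal Java sketch of an update that succeeds only if the version the caller read is still the stored one, with no explicit lock held by the caller. It is a sketch under stated assumptions, not the HighLevelStorage interface itself: `ObjectVersion`, `InMemoryStore`, and their methods are hypothetical names, and an in-memory `ConcurrentHashMap` stands in for a real storage implementation such as HBase.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PriorStateUpdateSketch {

    /** Hypothetical immutable snapshot of one complete object version. */
    static final class ObjectVersion {
        final String pid;
        final int version;
        final String content;

        ObjectVersion(String pid, int version, String content) {
            this.pid = pid;
            this.version = version;
            this.content = content;
        }
    }

    /** Hypothetical in-memory store standing in for a real storage implementation. */
    static final class InMemoryStore {
        private final ConcurrentMap<String, ObjectVersion> objects = new ConcurrentHashMap<>();

        /** Adds a new object; returns false if the pid already exists. */
        boolean add(ObjectVersion created) {
            return objects.putIfAbsent(created.pid, created) == null;
        }

        ObjectVersion read(String pid) {
            return objects.get(pid);
        }

        /**
         * Replaces 'prior' with 'next' only if 'prior' is still the stored version.
         * ObjectVersion does not override equals(), so the comparison is by identity.
         */
        boolean update(ObjectVersion prior, ObjectVersion next) {
            return objects.replace(prior.pid, prior, next);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.add(new ObjectVersion("demo:1", 1, "first"));

        // Two writers read the same prior version before either one writes.
        ObjectVersion seenByA = store.read("demo:1");
        ObjectVersion seenByB = store.read("demo:1");

        boolean aApplied = store.update(seenByA, new ObjectVersion("demo:1", 2, "from A"));
        boolean bApplied = store.update(seenByB, new ObjectVersion("demo:1", 2, "from B"));

        System.out.println("A applied: " + aApplied);
        System.out.println("B applied: " + bApplied);
        System.out.println("stored: " + store.read("demo:1").content);
    }
}
```

In this sketch the second writer's update is rejected because its prior version is stale, so it would have to re-read and retry. The same `update` call also hands the store both the old and the new version at once, which is what a diffing plugin of the kind described above would want.
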
>
> -Aaron
>
>
> ------------------------------------------------------------------------------
> This SF email is sponsosred by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Fedora-commons-developers mailing list
> Fedora-commons-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJPdHjoAAoJEATpPYSyaoIkld4H/3YpYqXNEnHqfDrLfuJVYVZq
Il05VkM4uRAlImYnht19C2tWNjHsFivQimsNKnh7u3rTC7fxoxMstGjMFU2aokfM
JrZYkXBOM5QuIbMH39XlymhlQDdjxd063zjz2E4D4+J97kLKjJbB4kJh2BfYY4yw
99R2j+7uJqWuZ1juDLfRg1sesLR2OjJ6xmivwIrgFzRq4i2bfq3cP+lXO8/PtUC7
i9lOU+JjkfKzCMYTsAUDrrTUnDw3D4iIyrJwfUvf2BYPXQIozGNDlD4uv4Jvc9XH
v1Oz2y8N7WqNxi6ZU0ds1NBmKMWOoWmPymNpfF/0t4udUO3gNsxNWdysrE0pA44=
=Q+NG
-----END PGP SIGNATURE-----