Some of what I'm about to write is in response to Aaron's message below, and 
some is in response to the discussion we just had on the committers' call this 
morning about these issues. I hope I can manage to be coherent. {grin}

It now seems to me that our discussion about atomicity and transactionality
includes the older discussion around High Level Storage as a key use case. The
general questions are about where and how to introduce these notions and how to
expose them out of Fedora's code. High Level Storage (object-level atomicity)
is an obvious package of answers to those general questions, but not the only
one of interest.

Dan Davis made a point today on the call that he's made in the past: we should
be aware of two ends of a spectrum of use in which these notions are
interesting. At one end is intensive authoring, in which transactionality is
very important (you need a clear and coherent idea of the resources you're
manipulating). At the other is intensive publishing, in which transactionality
matters less and its cost weighs against the more important value of a low
marginal cost for access. (Those aren't precisely the terms he used, but I hope
they capture his meaning.) Ideally, we want to develop our approach to support
the whole spectrum, and even to do so in a way that allows individual sites to
select their own basket of costs and values.

Building on Aaron's discussion of the motivation for HLS below (if I take his
meaning correctly), I'll suggest that the package of answers it provides to the
general questions places the eventual user towards the "intensive publishing"
end of Dan's spectrum. By pushing the notion of atomicity up the code hierarchy
(and thus up the hierarchy of notions of content that Fedora contemplates), we
discard very fine-grained guarantees about consistent access (except insofar as
they are reaffirmed by the underlying storage provider, outside of Fedora's
remit) and acquire greater availability of higher-level forms of information.

Perhaps we can define a few other packages of answers that meet the spectrum at
other points. If we can do that, I think some of the implicit architectural
constraints on future development will start to become apparent. For example,
what if we were to push the atomic unit farther down into the system, to the
level of the datastream and object metadata? Fedora itself would then be able to
offer very fine-grained guarantees about consistency, but we would incur some
real expenses that would eventually have to be passed on to users. As a thought
experiment, are there any intermediate levels? Is there a generally applicable
atomic size between the object and its components? (For example, content
datastreams might need to be altered together while metadata datastreams need
not be.)
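
A minimal sketch of the three atom sizes mentioned above, assuming a
hypothetical AtomGranularity setting and a simplified ObjectPart type (neither
exists in Fedora). Each atom is a group of parts that would have to be written
all together or not at all:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical site-level choice of how big an "atom" is.
enum AtomGranularity { OBJECT, CONTENT_VS_METADATA, DATASTREAM }

// Hypothetical, simplified part of an object: a datastream or the object metadata.
final class ObjectPart {
    final String id;
    final boolean isMetadata;

    ObjectPart(String id, boolean isMetadata) {
        this.id = id;
        this.isMetadata = isMetadata;
    }
}

final class Atomizer {
    // Partitions an object's parts into atoms: groups that must change together.
    static List<List<String>> atoms(List<ObjectPart> parts, AtomGranularity granularity) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (ObjectPart part : parts) {
            String key;
            if (granularity == AtomGranularity.OBJECT) {
                key = "object";                 // the whole object is one atom
            } else if (granularity == AtomGranularity.DATASTREAM) {
                key = part.id;                  // every part is its own atom
            } else {
                // all content changes together; each metadata part stands alone
                key = part.isMetadata ? "metadata:" + part.id : "content";
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(part.id);
        }
        return new ArrayList<>(groups.values());
    }
}
```

With parts DC and RELS-EXT (metadata) plus IMAGE and TEXT (content), OBJECT
yields one atom, DATASTREAM yields four, and CONTENT_VS_METADATA yields three,
one of which is [IMAGE, TEXT].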

What kinds of tools can we employ to choose different sizes of atom? On today's 
call, Frank Asseg discussed some of the work he's described on the mailing list 
and was very complimentary towards Spring's ability to describe transaction 
boundaries as part of bean wiring. That would be a more-or-less repository-wide
way to adjust the size of an atom, and maybe (along with the corresponding
changes to the code to respond to the Spring wiring) that's all we need: the
ability for a given site to decide whether there are transaction boundaries,
and if so, whether they are object-level or datastream/object-metadata-level.
As another thought experiment, should we consider the possibility that some 
content in a repository might deserve one notion of atomicity and other content 
might deserve a different notion? For example, could we use RDF relationships 
to declare which parts of an object must be handled together as an atom? 
(Sounds crazy, I know, but perhaps no crazier than using RDF to express 
authorization context, which we now happily do, thanks to FESL.)
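
As a rough sketch of that idea, assuming a hypothetical predicate
info:example/changesWith (not part of any existing Fedora ontology) and a toy
Triple class standing in for a real RDF library, the linked datastreams can be
grouped into atoms with a union-find pass:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy stand-in for an RDF statement; a real system would use an RDF library.
final class Triple {
    final String subject, predicate, object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class RdfAtoms {
    // Hypothetical predicate: "these two datastreams must be changed as one atom".
    static final String CHANGES_WITH = "info:example/changesWith";

    private final Map<String, String> parent = new HashMap<>();

    // Union-find lookup with path compression.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        while (!x.equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Groups datastream IDs into atoms; any two linked by CHANGES_WITH end up together.
    static Collection<Set<String>> atoms(Set<String> datastreams, List<Triple> triples) {
        RdfAtoms uf = new RdfAtoms();
        for (Triple t : triples) {
            if (t.predicate.equals(CHANGES_WITH)
                    && datastreams.contains(t.subject) && datastreams.contains(t.object)) {
                uf.union(t.subject, t.object);
            }
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String ds : datastreams) {
            groups.computeIfAbsent(uf.find(ds), k -> new TreeSet<>()).add(ds);
        }
        return groups.values();
    }
}
```

If IMAGE changesWith THUMB and THUMB changesWith OCR, those three form one atom,
while DC stands alone when it is linked only by some other predicate.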

---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library

On Mar 29, 2012, at 9:16 AM, Aaron Birkland wrote:

> 
>> So here's a provocative question to start: Assuming for a moment that
>> the core Fedora object model (versioning warts and all) stays the same
>> for 4.0, would something like this interface actually be compatible
>> with the major objectives we've talked about with respect to High
>> Level Storage?
> 
> Here's my perspective:
> 
> HighLevelStorage was designed as a data-oriented interface that
> explicitly made the fedora object a fundamental and atomic unit of work
> with respect to storage and associated "data-oriented" services that
> might be plugged in.  This was a key simplification with clear
> boundaries that would enable storage implementations the flexibility to
> adopt a variety of locking, optimization, and/or communication
> strategies within each unit of work - as it is guaranteed that each unit
> of work is "complete" and fully defined with respect to a single fedora
> object.  Transactions could later be laid on top of that, but would not
> change the fact that each individual operation within a transaction
> would be a complete-object-version unit of work.
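
To make the "complete unit of work" idea concrete, here is a minimal sketch;
ObjectVersion, ObjectStore and InMemoryObjectStore are hypothetical
simplifications, not the actual HighLevelStorage or FedoraStore interfaces.
Because every call carries an entire immutable object version, a store can
implement each write as one atomic replacement:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified complete object version: PID plus all datastream contents.
final class ObjectVersion {
    final String pid;
    final Map<String, String> datastreams;

    ObjectVersion(String pid, Map<String, String> datastreams) {
        this.pid = pid;
        // defensive, read-only copy so a version cannot change after it is handed over
        this.datastreams = Collections.unmodifiableMap(new TreeMap<>(datastreams));
    }
}

// Every operation takes or returns a whole object: the object is the unit of work.
interface ObjectStore {
    void write(ObjectVersion version);

    Optional<ObjectVersion> read(String pid);
}

// Toy implementation: a single map put swaps in the whole new version, so readers
// see either the old object or the new one, never a mix of the two.
final class InMemoryObjectStore implements ObjectStore {
    private final Map<String, ObjectVersion> objects = new ConcurrentHashMap<>();

    @Override
    public void write(ObjectVersion version) {
        objects.put(version.pid, version);
    }

    @Override
    public Optional<ObjectVersion> read(String pid) {
        return Optional.ofNullable(objects.get(pid));
    }
}
```

A real storage impl could use the same whole-object boundary to pick its own
locking, batching, or placement strategy, which is the flexibility described
above.
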
> 
> setContent() could possibly be problematic in that light; I'm not sure.
> For example, one potential use case of HighLevelStorage is that the
> storage impl might decide a managed datastream's physical storage
> location based upon some property of the object (content model, for
> example).  Do the semantics of setContent() allow a FedoraStore impl to
> "make note that some content is available, hold onto a reference to the
> InputStreams, but only act upon it in response to update(), possibly
> making storage decisions based upon the content of the FedoraObject"?  
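
One way to read the question: a store could treat setContent() as merely
recording a reference and do all real work in update(). A minimal sketch,
assuming a hypothetical DeferringStore with simplified signatures (not the
actual FedoraStore API) and a made-up rule that objects whose content model
ends in "LargeImage" go to a "bulk" storage area:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class DeferringStore {
    // Streams noted by setContent() but not yet read.
    private final Map<String, InputStream> pending = new LinkedHashMap<>();
    // Stand-in for physical storage, keyed by "area/pid/dsId".
    final Map<String, byte[]> storage = new TreeMap<>();

    // Only keeps a reference to the stream; no bytes are read or stored yet.
    void setContent(String dsId, InputStream content) {
        pending.put(dsId, content);
    }

    // Acts on all pending content at once, choosing a location from the object's
    // content model (hypothetical rule, for illustration only).
    void update(String pid, String contentModel) {
        String area = contentModel.endsWith("LargeImage") ? "bulk" : "fast";
        try {
            for (Map.Entry<String, InputStream> e : pending.entrySet()) {
                try (InputStream in = e.getValue()) {
                    storage.put(area + "/" + pid + "/" + e.getKey(), in.readAllBytes());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        pending.clear();
    }
}
```

Nothing reaches storage until update() is called, and by then the store knows
enough about the object to decide where its content belongs.
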
> 
> While I don't consider lock-free concurrent updates to be fundamental to
> HighLevelStorage per se, the interface was designed to explicitly
> declare a handle to prior state in order to provide flexibility and
> avoid the need for explicit locking and shared state.  Forcing the use
> of internal or external locks and/or transactions limits the opportunity
> to leverage certain kinds of horizontal scalability.  Indeed, the
> initial motivation for HighLevelStorage for me was to horizontally scale
> fedora itself by eliminating shared state and locking between instances,
> utilizing only the native capabilities of the storage impl (in this case
> HBase).  With the FedoraStore interface as it stands right now, locking
> (or single-object transactions) *must* be used in order to create a fairly
> lengthy critical section, making such horizontal scaling more
> complicated and less effective.
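
A minimal sketch of the "handle to prior state" idea, assuming hypothetical
Version and OptimisticStore classes: an update names the version it was derived
from and succeeds only if that version is still current, which is an ordinary
compare-and-set rather than a lock. A storage backend with a native conditional
write could play the role that ConcurrentHashMap.replace(key, oldValue,
newValue) plays here:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable object version. equals() is not overridden, so the
// store compares versions by identity: the instance itself is the handle.
final class Version {
    final String pid;
    final String label;

    Version(String pid, String label) {
        this.pid = pid;
        this.label = label;
    }
}

final class OptimisticStore {
    private final ConcurrentHashMap<String, Version> current = new ConcurrentHashMap<>();

    // Creates the object; fails if the PID already exists.
    boolean add(Version v) {
        return current.putIfAbsent(v.pid, v) == null;
    }

    // Succeeds only if `prior` is still the current version. No lock is held
    // while the caller builds `next`; a stale caller simply gets false and can retry.
    boolean update(Version prior, Version next) {
        if (!prior.pid.equals(next.pid)) {
            throw new IllegalArgumentException("versions belong to different objects");
        }
        return current.replace(prior.pid, prior, next);
    }

    Version get(String pid) {
        return current.get(pid);
    }
}
```

Two writers that both start from the same prior version cannot both succeed:
the second update() returns false because its prior version is no longer
current, so no critical section spans the time spent preparing the new version.
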
> 
> Used in the same place as ILowlevelStorage, providing a reference to the
> "to-be-replaced" version upon update is a fairly natural thing to do.
> DOManager would need to retrieve the old version of an object anyway in
> order to correctly populate the updated version, so there really is no
> additional overhead in supplying a reference to it to the storage impl.
> In fact, having a reference to both versions of the object may even make
> certain implementations of HighLevelStorage plugins more efficient.
> Consider a plugin that calculates the diff of triples to send off for
> indexing.  It would be handy to have the metadata of the old version
> right there in order to be able to dereference the proper datastream for
> comparison, especially if that datastream is not versionable.
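
A rough sketch of that indexing case, assuming triples are represented as plain
N-Triples-style strings and that the plugin receives the triples of both the
prior and the new version (the TripleDiff class is hypothetical). With both
versions at hand, the changes to send to an index are two set differences:

```java
import java.util.Set;
import java.util.TreeSet;

// Computes what an indexer would need to add and remove when an object's
// relationships change from `oldTriples` to `newTriples`.
final class TripleDiff {
    final Set<String> added;
    final Set<String> removed;

    TripleDiff(Set<String> oldTriples, Set<String> newTriples) {
        added = new TreeSet<>(newTriples);
        added.removeAll(oldTriples);      // present only in the new version
        removed = new TreeSet<>(oldTriples);
        removed.removeAll(newTriples);    // present only in the old version
    }
}
```

Triples that did not change are never sent to the index at all, which is where
having the old version "right there" pays off.
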
> 
>  -Aaron
> 
> 
> ------------------------------------------------------------------------------
> This SF email is sponsosred by:
> Try Windows Azure free for 90 days Click Here 
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Fedora-commons-developers mailing list
> Fedora-commons-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers