> I have an idea for a transactional system for Fedora, that I would like
> to hear your opinion on.

I had some time to look it over (comments inline).  First, some
background:  Transactions have been a topic discussed for many years.
At a Fedora architecture summit in early 2007, support for transactions
was identified as a desirable feature - but that topic is not without
controversy.  There is a related JIRA item you should be aware of:

https://fedora-commons.org/jira/browse/FCREPO-435

The discussion attached to that issue gives some history.  It is
interesting to note that this discussion first started with the idea of
a "super method": a single method containing all data/operations.   The
issue has since evolved, but I do not believe there is any consensus for
an approach.  Your idea essentially takes the approach of exposing
Fedora as a transactional resource.

In my own view, if Fedora offered transactions over current operations,
I would try to avoid using them for multiple operations unless they are
required by the application at hand, or if the solution would be
needlessly complex otherwise.   Transactions would be a useful tool when
needed, but I would assume that using transactions would come at a cost.

> 1. First, the Fedora system attempt to get a write-lock on the object.
> The object is being written as part of this transaction, and does not
> allow other processes to edit it.

There would be a risk of deadlocks here if processes try to obtain locks
for multiple objects in sequence.  The implementation would need to pick
a strategy for avoiding or detecting deadlocks.
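
One common avoidance strategy is to always acquire locks in a single
global order.  The sketch below is only an illustration of that idea,
not anything Fedora provides: the class OrderedObjectLocks and its
methods are hypothetical, and it assumes the full set of object PIDs is
known before any lock is taken.  If locks are instead acquired one
operation at a time, as in your description, a timeout or detection
scheme would be needed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper (not a Fedora API): one write lock per object PID.
class OrderedObjectLocks {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Sort and de-duplicate the PIDs so that every caller requests
    // locks in the same global order, which rules out lock cycles.
    static List<String> lockOrder(Collection<String> pids) {
        return new ArrayList<>(new TreeSet<>(pids));
    }

    // Acquire all locks in sorted order; the returned list is needed
    // later to release them.
    List<String> lockAll(Collection<String> pids) {
        List<String> ordered = lockOrder(pids);
        for (String pid : ordered) {
            locks.computeIfAbsent(pid, k -> new ReentrantLock()).lock();
        }
        return ordered;
    }

    // Release in reverse order of acquisition.
    void unlockAll(List<String> ordered) {
        for (int i = ordered.size() - 1; i >= 0; i--) {
            locks.get(ordered.get(i)).unlock();
        }
    }
}
```

A transaction would call lockAll() once with every PID it intends to
modify, and unlockAll() after commit or abort.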

> 2. The object is parsed into memory, and stored as part of the
> transaction.
> 3. The change is executed on the in-memory object, preserving
> information about which new datastream is created.

I don't think all these details are essential to your proposal.  Rather
than in memory, you may find that it is best to serialize/store objects
in temporary files, a blob store, etc.  It is not even strictly
necessary to separately track changes - a comparison with the original
object might suffice.   Large managed datastream uploads would almost
certainly require storing some content in temporary files (Fedora does
that now).
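
To illustrate the "comparison with the original" idea, here is a
minimal sketch.  It assumes, purely for illustration, that an object can
be summarized as a map from datastream ID to a content checksum; the
class DatastreamDiff is hypothetical and is not how Fedora represents
objects.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical change set derived by comparing two snapshots of an
// object, instead of tracking each change as it is made.
class DatastreamDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();
    final Set<String> modified = new TreeSet<>();

    // Both maps go from datastream ID to a checksum of its content
    // (assumed non-null).
    static DatastreamDiff between(Map<String, String> original,
                                  Map<String, String> working) {
        DatastreamDiff diff = new DatastreamDiff();
        for (Map.Entry<String, String> e : working.entrySet()) {
            if (!original.containsKey(e.getKey())) {
                diff.added.add(e.getKey());
            } else if (!original.get(e.getKey()).equals(e.getValue())) {
                diff.modified.add(e.getKey());
            }
        }
        for (String id : original.keySet()) {
            if (!working.containsKey(id)) {
                diff.removed.add(id);
            }
        }
        return diff;
    }
}
```

The set of newly created datastreams then falls out of the comparison at
commit time, so nothing extra has to be recorded per operation.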

> 4. In case the change involves something deleted in the same
> transaction: This cause an error, and the change is not carried out.

Would this abort the transaction, or just the current operation? 


> When the change is committed:
> First, the system upgrades all write locks on modified objects to read
> locks. The locked objects are parsed into memory, and used to service
> read requests while the transaction is written to storage. 

I don't know if this would be strictly necessary.  This proposal has an
inherent visibility risk.  If a commit fails, then all objects in the
transaction will (eventually) cease being visible.  However, there was 
a (brief) period of time in which they were visible.  If that remains true,
then having objects become visible as they are processed during a commit
may not be any worse.  

> (This is the risky step. If, and only if, the fedora system goes down
> while committing a transaction, will the repository be left in an
> inconsistent state.)

There are ways around that (such as write-ahead logs, etc).  If a client
has to live with this uncertainty (i.e., SOME objects may not have
changed, but you have no way of knowing that without checking), the
transactional system is of little or no value.  Thus, the repository
MUST be able to recover from a crash during write.  Ideally, it would be
able to finish the transaction, but aborting the transaction and undoing
all changes might be acceptable as well (I believe you propose that
later on in the message). 

> 2. The server goes down during a transaction, before commit is called:
> The transaction is stored only in memory (or similar non-persistent
> storage) so all recollection of the transaction is only relevant while
> the server is running. Reboot removes all transactions.

Again, I believe in-memory storage is a non-essential implementation
detail.  Perhaps temporary files are cleaned up, or in-progress data is
removed from some form of storage.

> 3. The server goes down during a commit: The repo is left inconsistent.
> Easiest way to mitigate: Write the old versions of the objects to some
> more permanent store before changing the versions in the repository.
> When the server starts up, restore any objects from this store, so that
> the finished repo is consistent again.

You could go in either direction: clean up after a failed commit, or
finish the commit.  Completing the commit would be possible if the
in-transaction objects are stored as files, or a WAL is maintained.  I
would tend to prefer finishing the commit.  That way, changes to an
object never appear to "go away", as they would if the commit had to be
rolled back.
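
A rough sketch of the "finish the commit" direction, assuming the new
object versions are staged as files before the repository is touched.
Everything here is hypothetical (the class RollForwardCommit, the
"COMMIT" marker file, the directory layout), the map keys are assumed to
be file-system-safe names, and a real implementation would also need to
force the files to disk before writing the marker.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical roll-forward commit: stage everything, write a marker,
// then copy into the store.  Recovery repeats the copy if the marker
// is present, and discards the staged files if it is not.
class RollForwardCommit {
    static final String MARKER = "COMMIT";

    // Phase 1: persist the new versions, then the marker.  The marker
    // is written last, so its presence means staging is complete.
    // (fsync is omitted in this sketch.)
    static void stage(Path staging, Map<String, String> newVersions)
            throws IOException {
        Files.createDirectories(staging);
        for (Map.Entry<String, String> e : newVersions.entrySet()) {
            Files.write(staging.resolve(e.getKey()),
                        e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        Files.createFile(staging.resolve(MARKER));
    }

    // Phase 2, also run at startup.  Returns true if a staged commit
    // was applied (rolled forward), false if there was nothing to
    // apply or an incomplete staging area was discarded.
    static boolean applyOrDiscard(Path staging, Path store)
            throws IOException {
        if (!Files.isDirectory(staging)) {
            return false;
        }
        boolean committed = Files.exists(staging.resolve(MARKER));
        List<Path> staged = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(staging)) {
            for (Path f : files) {
                if (!f.getFileName().toString().equals(MARKER)) {
                    staged.add(f);
                }
            }
        }
        if (committed) {
            Files.createDirectories(store);
            // Copying is idempotent, so repeating it after a crash is safe.
            for (Path f : staged) {
                Files.copy(f, store.resolve(f.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
            // Remove the marker only after every copy has succeeded.
            Files.delete(staging.resolve(MARKER));
        }
        for (Path f : staged) {
            Files.delete(f);
        }
        return committed;
    }
}
```

A normal commit would call stage() followed by applyOrDiscard(); calling
applyOrDiscard() again at startup either completes an interrupted commit
or throws away a transaction that never reached the marker.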

> Gotchas in this approach:
> 1. The triple store will not be transactional. It will reflect the
> current contents of the repo. We can, as the last part of a transaction,
> ingest all the rdf statements from the changed objects into the triple
> store, so that it gets them all at the same time. Still, there will be a
> tiny desynchronisation between fedora and triple store.

In theory, the triple store could be a transactional resource, but
that's probably not the best approach.  It would be safest to assume
that most infrastructure surrounding Fedora is non-transactional, and
that it will not see the change-set in an atomic manner.  

Consider an application listening for messages via JMS.  Would it be
possible/practical/desirable, to summarize all objects affected by a
transaction in a single JMS message?  What if we assume that a
transaction involving multiple objects will result in multiple JMS
messages, sent serially?  This aspect would require some thought.

  -Aaron

_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers