Hi Aaron
On Wed, 2009-10-28 at 16:36 +0100, Aaron Birkland wrote:
the implications of these
fine-grained operations on datastream versions are unclear.
You are quite right. I had simply not thought about the number of
versions that would be created by doing these operations in series.
I think that executing multiple operations in series as part of a
single logical change that would otherwise be preserved as a unit
might be an anti-pattern in general (some of these preserved versions
would necessarily be inconsistent with the desired end state).
Yes, that would be an anti-pattern.
Each operation needs to do versioning, as if it was singular, as it
can have no knowledge of the logical change, and thus the unit to
preserve.
I have an idea for a transactional system for Fedora, that I would
like to hear your opinion on.
There will be 3 new methods, like StartTransaction, CommitTransaction
and DeleteTransaction. StartTransaction gives you a token, that
identifies the transaction. All the normal API methods will take that
token as a parameter, and execute the operation within the
transaction.
Outside a transaction, Fedora will work as it does today.
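
A minimal sketch of the three proposed methods and the token parameter,
to make the shape of the idea concrete. Everything here is hypothetical:
the class name, the method names in snake_case, the `modify` stand-in for
"any normal API method" and the use of a UUID as token are assumptions
for illustration, not Fedora's actual API.

```python
import uuid


class TransactionManager:
    """Hypothetical sketch of the three proposed methods; not Fedora's real API."""

    def __init__(self):
        # token -> list of operations recorded for that open transaction
        self._open = {}

    def start_transaction(self):
        # The token identifies the transaction in all later calls.
        token = uuid.uuid4().hex
        self._open[token] = []
        return token

    def modify(self, pid, change, token=None):
        """Stand-in for any normal API method that accepts an optional token."""
        if token is None:
            return "applied immediately"      # normal, non-transactional path
        if token not in self._open:
            raise KeyError("unknown or finished transaction")
        self._open[token].append((pid, change))
        return "queued in transaction"

    def commit_transaction(self, token):
        # Returns the operations that would now be written to storage.
        return self._open.pop(token)

    def delete_transaction(self, token):
        # Discards everything recorded for the transaction.
        self._open.pop(token)
```

A call without a token takes the existing path, so clients that never
start a transaction would not notice the change.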
When an object is modified as part of a transaction, the normal
procedure for an API call is not followed.
1. First, the Fedora system attempts to get a write-lock on the object.
While the lock is held, the object is being written as part of this
transaction, and other processes are not allowed to edit it.
2. The object is parsed into memory, and stored as part of the
transaction.
3. The change is executed on the in-memory object, preserving
information about which new datastream is created.
4. Return
Normal reads of the object will see the unmodified object. Reads from
within the transaction will see the modified object.
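
The four steps and the read behaviour could be sketched roughly as
follows. This is only an illustration under assumptions: objects are
plain dicts, storage is a dict, and the names `Repository`, `modify`,
`read` and `LockedError` are made up for the example.

```python
import copy


class LockedError(Exception):
    """Raised when another transaction already holds the write lock."""


class Repository:
    def __init__(self, objects):
        self.stored = objects      # pid -> dict, stands in for persistent storage
        self.locks = {}            # pid -> token holding the write lock
        self.working = {}          # token -> {pid: in-memory modified copy}

    def modify(self, token, pid, key, value):
        # Step 1: take the write lock, or confirm this transaction holds it.
        holder = self.locks.setdefault(pid, token)
        if holder != token:
            raise LockedError(f"{pid} is locked by another transaction")
        # Step 2: load a private copy of the object into the transaction.
        copies = self.working.setdefault(token, {})
        if pid not in copies:
            copies[pid] = copy.deepcopy(self.stored[pid])
        # Step 3: apply the change to the in-memory copy only.
        copies[pid][key] = value
        # Step 4: return; nothing has touched storage yet.

    def read(self, pid, token=None):
        # Reads inside the transaction see the modified copy,
        # all other reads see the unmodified stored object.
        if token is not None and pid in self.working.get(token, {}):
            return self.working[token][pid]
        return self.stored[pid]
```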
Further modifications will either lock other objects, or work on one
of the already locked objects. The interesting case is when an object
is modified twice in the same transaction.
1. In case the change involves an unversioned property (unversioned
datastream, object property), the change should just overwrite the
previous value, even if that was set as part of this transaction.
2. In case the change involves something versioned, but the unit has
not been modified in this transaction: Make a new version as normal,
with the change.
3. In case the change involves something versioned, and this thing has
already been modified as part of this transaction: Find the new
version created, and replace the values in that.
4. In case the change involves something deleted in the same
transaction: This causes an error, and the change is not carried out.
The procedure above would ensure that changes to the same logical unit
would be made part of the same storage version.
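
The four cases above can be sketched on a toy in-memory object. The
model is an assumption made for the example: versioned datastreams are
lists of values, unversioned properties are a dict, and the class and
method names are hypothetical.

```python
class TxObject:
    """In-memory copy of one object inside a transaction (hypothetical model)."""

    def __init__(self, versions, properties):
        self.versions = versions        # datastream id -> list of version values
        self.properties = properties    # unversioned properties
        self.touched = set()            # versioned units already changed in this tx
        self.deleted = set()            # units deleted in this tx

    def set_property(self, name, value):
        # Case 1: unversioned, so simply overwrite the previous value.
        self.properties[name] = value

    def delete_datastream(self, ds_id):
        self.deleted.add(ds_id)

    def set_datastream(self, ds_id, value):
        if ds_id in self.deleted:
            # Case 4: the unit was deleted earlier in this transaction.
            raise ValueError(f"{ds_id} was deleted in this transaction")
        history = self.versions.setdefault(ds_id, [])
        if ds_id in self.touched:
            # Case 3: reuse the version this transaction already created.
            history[-1] = value
        else:
            # Case 2: first change in this transaction, create one new version.
            history.append(value)
            self.touched.add(ds_id)
```

Changing the same datastream twice therefore leaves exactly one new
version behind, which is the point of the procedure.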
When the change is committed:
First, the system upgrades all write locks on modified objects to read
locks. The locked objects are parsed into memory, and used to service
read requests while the transaction is written to storage.
The objects are written to the store, one at a time (as that is the
only way to do so). If there is a problem with writing one of the
objects, the transaction is aborted, and all objects written are
replaced with their previously parsed counterparts.
(This is the risky step. Only if the Fedora system goes down while
committing a transaction will the repository be left in an
inconsistent state.)
When all the modifications have been written, the old data objects are
cleared from memory and the locks on the objects released.
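
A rough sketch of the write-one-at-a-time commit with rollback to the
previously parsed objects. It assumes storage behaves like a dict and
leaves out locking; the function name `commit` and the optional `write`
hook (used to simulate a failing write) are hypothetical.

```python
def commit(store, modified, write=None):
    """Write modified objects one at a time; restore the old ones on failure.

    store:    dict pid -> object, standing in for persistent storage
    modified: dict pid -> new object produced inside the transaction
    write:    optional hook that performs a single write (may raise)
    """
    if write is None:
        write = store.__setitem__
    # Keep the old objects in memory so they can be put back.
    previous = {pid: store[pid] for pid in modified if pid in store}
    attempted = []
    try:
        for pid, obj in modified.items():
            # Record the pid before writing, so a half-finished write is undone too.
            attempted.append(pid)
            write(pid, obj)
    except Exception:
        for pid in attempted:
            if pid in previous:
                store[pid] = previous[pid]
            else:
                store.pop(pid, None)    # object was created by this transaction
        return False
    return True
```

This only protects against a failed write while the server keeps
running. A crash in the middle of the loop loses `previous`, which is
exactly the risky step mentioned above.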
When a transaction is deleted:
Remove the transaction object, and all the modified objects. These
exist only in memory at that moment, so this change will be invisible
to the storage system.
Risks at different stages
1. The client goes down during a transaction, thus locking some
objects: Timeout on the transaction remedies this problem.
2. The server goes down during a transaction, before commit is called:
The transaction is stored only in memory (or similar non-persistent
storage) so all recollection of the transaction is only relevant while
the server is running. Reboot removes all transactions.
3. The server goes down during a commit: The repo is left
inconsistent. Easiest way to mitigate: Write the old versions of the
objects to some more permanent store before changing the versions in
the repository. When the server starts up, restore any objects from
this store, so that the finished repo is consistent again.
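
The mitigation for risk 3 could look roughly like this: save the old
versions to a file before the first write, delete the file after a
successful commit, and restore from it at startup if it is still there.
This is a sketch under assumptions: objects are JSON-serialisable, the
repository is a dict, and the journal file and function names are
hypothetical. Objects newly created by the transaction are not removed
on recovery in this simplified version.

```python
import json
import os


def backup_before_commit(journal_path, store, pids):
    """Persist the current (old) versions before any repository write."""
    old = {pid: store[pid] for pid in pids if pid in store}
    tmp_path = journal_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(old, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename last, so a crash never leaves a half-written journal in place.
    os.replace(tmp_path, journal_path)


def finish_commit(journal_path):
    """Called once every object has been written successfully."""
    os.remove(journal_path)


def recover_on_startup(journal_path, store):
    """If a journal survived a crash, put the old versions back."""
    if not os.path.exists(journal_path):
        return False
    with open(journal_path, encoding="utf-8") as f:
        store.update(json.load(f))
    os.remove(journal_path)
    return True
```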
Gotchas in this approach:
1. The triple store will not be transactional. It will reflect the
current contents of the repo. We can, as the last part of a
transaction, ingest all the RDF statements from the changed objects
into the triple store, so that it gets them all at the same time.
Still, there will be a tiny desynchronisation between Fedora and the
triple store.
2. Content from remote locations (i.e. URLs) will not be downloaded
until the transaction is committed. That is the only part that cannot
be reasonably validated before the attempt is made.
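
For the first gotcha, "ingest all the statements as the last part of
the transaction" might be sketched like this. The triple store is
modelled as a dict of sets purely for illustration; a real triple store
would be updated through its own interface, and the function name is
hypothetical.

```python
def sync_triple_store(triple_store, changed_objects):
    """Replace the statements of every changed object in one final step.

    triple_store:    dict pid -> set of (subject, predicate, object) tuples
    changed_objects: dict pid -> iterable of triples for the committed object
    """
    # Build the complete batch first, so a failure while collecting
    # statements leaves the triple store untouched.
    batch = {pid: set(triples) for pid, triples in changed_objects.items()}
    triple_store.update(batch)
    return len(batch)
```

The window between the repository write and this call is the small
desynchronisation described above.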
However, in my own use case, I would say that 90-99% of all datastream
modifications do involve only *one* change - and in that case, this
API proposal fits very well by providing an additional, lightweight,
easy to use tool.
I am glad to hear that. For a moment, I thought that the problem you
pointed out would be the death of this API.
Some of my most common use cases are:
1) updating datastream content, keeping all properties the same
2) changing datastream state, keeping content and other properties
the same
3) fixing datastream MIME type, keeping content and other properties
the same
A less common (but important) use case for me involving changes to
both content and properties is updating content, label, and MIME type.
I would want to perform that change as one atomic operation.
I do not propose to change things without creating an Audit entry.
What made you think I meant that?
I was not sure if the granularity of these operations had any
implications on auditing.
I do have a problem with the Audit system at the moment, as it does
not store the old values. This is not good when changing properties
that are not part of the versioned bit of a datastream.
If a client intends to modify
both datastream content as well as datastream properties, does this
imply that it MUST first change datastream content, then change
properties?
Why should the order matter?
You're right - it does not matter. It may have implications on what
the inconsistent intermediate versions would look like, but does not
present a fundamental difference.
This has been a delightful topic, Asger. Thanks!
Thanks!
-Aaron
_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers