Hi Aaron


On Wed, 2009-10-28 at 16:36 +0100, Aaron Birkland wrote:
> > > the implications of these
> > > fine-grained operations on datastream versions are unclear.
> > 
> > You are quite right. I have simply not thought of the number of versions
> > that would be created from doing these operations in a series.
> 
> I think that executing multiple operations in series as part of a single
> logical change that would otherwise be preserved as a unit might be an
> anti-pattern in general (some of these preserved versions would
> necessarily be inconsistent with the desired end state).
Yes, that would be an anti-pattern. 

Each operation has to do its own versioning, as if it were a standalone
change, because it cannot know about the larger logical change, and thus
which unit ought to be preserved.

I have an idea for a transactional system for Fedora, that I would like
to hear your opinion on.


There will be three new methods: StartTransaction, CommitTransaction
and DeleteTransaction. StartTransaction gives you a token that
identifies the transaction. All the normal API methods will take that
token as a parameter, and execute the operation within the transaction.
Calls made without a token will behave exactly as they do today.
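
To make the shape of the API concrete, here is a minimal sketch. It is
in Python only for brevity, and every name in it (TransactionalRepository,
modify_object, the label property) is hypothetical rather than part of
the existing Fedora API. Locking is left out here on purpose.

```python
import uuid


class TransactionalRepository:
    """Hypothetical facade illustrating the proposal; not the real Fedora API."""

    def __init__(self):
        self.objects = {}        # pid -> committed object state (a dict here)
        self.transactions = {}   # token -> {pid: pending changes}

    def start_transaction(self):
        # The token is just an opaque identifier for the transaction.
        token = uuid.uuid4().hex
        self.transactions[token] = {}
        return token

    def modify_object(self, pid, label, token=None):
        if token is None:
            # No token: write straight through, as the API does today.
            self.objects.setdefault(pid, {})["label"] = label
            return
        # With a token: only record the change inside the transaction.
        pending = self.transactions[token]  # KeyError for an unknown token
        pending.setdefault(pid, {})["label"] = label

    def commit_transaction(self, token):
        # Apply every pending change, then forget the transaction.
        for pid, changes in self.transactions.pop(token).items():
            self.objects.setdefault(pid, {}).update(changes)

    def delete_transaction(self, token):
        # Pending changes only ever lived in memory, so dropping them is enough.
        self.transactions.pop(token)
```

A client would call start_transaction once, pass the returned token to
each modifying call, and finish with either commit_transaction or
delete_transaction.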

When an object is modified as part of a transaction, the normal
procedure for an API call is not followed.

1. First, the Fedora system attempts to get a write lock on the object.
While the object is being written as part of this transaction, no other
process is allowed to edit it.
2. The object is parsed into memory, and stored as part of the
transaction.
3. The change is executed on the in-memory object, preserving
information about which new datastream is created.
4. Return

Normal reads of the object will see the unmodified object. Reads from
within the transaction will see the modified object.
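
The lock / copy / modify steps and the read isolation described above
could be sketched roughly like this. Store, LockedError and the dict
representation of an object are assumptions made for illustration only.

```python
import copy


class LockedError(Exception):
    """Raised when another transaction already holds the write lock."""


class Store:
    def __init__(self, objects):
        self.objects = objects   # pid -> committed object (a dict here)
        self.locks = {}          # pid -> token holding the write lock
        self.working = {}        # token -> {pid: in-memory working copy}

    def modify(self, token, pid, key, value):
        if pid not in self.objects:
            raise KeyError(pid)
        # Step 1: take the write lock, or fail if someone else holds it.
        holder = self.locks.setdefault(pid, token)
        if holder != token:
            raise LockedError(f"{pid} is locked by another transaction")
        # Step 2: copy the object into the transaction on first touch.
        copies = self.working.setdefault(token, {})
        if pid not in copies:
            copies[pid] = copy.deepcopy(self.objects[pid])
        # Step 3: apply the change to the in-memory copy only.
        copies[pid][key] = value

    def read(self, pid, token=None):
        # Reads inside the transaction see the working copy;
        # everyone else sees the unmodified committed object.
        copies = self.working.get(token, {})
        return copies.get(pid, self.objects[pid])
```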

Further modifications will either lock other objects, or work on one of
the already locked objects. The interesting case is when an object is
modified twice in the same transaction.

1. In case the change involves an unversioned property (unversioned
datastream, object property), the change should just overwrite the
previous value, even if that was set as part of this transaction.
2. In case the change involves something versioned, but the unit has not
been modified in this transaction: Make a new version as normal, with
the change.
3. In case the change involves something versioned, and this thing has
already been modified as part of this transaction: Find the new version
created, and replace the values in that. 
4. In case the change involves something deleted in the same
transaction: This causes an error, and the change is not carried out.

The procedure above would ensure that changes to the same logical unit
would be made part of the same storage version.
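
As a rough illustration of the four cases, assuming an in-memory working
copy that tracks which datastreams were touched or deleted in the
current transaction (WorkingObject and its fields are hypothetical):

```python
class DeletedInTransaction(Exception):
    """Raised when a change targets something deleted in this transaction."""


class WorkingObject:
    def __init__(self, properties, versions):
        self.properties = dict(properties)   # unversioned values
        # datastream id -> list of versions, oldest first
        self.versions = {k: list(v) for k, v in versions.items()}
        self.touched = set()   # datastreams that already got a new version
        self.deleted = set()   # datastreams deleted in this transaction

    def set_property(self, name, value):
        # Case 1: unversioned, so simply overwrite, even a value that was
        # set earlier in this same transaction.
        self.properties[name] = value

    def set_content(self, dsid, content):
        if dsid in self.deleted:
            # Case 4: deleted earlier in this transaction.
            raise DeletedInTransaction(dsid)
        if dsid in self.touched:
            # Case 3: reuse the version created earlier in this transaction.
            self.versions[dsid][-1] = content
        else:
            # Case 2: first change to this unit, so make a new version.
            self.versions.setdefault(dsid, []).append(content)
            self.touched.add(dsid)

    def delete(self, dsid):
        self.versions.pop(dsid, None)
        self.touched.discard(dsid)
        self.deleted.add(dsid)
```

Two content changes to the same datastream in one transaction thus end
up as a single new version on top of the old ones.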


When the change is committed:
First, the system upgrades all write locks on modified objects to read
locks. The locked objects are parsed into memory, and used to service
read requests while the transaction is written to storage. 
The objects are written to the store, one at a time (as that is the only
way to do so). If there is a problem with writing one of the objects,
the transaction is aborted, and all objects written are replaced with
their previously parsed counterparts.
(This is the risky step. Only if the Fedora system goes down while
committing a transaction will the repository be left in an
inconsistent state.)
When all the modifications have been written, the old data objects are
cleared from memory and the locks on the objects released.
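
The write-one-at-a-time commit with rollback could look something like
the sketch below. It assumes storage behaves like a dict keyed by pid,
which is a simplification, and it leaves out the locking and the serving
of reads from the old copies.

```python
def commit(storage, working_copies):
    """Write each modified object; on failure restore what was written.

    storage: dict-like, pid -> stored object (assumed interface)
    working_copies: pid -> modified object from the transaction
    """
    # Keep the previously stored objects so they can be put back.
    originals = {pid: storage.get(pid) for pid in working_copies}
    written = []
    try:
        for pid, obj in working_copies.items():
            storage[pid] = obj          # one object at a time
            written.append(pid)
    except Exception:
        # Abort: replace everything written so far with the old objects.
        for pid in written:
            if originals[pid] is None:
                storage.pop(pid, None)  # object did not exist before
            else:
                storage[pid] = originals[pid]
        raise
```

Note that the rollback itself can fail in a real store, which is exactly
the window in which the repository could be left inconsistent.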

When a transaction is deleted:
Remove the transaction object, and all the modified objects. These exist
only in memory at that moment, so this change will be invisible to the
storage system.


Risks at different stages
1. The client goes down during a transaction, thus locking some objects:
A timeout on the transaction remedies this problem.
2. The server goes down during a transaction, before commit is called:
The transaction is stored only in memory (or similar non-persistent
storage) so all recollection of the transaction is only relevant while
the server is running. Reboot removes all transactions.
3. The server goes down during a commit: The repo is left inconsistent.
Easiest way to mitigate: Write the old versions of the objects to some
more permanent store before changing the versions in the repository.
When the server starts up, restore any objects from this store, so that
the finished repo is consistent again.
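
The mitigation for risk 3 might be sketched as follows. The journal
file, the JSON encoding and the function names are all assumptions for
the sake of the example; objects that were newly created by the
interrupted transaction would additionally need to be removed, which
this sketch does not handle.

```python
import json
import os


def backup_originals(journal_path, originals):
    """Persist the old versions before the commit starts overwriting."""
    tmp = journal_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(originals, f)
        f.flush()
        os.fsync(f.fileno())
    # Atomic rename: the journal is either complete or absent.
    os.replace(tmp, journal_path)


def commit_finished(journal_path):
    """Called once every object has been written successfully."""
    os.remove(journal_path)


def recover(journal_path, storage):
    """Run at server start-up: undo a commit that was interrupted."""
    if not os.path.exists(journal_path):
        return False                 # no commit was in progress
    with open(journal_path, encoding="utf-8") as f:
        originals = json.load(f)
    storage.update(originals)        # put the old versions back
    os.remove(journal_path)
    return True
```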


Gotchas in this approach:
1. The triple store will not be transactional. It will reflect the
current contents of the repo. We can, as the last part of a transaction,
ingest all the rdf statements from the changed objects into the triple
store, so that it gets them all at the same time. Still, there will be a
tiny desynchronisation between Fedora and the triple store.
2. Content from remote locations (i.e. URLs) will not be downloaded until
the transaction is committed. That is the only part that cannot be
reasonably validated, before the attempt is made.

>   However, in my
> own use case, I would say that 90-99% of all datastream modifications do
> involve only *one* change - and in that case, this API proposal fits
> very well by providing an additional, lightweight, easy to use tool.
I am glad to hear that. For a moment, I thought that the problem you
pointed out would be the death of this API.




> 
> Some of my most common use cases are:
> 1) updating datastream content, keeping all properties the same
> 2) changing datastream state, keeping content and other properties the
> same
> 3) fixing datastream MIME type, keeping content and other properties the
> same
> 
> A less common (but important) use case for me involving changes to both
> content and properties is updating content, label, and mime type.   I
> would want to perform that change as one atomic operation.

> 
> > I do not propose to change things without creating an Audit entry. What
> > made you think I meant that?
> 
> I was not sure if the granularity of these operations had any
> implications on auditing.  
I do have a problem with the Audit system at the moment, as it does not
store the old values. This is not good when changing properties that are
not part of the versioned bit of a datastream. 




> 
> > >  If a client intends to modify
> > > both datastream content as well as datastream properties, does this
> > > imply that it MUST first change datastream content, then change
> > > properties?  
> > Why should the order matter?  
> 
> You're right -  it does not matter.  It may have implications on what
> the inconsistent intermediate versions would look like, but does not
> present a fundamental difference.
> 
> This has been a delightful topic, Asger.  Thanks!
Thanks!

> 
>   -Aaron
> 
> 
> 


_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
