Asger, Thanks for taking this on, and thanks for writing up this proposal. It's really important to keep refining this API and it helps to have a document to work from. Here are a few thoughts based on the proposal and the comments so far.

I propose a fundamental design guideline for this API: Fedora should be responsible for exposing complete CRUD endpoints for its resources. The REST API should use HTTP structures whenever possible. Keep it clean and predictable. Leave the rest to the client libraries.

# Serializations, rather than changing URLs, are the key

If you really want to make it easier for lightweight apps to work with Fedora object properties, support JSON for the full CRUD cycle. Heck, support 10 different serializations (XML, RDF/XML, JSON, n-triples, Ruby, Python ...). That would be awesome. It follows the existing pattern of encouraging 3rd-party developers by supporting serializations in their native languages.

If you have to choose one serialization to start with, it's JSON. Hands down. Any language can parse it, including JavaScript (of course), meaning that you can inject JSON content anywhere in your stack, from the bottom layer of a client application to a dynamic lookup in the browser. That's real flexibility.
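To make the point concrete, here is a rough sketch of what a JSON property serialization might look like. The field names are purely illustrative, not an actual or proposed Fedora response format:

```python
import json

# A hypothetical JSON representation of a Fedora object's properties.
# The field names here are invented for illustration only.
raw = json.dumps({
    "pid": "demo:1",
    "label": "Example object",
    "state": "A",
    "ownerId": "fedoraAdmin",
})

# Any layer of the stack -- server-side code or JavaScript in the
# browser -- can parse this directly.
obj = json.loads(raw)
print(obj["pid"], obj["state"])   # demo:1 A
```

The same round trip works in essentially any client language, which is the whole appeal.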

# Is this an extension or a replacement?

Some of the comments on the list imply that the proposed changes will conflict with the existing API, but reading through the document in Confluence, it appears that you've designed this as an extension of the existing API. Which is it? Of the methods that already exist, which ones (if any) are you proposing to change, and which are you planning to leave alone?

For example, will you prevent people from continuing to use the "big" methods when they want to, or will you simply add the option of using the new finer-grained methods?

# Relationships

It will be particularly nice to have the relationship methods added to the REST API. Many thanks for pushing on this. We weren't able to put them in the original version of the API in late 2007 because the underlying Java code didn't exist yet. I've been hoping to fix that ever since.

I'm confused by some of the conversation about the method signatures for these methods. Given all of the challenges you're running into WRT translating RDF into a URL, I think it makes sense to provide a fallback: allow people the option of POSTing RDF/XML or n-triples as the body of their request. That is what POST content is for and, as far as I know, "URL" is not a standard serialization for RDF. Remember, REST is about leveraging all of the elegant power of HTTP, not just using nicer URLs. It makes sense to allow people to use the standard REST/HTTP approach and a standard RDF serialization to update RDF in Fedora.

Example Case for URL syntax: I have a lightweight app that doesn't know anything about RDF. I just want to assert that X isMemberOf Y. I use the URL syntax, with PIDs as subject and object and simplified versions of Fedora Relationship predicates to make this assertion.

Example Case for POST content: I already have the desired relationship represented as RDF in my code. I just want to push it into Fedora. I push it to /objects/{PID}/relationships as RDF/XML. Alternatively, I can still go straight to the RELS-EXT datastream and edit the RDF/XML directly.

I think both of these cases are important and I think it's possible to support both.
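A sketch of the two styles side by side. The parameter names, the simplified predicate name, and the exact URL layout are my guesses for illustration, not what the proposal defines:

```python
from urllib.parse import quote, urlencode

pid = "demo:1"

# Style 1: URL syntax for lightweight apps that know nothing about RDF.
# The parameter names here are hypothetical.
params = urlencode({
    "subject": "info:fedora/demo:1",
    "predicate": "isMemberOf",   # simplified predicate name
    "object": "info:fedora/demo:2",
})
url_style = f"/objects/{quote(pid, safe='')}/relationships?{params}"

# Style 2: POST a standard RDF serialization as the request body, for
# clients that already have the relationship in hand as RDF.
rdf_xml_body = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/demo:1">
    <rel:isMemberOf rdf:resource="info:fedora/demo:2"/>
  </rdf:Description>
</rdf:RDF>
"""

print(url_style)
```

Either way, the same relationship ends up asserted; the client just picks whichever representation is natural for it.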

# Content Disposition

After the relationship methods, my biggest hope is for /datastreams/{dsid}/content to start using the Content-Disposition header (see http://fedoracommons.org/jira/browse/FCREPO-497 ), a feature that was originally suggested by Steve Bayliss. This would put an end to datastreams being downloaded with the absurd filename of "content". It's a little thing, but it has a huge impact on user experience.
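For what it's worth, the server-side change is tiny. A sketch, using the datastream label as the filename, which is just one plausible choice (FCREPO-497 discusses where the name should actually come from):

```python
from wsgiref.headers import Headers

def content_headers(ds_label: str, mime_type: str) -> Headers:
    """Build response headers so the browser saves the file under its
    real name instead of 'content'. Illustrative only."""
    h = Headers([])
    h.add_header("Content-Type", mime_type)
    # Taking the filename from the datastream label is an assumption.
    h.add_header("Content-Disposition", "attachment", filename=ds_label)
    return h

h = content_headers("thesis.pdf", "application/pdf")
print(h["Content-Disposition"])   # attachment; filename="thesis.pdf"
```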

There are a number of tickets in Jira around using HTTP more fully in the REST API (e.g. http://fedoracommons.org/jira/browse/FCREPO-182 and http://fedoracommons.org/jira/browse/FCREPO-412 ). Please keep these in mind for the reworking of the API. (This might go without saying; your name comes up a few times in the tickets.)

# Properties are not Resources

Regarding exposing the properties at distinct URLs: to put it simply, this breaks the REST model. Properties are not resources; they are properties of a resource. If I go to /objects/{PID}, I should get a _representation_ of that object -- a resource -- consisting of its properties. I should not have to go to different resources in order to CRUD those properties.

An example:

If you took the current design to its logical end, the DC datastream would have resources like /title, /creator, /publisher, etc. hanging off of its URL. While this may sound appealing to some, it doesn't actually help anyone make better applications, and it certainly doesn't fit the REST model. When you're reading the resource, this information belongs in the base resource's HTTP content response, ideally with a couple of serialization options (XML, HTML, JSON, RDF/XML, N-triples, etc). When you're updating it, you should be able to pass the values to the resource as either URL params or HTTP content. Either way, exposing individual properties as resources simply complicates the task of writing a library to consume the API.
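In other words, reads and writes both target the object resource itself. A quick sketch; the `format` parameter and the property parameter names are hypothetical:

```python
from urllib.parse import urlencode

pid = "demo:1"

# Reading: one GET on the object returns all of its properties in the
# requested serialization -- no per-property URLs needed.
read_url = f"/objects/{pid}?format=json"   # 'format' param is invented

# Updating: pass the changed properties to the same resource as URL
# params (they could equally go in the request body).
changes = {"label": "A better label", "state": "I"}
update_url = f"/objects/{pid}?{urlencode(changes)}"

print(read_url)
print(update_url)
```

One resource, one URL, and a client library only has to know about a single endpoint per object.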

On Oct 28, 2009, at 11:33 AM, Asger Askov Blekinge wrote:

Hi Aaron



On Wed, 2009-10-28 at 16:36 +0100, Aaron Birkland wrote:
> the implications of these fine-grained operations on datastream
> versions is unclear.

You are quite right. I had simply not thought about the number of
versions that would be created by doing these operations in series.

> I think that executing multiple operations in series as part of a
> single logical change that would otherwise be preserved as a unit
> might be an anti-pattern in general (some of these preserved versions
> would necessarily be inconsistent with the desired end state).

Yes, that would be an anti-pattern.

Each operation needs to do versioning as if it were singular, since it can have no knowledge of the logical change, and thus no knowledge of the unit to preserve.

I have an idea for a transactional system for Fedora that I would like
to hear your opinion on.


There will be three new methods: StartTransaction, CommitTransaction
and DeleteTransaction. StartTransaction gives you a token that
identifies the transaction. All the normal API methods will take that
token as a parameter and execute the operation within the transaction.
Non-transactional operation of Fedora will work as normal.

When an object is modified as part of a transaction, the normal
procedure for an API call is not followed. Instead:

1. First, the Fedora system attempts to get a write lock on the object.
The object is now being written as part of this transaction and does
not allow other processes to edit it.
2. The object is parsed into memory and stored as part of the
transaction.
3. The change is executed on the in-memory object, recording which new
datastream versions are created.
4. Return.

Normal reads of the object will see the unmodified object. Reads from
within the transaction will see the modified object.

Further modifications will either lock other objects or work on one of
the already locked objects. The interesting case is when an object is
modified twice in the same transaction.

1. If the change involves an unversioned property (an unversioned
datastream, an object property), the change simply overwrites the
previous value, even if that value was set as part of this transaction.
2. If the change involves something versioned, but the unit has not yet
been modified in this transaction: make a new version as normal, with
the change.
3. If the change involves something versioned that has already been
modified as part of this transaction: find the newly created version
and replace the values in it.
4. If the change involves something deleted in the same transaction:
this causes an error, and the change is not carried out.

The procedure above would ensure that changes to the same logical unit
would be made part of the same storage version.
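To check my understanding of the four rules, here is a minimal in-memory sketch. The class and method names are invented for illustration, not actual or proposed Fedora code:

```python
class TxObject:
    """In-memory copy of an object being edited inside one transaction."""

    def __init__(self, pid):
        self.pid = pid
        self.unversioned = {}   # e.g. object properties
        self.versions = {}      # unit -> list of stored versions
        self.touched = set()    # units already given a new version in this tx
        self.deleted = set()    # units deleted in this tx

    def set_unversioned(self, key, value):
        # Rule 1: overwrite, even if the value was set earlier in this tx.
        self.unversioned[key] = value

    def change_versioned(self, unit, value):
        if unit in self.deleted:
            # Rule 4: changing something deleted in the same tx is an error.
            raise ValueError(f"{unit} was deleted in this transaction")
        if unit in self.touched:
            # Rule 3: fold the change into the version this tx created.
            self.versions[unit][-1] = value
        else:
            # Rule 2: first change in this tx makes a new version as normal.
            self.versions.setdefault(unit, []).append(value)
            self.touched.add(unit)

    def delete(self, unit):
        self.deleted.add(unit)


obj = TxObject("demo:1")
obj.change_versioned("DS1", "first edit")    # rule 2: new version
obj.change_versioned("DS1", "second edit")   # rule 3: same version replaced
print(obj.versions["DS1"])                   # ['second edit']
```

Two edits to the same datastream inside one transaction end up in a single stored version, which is exactly the property the procedure is after.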


When the transaction is committed:
First, the system upgrades all write locks on modified objects to read
locks. The locked objects are parsed into memory and used to service
read requests while the transaction is written to storage.
The objects are written to the store one at a time (as that is the only
way to do so). If there is a problem writing one of the objects, the
transaction is aborted, and all objects already written are replaced
with their previously parsed counterparts.
(This is the risky step. If, and only if, the Fedora system goes down
while committing a transaction will the repository be left in an
inconsistent state.)
When all the modifications have been written, the old data objects are
cleared from memory and the locks on the objects are released.
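The write-and-abort step might look roughly like this, with a plain dict standing in for the low-level store; the function name and shapes are invented:

```python
def commit(store, modified):
    """Write each modified object to the store; on failure, abort by
    restoring the previously parsed copies."""
    originals = {pid: store.get(pid) for pid in modified}
    written = []
    try:
        for pid, data in modified.items():
            store[pid] = data            # objects go out one at a time
            written.append(pid)
    except Exception:
        # Abort: replace everything already written with its old copy.
        for pid in written:
            if originals[pid] is None:
                del store[pid]           # object was new in this tx
            else:
                store[pid] = originals[pid]
        raise

store = {"demo:1": "old-1", "demo:2": "old-2"}
commit(store, {"demo:1": "new-1", "demo:2": "new-2"})
print(store["demo:1"])   # new-1
```

The rollback path only helps while the process is alive, of course; a crash mid-loop is the inconsistent-state risk described above.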

When a transaction is deleted:
Remove the transaction object and all the modified objects. These exist
only in memory at that moment, so the change will be invisible to the
storage system.


Risks at different stages:
1. The client goes down during a transaction, leaving some objects
locked: a timeout on the transaction remedies this problem.
2. The server goes down during a transaction, before commit is called:
the transaction is stored only in memory (or similar non-persistent
storage), so all recollection of the transaction is relevant only while
the server is running. A reboot removes all transactions.
3. The server goes down during a commit: the repo is left inconsistent.
The easiest way to mitigate this is to write the old versions of the
objects to some more permanent store before changing the versions in
the repository. When the server starts up, restore any objects from
this store, so that the repository is consistent again.
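A sketch of that mitigation: journal the old copies to disk before overwriting, and restore them at startup if a commit was cut short. The file layout and function names are invented, and real PIDs would need escaping for the filesystem:

```python
import json
import os

def safe_commit(store_dir, journal_path, modified):
    """Journal the old object versions durably before overwriting them."""
    old = {}
    for pid in modified:
        path = os.path.join(store_dir, pid)
        if os.path.exists(path):
            with open(path) as f:
                old[pid] = f.read()
    with open(journal_path, "w") as f:
        json.dump(old, f)            # durable copy of the old versions
    for pid, data in modified.items():
        with open(os.path.join(store_dir, pid), "w") as f:
            f.write(data)
    os.remove(journal_path)          # commit finished cleanly

def recover(store_dir, journal_path):
    """At startup, restore any objects left in an unfinished journal."""
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            for pid, data in json.load(f).items():
                with open(os.path.join(store_dir, pid), "w") as g:
                    g.write(data)
        os.remove(journal_path)
```

A leftover journal file at startup is the signal that a commit did not finish; once the old versions are restored, the repository is consistent again.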


Gotchas in this approach:
1. The triple store will not be transactional; it will reflect the
current contents of the repo. We can, as the last part of a
transaction, ingest all the RDF statements from the changed objects
into the triple store, so that it gets them all at the same time.
Still, there will be a tiny desynchronisation between Fedora and the
triple store.
2. Content from remote locations (i.e. URLs) will not be downloaded
until the transaction is committed. That is the only part that cannot
reasonably be validated before the attempt is made.


> However, in my own use case, I would say that 90-99% of all
> datastream modifications do involve only *one* change - and in that
> case, this API proposal fits very well by providing an additional,
> lightweight, easy to use tool.

I am glad to hear that. For a moment, I thought that the problem you
pointed out would be the death of this API.

> Some of my most common use cases are:
> 1) updating datastream content, keeping all properties the same
> 2) changing datastream state, keeping content and other properties
> the same
> 3) fixing datastream MIME type, keeping content and other properties
> the same
>
> A less common (but important) use case for me, involving changes to
> both content and properties, is updating content, label, and MIME
> type. I would want to perform that change as one atomic operation.


>> I do not propose to change things without creating an Audit entry.
>> What made you think I meant that?

> I was not sure if the granularity of these operations had any
> implications on auditing.

I do have a problem with the Audit system at the moment, as it does not
store the old values. This is not good when changing properties that
are not part of the versioned bit of a datastream.

>>> If a client intends to modify both datastream content as well as
>>> datastream properties, does this imply that it MUST first change
>>> datastream content, then change properties?

>> Why should the order matter?

> You're right - it does not matter. It may have implications on what
> the inconsistent intermediate versions would look like, but it does
> not present a fundamental difference.
>
> This has been a delightful topic, Asger. Thanks!

Thanks!


> -Aaron

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers

