Re: [Fedora-commons-developers] Fedora REST interface, a proposal for methods

Matt Zumwalt Thu, 29 Oct 2009 10:55:54 -0700

Asger, Thanks for taking this on, and thanks for writing up this proposal. It's really important to keep refining this API and it helps to have a document to work from. Here are a few thoughts based on the proposal and the comments so far.

I propose a fundamental design guideline for this API: Fedora should be responsible for exposing complete CRUD endpoints for its resources. The REST API should use HTTP structures whenever possible. Keep it clean and predictable. Leave the rest to the client libraries.


# Serializations, rather than changing URLs, are the key

If you really want to make it easier for lightweight apps to work with Fedora object properties, support JSON for the full CRUD cycle. Heck, support 10 different serializations (XML, RDF/XML, JSON, N-Triples, Ruby, Python ...). That would be awesome. It follows the existing pattern of encouraging 3rd-party developers by supporting serializations in their native languages.

If you have to choose one serialization to start with, it's JSON. Hands down. Any language can parse it, including JavaScript (of course), meaning that you can inject JSON content anywhere in your stack, from the bottom layer of a client application to a dynamic lookup in the browser. That's real flexibility.
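
To make the point concrete, here is a minimal Python sketch of one resource offered in several serializations, chosen from the request's Accept header. Everything in it is an assumption for illustration: the property names, the `objectProperties` element, and the `represent` helper are hypothetical and not part of Fedora, and the Accept parsing ignores q-values.

```python
import json
from xml.etree import ElementTree as ET

# Hypothetical object properties; the field names are illustrative only.
PROPS = {"pid": "demo:1", "label": "Sample object", "state": "A"}


def to_json(props):
    return json.dumps(props, sort_keys=True)


def to_xml(props):
    root = ET.Element("objectProperties")  # assumed element name
    for name in sorted(props):
        ET.SubElement(root, name).text = props[name]
    return ET.tostring(root, encoding="unicode")


# One URL, several serializations: the media type selects the serializer.
SERIALIZERS = {"application/json": to_json, "text/xml": to_xml}


def represent(props, accept):
    """Return (media_type, body) for the first supported type, else JSON."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](props)
    return "application/json", to_json(props)
```

Adding another serialization is then one more entry in `SERIALIZERS`, with no new URLs.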


# Is this an extension or a replacement?

Some of the comments on the list imply that the proposed changes will conflict with the existing API, but looking lightly through the document in Confluence, it appears that you've designed this as an extension of the existing API. Which is it? Of the methods that already exist, which ones (if any) are you proposing to change and which are you planning to leave alone?

For example, will you prevent people from continuing to use the "big" methods when they want to, or will you simply add the option of using the new finer-grained methods?


# Relationships

It will be particularly nice to have the relationship methods added to the REST API. Many thanks for pushing on this. We weren't able to put them in the original version of the API in late 2007 because the underlying Java code didn't exist yet. I've been hoping to fix it since then.

I'm confused by some of the conversation about the method signatures for these methods. With all of the challenges you're running into WRT translating RDF into a URL, I think it makes sense to provide the fallback of allowing people to POST RDF/XML or N-Triples as the body of their request. That is what POST content is for and, as far as I know, "URL" is not a standard serialization for RDF. Remember, REST is about leveraging all of the elegant power of HTTP, not just using nicer URLs. It makes sense to allow people to use the standard REST/HTTP approach and a standard RDF serialization to update RDF in Fedora.

Example Case for URL syntax: I have a lightweight app that doesn't know anything about RDF. I just want to assert that X isMemberOf Y. I use the URL syntax, with PIDs as subject and object and simplified versions of Fedora Relationship predicates to make this assertion.

Example Case for POST content: I already have the desired relationship represented as RDF in my code. I just want to push it into Fedora. I push it to /objects/{PID}/relationships as RDF/XML. Alternatively, I can still go straight to the RELS-EXT datastream and edit the RDF/XML directly.

I think both of these cases are important and I think it's possible to support both.
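
A rough Python sketch of the two cases side by side. It only builds the requests and sends nothing. The base URL, the `predicate` and `object` parameter names, and the `application/n-triples` content type are assumptions for illustration; the proposal may settle on different names.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8080/fedora"  # assumed base URL


def relationship_via_url(pid, predicate, obj_pid):
    """URL syntax: PIDs and a simplified predicate, no RDF knowledge needed."""
    # "predicate" and "object" are hypothetical parameter names.
    query = urlencode({"predicate": predicate, "object": obj_pid})
    url = f"{BASE}/objects/{quote(pid, safe=':')}/relationships?{query}"
    return Request(url, method="POST")


def relationship_via_body(pid, ntriples):
    """POST content: the caller already holds the relationship as RDF."""
    url = f"{BASE}/objects/{quote(pid, safe=':')}/relationships"
    return Request(
        url,
        data=ntriples.encode("utf-8"),
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )


# The same assertion, "demo:1 isMemberOf demo:2", expressed both ways.
TRIPLE = (
    "<info:fedora/demo:1> "
    "<info:fedora/fedora-system:def/relations-external#isMemberOf> "
    "<info:fedora/demo:2> .\n"
)
simple = relationship_via_url("demo:1", "isMemberOf", "demo:2")
full = relationship_via_body("demo:1", TRIPLE)
```

Both requests target the same /objects/{PID}/relationships resource; only the way the triple is carried differs.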


# Content Disposition

After the relationship methods, my biggest hope is for /datastreams/{dsid}/content to start using the Content-Disposition header (see http://fedoracommons.org/jira/browse/FCREPO-497), a feature that was originally suggested by Steve Bayliss. This would put an end to datastreams being downloaded with the absurd filename of "content". It's a little thing, but it has a huge impact on user experience.

There are a number of tickets in Jira around using HTTP more fully in the REST API (e.g. http://fedoracommons.org/jira/browse/FCREPO-182 and http://fedoracommons.org/jira/browse/FCREPO-412). Please keep these in mind for the re-working of the API. (This might go without saying. Your name comes up a few times in the tickets.)
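
As a sketch of what the server side could do (an assumption about one possible approach, not the design in FCREPO-497), the header value can be derived from the datastream label, falling back to the datastream ID, with an extension taken from the MIME type. The `content_disposition` helper and the small extension table are hypothetical.

```python
import re

# Small illustrative MIME-to-extension table; a real server would need more.
EXTENSIONS = {
    "application/pdf": ".pdf",
    "text/xml": ".xml",
    "image/jpeg": ".jpg",
}


def content_disposition(label, dsid, mime_type):
    """Build a Content-Disposition value with a meaningful filename."""
    # Replace anything outside a conservative character set with "_",
    # and fall back to the datastream ID when the label yields nothing.
    base = re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_") or dsid
    ext = EXTENSIONS.get(mime_type, "")
    if ext and not base.lower().endswith(ext):
        base += ext
    # "inline" lets browsers display the content and still use the
    # filename on save; "attachment" would force a download instead.
    return f'inline; filename="{base}"'
```

For example, a datastream labelled "Annual Report 2009" with MIME type application/pdf would be offered as Annual_Report_2009.pdf rather than "content".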


# Properties are not Resources

Regarding exposing the properties at distinct URLs, to put it simply this breaks the REST model. Properties are not resources; they are properties of a resource. If I go to /objects/{PID}, I should get a _representation_ of that object -- a resource -- consisting of its properties. I should not have to go to different resources in order to CRUD those properties.


An example:

If you took the current design to its logical end, the DC datastream would have resources like /title, /creator, /publisher, etc. hanging off of its URL. While this may sound appealing to some, it doesn't actually help anyone to make better applications and it certainly doesn't fit the REST model. When you're reading the resource, this information belongs in the base resource's HTTP content response, ideally with a couple of serialization options (XML, HTML, JSON, RDF/XML, N-Triples, etc.). When you're updating it, you should be able to pass the values to the resource as either URL params or HTTP content. Either way, exposing individual properties as resources simply complicates the task of writing a library to consume the API.
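
A small Python sketch of the alternative argued for here: one resource whose representation carries every property, and one update operation that takes only the values being changed. The in-memory `OBJECTS` dict, the property names, and both function names are hypothetical stand-ins, not Fedora code.

```python
import json

# Hypothetical in-memory stand-in for a repository.
OBJECTS = {
    "demo:1": {"label": "Old label", "state": "A", "ownerId": "fedoraAdmin"},
}


def get_object(pid):
    """Read /objects/{pid}: one representation carrying every property."""
    return json.dumps({"pid": pid, **OBJECTS[pid]}, sort_keys=True)


def update_object(pid, body):
    """Update /objects/{pid}: change only the properties named in the body."""
    changes = json.loads(body)
    unknown = set(changes) - set(OBJECTS[pid])
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    OBJECTS[pid].update(changes)
    return get_object(pid)


# A client changes the label without touching /label or any other sub-URL.
updated = json.loads(update_object("demo:1", '{"label": "New label"}'))
```

A client library then needs to know about one URL per object, not one per property.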





On Oct 28, 2009, at 11:33 AM, Asger Askov Blekinge wrote:

Hi Aaron



On Wed, 2009-10-28 at 16:36 +0100, Aaron Birkland wrote:

the implications of these
fine-grained operations on datastream versions are unclear.

You are quite right. I have simply not thought of the number of versions
that would be created from doing these operations in a series.

I think that executing multiple operations in series as part of a single
logical change that would otherwise be preserved as a unit might be an
anti-pattern in general (some of these preserved versions would
necessarily be inconsistent with the desired end state).

Yes, that would be an anti-pattern.

Each operation needs to do versioning, as if it was singular, as it can
have no knowledge of the logical change, and thus the unit to preserve.

I have an idea for a transactional system for Fedora, that I would like
to hear your opinion on.


There will be 3 new methods, like StartTransaction, CommitTransaction
and DeleteTransaction. StartTransaction gives you a token, that
identifies the transaction. All the normal API methods will take that
token as a parameter, and execute the operation within the transaction.

Normal operation of Fedora will work as normal.

When an object is modified as part of a transaction, the normal
procedure for an API call is not followed.

1. First, the Fedora system attempts to get a write-lock on the object.
The object is being written as part of this transaction, and does not
allow other processes to edit it.
2. The object is parsed into memory, and stored as part of the
transaction.
3. The change is executed on the in-memory object, preserving
information about which new datastream is created.
4. Return

Normal reads of the object will see the unmodified object. Reads from
within the transaction will see the modified object.

Further modifications will either lock other objects, or work on one of
the already locked objects. The interesting case is when an object is
modified twice in the same transaction.

1. In case the change involves an unversioned property (unversioned
datastream, object property), the change should just overwrite the
previous value, even if that was set as part of this transaction.
2. In case the change involves something versioned, but the unit has not
been modified in this transaction: Make a new version as normal, with
the change.
3. In case the change involves something versioned, and this thing has
already been modified as part of this transaction: Find the new version
created, and replace the values in that.
4. In case the change involves something deleted in the same
transaction: This causes an error, and the change is not carried out.

The procedure above would ensure that changes to the same logical unit
would be made part of the same storage version.
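
The four cases can be sketched in Python roughly as follows. This is only an illustration under assumed data shapes (an object is a dict with unversioned `props` and a list of versions per datastream); the class and method names are hypothetical, and write locks and timeouts are left out.

```python
import copy


class Transaction:
    """In-memory working copies of objects, following the four cases above."""

    def __init__(self, store):
        # store: pid -> {"props": {...}, "datastreams": {dsid: [versions]}}
        self.store = store
        self.working = {}     # pid -> private copy, invisible to normal reads
        self.touched = set()  # (pid, dsid) that already got a new version here
        self.deleted = set()  # (pid, dsid) deleted in this transaction

    def _copy(self, pid):
        # The first modification copies the object into the transaction.
        if pid not in self.working:
            self.working[pid] = copy.deepcopy(self.store[pid])
        return self.working[pid]

    def set_property(self, pid, name, value):
        # Case 1: unversioned values are simply overwritten.
        self._copy(pid)["props"][name] = value

    def set_content(self, pid, dsid, content):
        key = (pid, dsid)
        if key in self.deleted:
            # Case 4: changing something deleted in this transaction fails.
            raise ValueError(f"{dsid} was deleted in this transaction")
        versions = self._copy(pid)["datastreams"].setdefault(dsid, [])
        if key in self.touched:
            versions[-1] = content    # Case 3: replace the version made earlier
        else:
            versions.append(content)  # Case 2: first change makes a new version
            self.touched.add(key)

    def delete_datastream(self, pid, dsid):
        self._copy(pid)["datastreams"].pop(dsid, None)
        self.deleted.add((pid, dsid))
```

Two content changes to the same datastream inside one transaction therefore leave a single new version in the working copy, while the stored object stays untouched until commit.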


When the change is committed:
First, the system upgrades all write locks on modified objects to read
locks. The locked objects are parsed into memory, and used to service
read requests while the transaction is written to storage.

The objects are written to the store, one at a time (as that is the only
way to do so). If there is a problem with writing one of the objects,
the transaction is aborted, and all objects written are replaced with
their previously parsed counterparts.
(This is the risky step. If, and only if, the fedora system goes down
while committing a transaction, will the repository be left in an
inconsistent state.)
When all the modifications have been written, the old data objects are
cleared from memory and the locks on the objects released.
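
A minimal Python sketch of the commit step with rollback, under the assumption that the store can be modelled as a dict and that a failed write raises an exception. The `commit` function and its `write` callback are hypothetical; locks and the crash-during-commit case are not modelled.

```python
import copy


def commit(store, working, write):
    """Write each modified object; on failure restore the pre-commit state.

    store   -- dict pid -> object, standing in for persistent storage
    working -- dict pid -> modified object held by the transaction
    write   -- callable(store, pid, obj) that may raise on failure
    """
    # Keep the old objects in memory for the duration of the commit.
    originals = {pid: copy.deepcopy(store.get(pid)) for pid in working}
    written = []
    try:
        for pid, obj in working.items():  # one object at a time
            write(store, pid, obj)
            written.append(pid)
    except Exception:
        # Abort: put back every object this transaction could have touched.
        for pid, old in originals.items():
            if old is None:
                store.pop(pid, None)  # object did not exist before commit
            else:
                store[pid] = old
        raise
    return written
```

If the process dies between the first write and the restore, nothing here repairs the store, which is exactly the risky window described above.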

When a transaction is deleted:

Remove the transaction object, and all the modified objects. These exist
only in memory at that moment, so this change will be invisible to the
storage system.


Risks at different stages

1. The client goes down during a transaction, thus locking some objects:
Timeout on the transaction remedies this problem.
2. The server goes down during a transaction, before commit is called:
The transaction is stored only in memory (or similar non-persistent
storage) so all recollection of the transaction is only relevant while
the server is running. Reboot removes all transactions.
3. The server goes down during a commit: The repo is left inconsistent.
Easiest way to mitigate: Write the old versions of the objects to some
more permanent store before changing the versions in the repository.
When the server starts up, restore any objects from this store, so that
the finished repo is consistent again.


Gotchas in this approach:
1. The triple store will not be transactional. It will reflect the
current contents of the repo. We can, as the last part of a transaction,
ingest all the RDF statements from the changed objects into the triple
store, so that it gets them all at the same time. Still, there will be a
tiny desynchronisation between Fedora and the triple store.
2. Content from remote locations (i.e. URLs) will not be downloaded until
the transaction is committed. That is the only part that cannot be
reasonably validated before the attempt is made.

However, in my
own use case, I would say that 90-99% of all datastream modifications do
involve only *one* change - and in that case, this API proposal fits
very well by providing an additional, lightweight, easy to use tool.

I am glad to hear that. For a moment, I thought that the problem you
pointed out would be the death of this API.

Some of my most common use cases are:
1) updating datastream content, keeping all properties the same
2) changing datastream state, keeping content and other properties the
same
3) fixing datastream MIME type, keeping content and other properties the
same

A less common (but important) use case for me involving changes to both
content and properties is updating content, label, and MIME type. I
would want to perform that change as one atomic operation.

I do not propose to change things without creating an Audit entry. What
made you think I meant that?

I was not sure if the granularity of these operations had any
implications on auditing.

I do have a problem with the Audit system at the moment, as it does not
store the old values. This is not good when changing properties that are
not part of the versioned bit of a datastream.

If a client intends to modify
both datastream content as well as datastream properties, does this
imply that it MUST first change datastream content, then change
properties?

Why should the order matter?


You're right -  it does not matter.  It may have implications on what
the inconsistent intermediate versions would look like, but does not
present a fundamental difference.

This has been a delightful topic, Asger.  Thanks!

Thanks!


 -Aaron



------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
