Re: [Fedora-commons-developers] Fedora REST interface, a proposal for methods

Asger Askov Blekinge Fri, 30 Oct 2009 05:28:45 -0700

On Thu, 2009-10-29 at 18:26 +0100, Matt Zumwalt wrote:
> Asger, Thanks for taking this on, and thanks for writing up this
> proposal.  It's really important to keep refining this API and it
> helps to have a document to work from.  Here are a few thoughts based
> on the proposal and the comments so far.
> 
> 
> I propose a fundamental design guideline for this API: Fedora should
> be responsible for exposing complete CRUD endpoints for its
> resources.  The REST API should use HTTP structures whenever
> possible.  Keep it clean and predictable.  Leave the rest to the
> client libraries.
> 
> 
> # Serializations, rather than changing URLs, are the key
> 
> 
> If you really want to make it easier for lightweight apps to work with
> Fedora object properties, support JSON for the full CRUD cycle.  Heck,
> support 10 different serializations (XML, RDF/XML, JSON, n-triples,
> Ruby, Python ...).  That would be awesome.  It follows the existing
> pattern of encouraging 3rd-party developers by supporting
> serializations in their native languages.
I changed the URLs to better accommodate the conceptual view of a Fedora
object. Datastream relations might be stored in a specific datastream,
but they are fields on another datastream. The FOXML serialization should
not dictate the URLs. But, as you can see, most of the URLs did not in
fact change, as they already adhered to this.
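
To make the "one URL, several serializations" idea concrete, here is a
minimal Python sketch. The function name serialize_properties, the
element names in the XML output and the sample property values are all
hypothetical and not taken from the Fedora API; the sketch only shows a
single resource being rendered according to the requested media type.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def serialize_properties(props, accept):
    """Render one set of object properties in the requested media type.

    Hypothetical helper: the media types handled and the XML layout are
    assumptions made for illustration only.
    """
    if accept == "application/json":
        return json.dumps(props, sort_keys=True)
    if accept == "text/xml":
        items = "".join(
            "<property name=%s>%s</property>" % (quoteattr(k), escape(str(v)))
            for k, v in sorted(props.items())
        )
        return "<properties>%s</properties>" % items
    raise ValueError("unsupported media type: %s" % accept)


# Same resource, two representations; the URL would not change.
sample = {"state": "A", "label": "x"}
print(serialize_properties(sample, "application/json"))
print(serialize_properties(sample, "text/xml"))
```

The URL stays the same in both calls; only the value of the Accept
header (passed here as a plain string) selects the representation.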


> 
> 
> If you have to choose one serialization to start with, it's JSON.
> Hands down.  Any language can parse it, including Javascript (of
> course), meaning that you can inject JSON content anywhere in your
> stack, from the bottom layer of a client application to a dynamic
> lookup in the browser.  That's real flexibility.
You are right there, and thanks for that input. I will go through the
document to update it and add output information.
The fine-grained methods were meant to take away the need to parse the
output, but that is of course never entirely possible. When a method
returns just one value, no wrapping is necessary, but as soon as you
return two or more, they need to be encoded in a list structure in some
language. As such, I tried to prevent anything from returning lists, but
there are places where that is unavoidable. For these places,
serialization is necessary, and JSON is a very good idea for the default.
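
As an illustration of the single-value versus list distinction, here is
a minimal Python sketch. The function name encode_result and the sample
values are hypothetical; the sketch only shows that a lone value can be
returned unwrapped, while two or more values need a list encoding such
as JSON.

```python
import json


def encode_result(values):
    """Return (content_type, body) for the result of a method call.

    Hypothetical helper: a single value is sent unwrapped as plain text,
    anything else is wrapped in a JSON array.
    """
    if len(values) == 1:
        return "text/plain", str(values[0])
    return "application/json", json.dumps(list(values))


# One value needs no parsing on the client side.
print(encode_result(["A"]))
# Several values have to be encoded as a list.
print(encode_result(["DC", "RELS-EXT"]))
```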


> 
> 
> # Is this an extension or a replacement?
> 
> 
> Some of the comments on the list imply that the proposed changes will
> conflict with the existing API, but looking lightly through the
> document in confluence, it appears that you've designed this as an
> extension of the existing API .  Which is it?  Of the methods that
> already exist, which ones (if any) are you proposing to change and
> which are you planning to leave alone?  
> 
> 
> For example, will you prevent people from continuing to use the "big"
> methods when they want to, or will you simply add the option of using
> the new finer-grained methods?
> 
I understand the confusion. Initially, I wanted this to be a
replacement, but I will never get it finished in time for Fedora 3.3, and
in Fedora 3.3 the REST API will be finalized. So it will be an
extension instead.

It should not live under the /objects/ prefix but under something
like /crud/objects.

The old REST methods should definitely stay there. As Aaron pointed out,
a modifyDatastream method that can modify both properties and contents
in one invocation is necessary.

At some later stage there might be a merger of the two APIs.

> 
> # Relationships
> 
> 
> It will be particularly nice to have the relationship methods added to
> the REST api.  Many thanks for pushing on this. We weren't able to put
> them in the original version of the api in late 2007 because the
> underlying Java code didn't exist yet.  I've been hoping to fix it
> since then.
> 
> 
> I'm confused by some of the conversation about the method signatures
> for these methods.  With all of the challenges you're running into WRT
> translating RDF into a URL, I think it makes sense to provide the
> failover of allowing people the option of POSTing rdf/xml and
> n-triples as the body of their request.  That is what POST content is
> for and, as far as I know, "URL" is not a standard serialization for
> RDF.  Remember, REST is about leveraging all of the elegant power of
> HTTP, not just using nicer URLs.  It makes sense to allow people to
> use the standard REST/HTTP approach and a standard RDF serialization
> to update RDF in Fedora.
>   
> 
> 
> Example Case for URL syntax: I have a lightweight app that doesn't
> know anything about RDF.  I just want to assert that X isMemberOf Y.
> I use the URL syntax, with PIDs as subject and object and simplified
> versions of Fedora Relationship predicates to make this assertion.
> 
> 
> Example Case for POST content:  I already have the desired
> relationship represented as RDF in my code.  I just want to push it
> into Fedora.  I push it to /objects/{PID}/relationships as RDF/XML.
>  Alternatively, I can still go straight to the REL-EXT datastream and
> edit the RDF/XML directly.
> 
> 
> I think both of these cases are important and I think it's possible to
> support both.
I get your point, and I was worried about this. Fully named relations
do not nice URLs make.

I have chosen to think of the relationship as a resource, and thus
something that can be addressed in REST, but the POSTing idea of RDF
statements did not occur to me. Thanks for that.

Allowing people to post RDF directly sounds like a good idea, but it has
implications. To delete relations, you would then
DELETE /objects/{pid}/relations with the RDF content to remove?

This is not so bad, though. I too feel that both ways should be
supported.
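
A minimal Python sketch of what the body-based variant could look like.
The function name relationship_request, the PIDs demo:1 and demo:2 and
the choice of N-Triples with the application/n-triples media type are
assumptions made for illustration; the /objects/{pid}/relations path is
the one discussed above. The same statement serves as the body for both
adding and removing a relation.

```python
FEDORA_URI = "info:fedora/"
IS_MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOf"


def relationship_request(method, pid, predicate_uri, object_pid):
    """Build (method, path, headers, body) for adding or removing a relation.

    Hypothetical helper: nothing is sent over the network, the request is
    only described as a tuple.
    """
    if method not in ("POST", "DELETE"):
        raise ValueError("method must be POST or DELETE")
    # One N-Triples statement: subject, predicate, object, terminated by " ."
    triple = "<%s%s> <%s> <%s%s> .\n" % (
        FEDORA_URI, pid, predicate_uri, FEDORA_URI, object_pid)
    headers = {"Content-Type": "application/n-triples"}
    return method, "/objects/%s/relations" % pid, headers, triple


# Add the relation, then describe the matching removal.
print(relationship_request("POST", "demo:1", IS_MEMBER_OF, "demo:2"))
print(relationship_request("DELETE", "demo:1", IS_MEMBER_OF, "demo:2"))
```

One thing to weigh for the DELETE case: HTTP does not define any
semantics for a request body on DELETE, so some clients and proxies may
not pass it along.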



> 
> 
> # Content Disposition
> 
> 
> After the relationship methods, my biggest hope is
> for /datastreams/{dsid}/content to start using the content-disposition
> header (see http://fedoracommons.org/jira/browse/FCREPO-497), a
> feature that was originally suggested by Steve Bayliss.  This would
> put an end to datastreams being downloaded with the absurd filename of
> "content".  It's a little thing, but it has a huge impact on user
> experience.
That one I hadn't noticed. And yes, it should really be done.
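
A minimal Python sketch of how such a header value could be built.
Deriving the filename from the datastream label and MIME type is an
assumption made here, as are the function name content_disposition, the
small extension table and the default "attachment" disposition; none of
this is taken from FCREPO-497.

```python
# Hypothetical, deliberately tiny MIME type to extension table.
EXTENSIONS = {
    "application/pdf": ".pdf",
    "text/xml": ".xml",
    "image/jpeg": ".jpg",
}


def content_disposition(label, mime_type, disposition="attachment"):
    """Build a Content-Disposition header value from a label and MIME type."""
    # Keep only ASCII letters, digits and a few safe characters so the
    # name fits in a quoted-string; everything else is dropped.
    safe = "".join(
        c for c in label if (c.isascii() and c.isalnum()) or c in " ._-"
    ).strip()
    if not safe:
        safe = "content"
    ext = EXTENSIONS.get(mime_type, "")
    if ext and not safe.lower().endswith(ext):
        safe += ext
    return '%s; filename="%s"' % (disposition, safe)


print(content_disposition("Annual Report 2009", "application/pdf"))
```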

> 
> 
> There are a number of tickets in Jira around using HTTP more fully in
> the REST api.  (e.g. http://fedoracommons.org/jira/browse/FCREPO-182
> and http://fedoracommons.org/jira/browse/FCREPO-412 ). Please keep
> these in mind for the re-working of the api. (This might go without
> saying.  Your name comes up a few times in the tickets.)

I will have to make a list of them. There are so many.


> 
> 
> # Properties are not Resources
> 
> 
> Regarding exposing the properties at distinct urls, to put it simply
> this breaks the REST model.  Properties are not resources; they are
> properties of a resource.  If I go to /objects/{PID}, I should get a
> _representation_ of that object -- a resource -- consisting of its
> properties.  I should not have to go to different resources in order
> to CRUD those properties.
> 
> 
> An example: 
> 
> 
> If you took the current design to its logical end, the DC datastream
> would have resources like /title, /creator, /publisher, etc hanging
> off of its URL.  While this may sound appealing to some, it doesn't
> actually help anyone to make better applications and it certainly
> doesn't fit the REST model.  When you're reading the resource, this
> information belongs in the base resource's HTTP content response,
> ideally with a couple of serialization options (XML, HTML, JSON,
> RDF/XML, N-triples, etc).  When you're updating it, you should be able
> to pass the values to the resource as either url params or http
> content.  Either way, exposing individual properties as resources
> simply complicates the task of writing a library to consume the api.

Here I disagree with you. Taking the alternative design to its logical
end, datastreams would not be resources. They are properties of the
object, and should be modifiable directly by posting to the object. The
only resource would be the object.

I hope we can agree that the optimal situation is somewhere between
those two extremes.

I might not have shown this clearly enough in the API, but the property
list is finite. There is a distinct set of allowed properties, and they
always exist and cannot be repeated.

I fail to see how POST /objects/{pid}/properties/state (content A) is
more difficult to consume than POST /objects/{pid}?State=A
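
To put the two forms side by side, here is a minimal Python sketch that
only builds the request descriptions, without sending anything. The
function names and the PID demo:1 are hypothetical; the paths are the
two quoted above.

```python
from urllib.parse import quote, urlencode


def set_state_as_subresource(pid, state):
    """Property addressed by its own URL; the body is the bare value."""
    path = "/objects/%s/properties/state" % quote(pid, safe=":")
    return "POST", path, state


def set_state_as_query_param(pid, state):
    """Property passed as a URL parameter on the object; the body is empty."""
    path = "/objects/%s?%s" % (quote(pid, safe=":"), urlencode({"State": state}))
    return "POST", path, ""


# Both calls describe the same change to the same object.
print(set_state_as_subresource("demo:1", "A"))
print(set_state_as_query_param("demo:1", "A"))
```

In both cases the client supplies a PID, a property name and a value;
neither form requires the value to be wrapped in a serialization
format.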

The exact design criterion for my alternative API was to make the
properties and the like directly addressable, so that you do not have to
encode the content in some format. I wanted to make an API with a
minimum of serialization.

Thanks for your input, it has been much appreciated.

Regards
Asger


> 
> On Oct 28, 2009, at 11:33 AM, Asger Askov Blekinge wrote:
> 
> > Hi Aaron
> > 
> > 
> > 
> > On Wed, 2009-10-28 at 16:36 +0100, Aaron Birkland wrote:
> > > > > the implications of these
> > > > > fine-grained operations on datastream versions are unclear.
> > > > 
> > > > You are quite right. I have simply not thought of the number of
> > > > versions that would be created from doing these operations in a
> > > > series.
> > > 
> > > I think that executing multiple operations in series as part of a
> > > single logical change that would otherwise be preserved as a unit
> > > might be an anti-pattern in general (some of these preserved
> > > versions would necessarily be inconsistent with the desired end
> > > state).
> > Yes, that would be an anti-pattern. 
> > 
> > Each operation needs to do versioning, as if it was singular, as it
> > can have no knowledge of the logical change, and thus the unit to
> > preserve.
> > 
> > I have an idea for a transactional system for Fedora, that I would
> > like to hear your opinion on.
> > 
> > 
> > There will be 3 new methods, like StartTransaction,
> > CommitTransaction and DeleteTransaction. StartTransaction gives you
> > a token, that identifies the transaction. All the normal API methods
> > will take that token as a parameter, and execute the operation
> > within the transaction. Normal operation of Fedora will work as
> > normal.
> > 
> > When an object is modified as part of a transaction, the normal
> > procedure for an API call is not followed.
> > 
> > 1. First, the Fedora system attempts to get a write-lock on the
> > object. The object is being written as part of this transaction, and
> > does not allow other processes to edit it.
> > 2. The object is parsed into memory, and stored as part of the
> > transaction.
> > 3. The change is executed on the in-memory object, preserving
> > information about which new datastream is created.
> > 4. Return
> > 
> > Normal reads of the object will see the unmodified object. Reads
> > from within the transaction will see the modified object.
> > 
> > Further modifications will either lock other objects, or work on one
> > of the already locked objects. The interesting case is when an
> > object is modified twice in the same transaction.
> > 
> > 1. In case the change involves an unversioned property (unversioned
> > datastream, object property), the change should just overwrite the
> > previous value, even if that was set as part of this transaction.
> > 2. In case the change involves something versioned, but the unit has
> > not been modified in this transaction: Make a new version as normal,
> > with the change.
> > 3. In case the change involves something versioned, and this thing
> > has already been modified as part of this transaction: Find the new
> > version created, and replace the values in that.
> > 4. In case the change involves something deleted in the same
> > transaction: This causes an error, and the change is not carried out.
> > 
> > The procedure above would ensure that changes to the same logical
> > unit would be made part of the same storage version.
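
The four cases in the list above can be sketched in Python as follows.
This is only a toy model under stated assumptions: the class name
Transaction, its methods, and the representation of a unit's history as
a plain list of values are all hypothetical, and locking is left out
entirely.

```python
class Transaction:
    """Toy model of the in-memory state of one transaction."""

    def __init__(self, versions):
        # Maps a unit (datastream or property name) to its list of versions.
        self.versions = {unit: list(vs) for unit, vs in versions.items()}
        self.touched = set()   # units already modified in this transaction
        self.deleted = set()   # units deleted in this transaction

    def delete(self, unit):
        self.deleted.add(unit)

    def modify(self, unit, value, versioned):
        if unit in self.deleted:
            # Case 4: the unit was deleted in this transaction.
            raise ValueError("%s was deleted in this transaction" % unit)
        history = self.versions.setdefault(unit, [])
        if not versioned:
            # Case 1: unversioned, simply overwrite the previous value.
            history[:] = [value]
        elif unit in self.touched:
            # Case 3: reuse the version already created in this transaction.
            history[-1] = value
        else:
            # Case 2: first change in this transaction, make a new version.
            history.append(value)
        self.touched.add(unit)


tx = Transaction({"DC": ["v1"], "state": ["A"]})
tx.modify("DC", "v2", versioned=True)    # case 2
tx.modify("DC", "v3", versioned=True)    # case 3
tx.modify("state", "I", versioned=False) # case 1
print(tx.versions)
```

Two successive changes to the versioned unit end up as a single new
version, which is the property the procedure is meant to guarantee.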
> > 
> > 
> > When the change is committed:
> > First, the system upgrades all write locks on modified objects to
> > read locks. The locked objects are parsed into memory, and used to
> > service read requests while the transaction is written to storage.
> > The objects are written to the store, one at a time (as that is the
> > only way to do so). If there is a problem with writing one of the
> > objects, the transaction is aborted, and all objects written are
> > replaced with their previously parsed counterparts.
> > (This is the risky step. If, and only if, the fedora system goes
> > down while committing a transaction, will the repository be left in
> > an inconsistent state.)
> > When all the modifications have been written, the old data objects
> > are cleared from memory and the locks on the objects released.
> > 
> > When a transaction is deleted:
> > Remove the transaction object, and all the modified objects. These
> > exist only in memory at that moment, so this change will be
> > invisible to the storage system.
> > 
> > 
> > Risks at different stages
> > 1. The client goes down during a transaction, thus locking some
> > objects: Timeout on the transaction remedies this problem
> > 2. The server goes down during a transaction, before commit is
> > called: The transaction is stored only in memory (or similar
> > non-persistent storage) so all recollection of the transaction is
> > only relevant while the server is running. Reboot removes all
> > transactions.
> > 3. The server goes down during a commit: The repo is left
> > inconsistent. Easiest way to mitigate: Write the old versions of the
> > objects to some more permanent store before changing the versions in
> > the repository. When the server starts up, restore any objects from
> > this store, so that the finished repo is consistent again.
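
The commit step with its abort path can be sketched in Python as below.
This is a sketch under stated assumptions, not Fedora code: the store is
modelled as a plain dict keyed by PID, the function names commit,
plain_write and failing_write are hypothetical, and the backups are kept
in memory, whereas the mitigation for risk 3 would require writing them
to a more permanent store first.

```python
import copy


def commit(store, modified, write):
    """Write modified objects one at a time; restore the old ones on failure.

    Returns True when every object was written, False when the
    transaction was aborted and the store was put back as it was.
    """
    backups = {}
    try:
        for pid, new_obj in modified.items():
            # Keep the old object before touching the store.
            backups[pid] = copy.deepcopy(store.get(pid))
            write(store, pid, new_obj)
    except Exception:
        # Abort: replace everything written so far with the old objects.
        for pid, old_obj in backups.items():
            if old_obj is None:
                store.pop(pid, None)
            else:
                store[pid] = old_obj
        return False
    return True


def plain_write(store, pid, obj):
    store[pid] = obj


def failing_write(store, pid, obj):
    # Simulates a storage problem on the second object.
    if pid == "demo:2":
        raise IOError("simulated write failure")
    store[pid] = obj


repo = {"demo:1": {"label": "old 1"}, "demo:2": {"label": "old 2"}}
changes = {"demo:1": {"label": "new 1"}, "demo:2": {"label": "new 2"}}
print(commit(repo, changes, failing_write), repo)
print(commit(repo, changes, plain_write), repo)
```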
> > 
> > 
> > Gotchas in this approach:
> > 1. The triple store will not be transactional. It will reflect the
> > current contents of the repo. We can, as the last part of a
> > transaction, ingest all the rdf statements from the changed objects
> > into the triple store, so that it gets them all at the same time.
> > Still, there will be a tiny desynchronisation between fedora and
> > triple store.
> > 2. Content from remote locations (ie. URLs) will not be downloaded
> > until the transaction is committed. That is the only part that
> > cannot be reasonably validated, before the attempt is made.
> > 
> > 
> > >  However, in my own use case, I would say that 90-99% of all
> > > datastream modifications do involve only *one* change - and in
> > > that case, this API proposal fits very well by providing an
> > > additional, lightweight, easy to use tool.
> > I am glad to hear that. For a moment, I thought that the problem you
> > pointed out would be the death of this API.
> > 
> > 
> > 
> > 
> > > 
> > > Some of my most common use cases are:
> > > 1) updating datastream content, keeping all properties the same
> > > 2) changing datastream state, keeping content and other properties
> > > the same
> > > 3) fixing datastream MIME type, keeping content and other
> > > properties the same
> > > 
> > > A less common (but important) use case for me involving changes to
> > > both content and properties is updating content, label, and mime
> > > type.  I would want to perform that change as one atomic operation.
> > 
> > > 
> > > > I do not propose to change things without creating an Audit
> > > > entry. What made you think I meant that?
> > > 
> > > I was not sure if the granularity of these operations had any
> > > implications on auditing.  
> > I do have a problem with the Audit system at the moment, as it does
> > not store the old values. This is not good when changing properties
> > that are not part of the versioned bit of a datastream.
> > 
> > 
> > 
> > 
> > > 
> > > > > If a client intends to modify
> > > > > both datastream content as well as datastream properties, does
> > > > > this imply that it MUST first change datastream content, then
> > > > > change properties?  
> > > > Why should the order matter?  
> > > 
> > > You're right -  it does not matter.  It may have implications on
> > > what the inconsistent intermediate versions would look like, but
> > > does not present a fundamental difference.
> > > 
> > > This has been a delightful topic, Asger.  Thanks!
> > Thanks!
> > 
> > > 
> > >  -Aaron
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 


_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
