Re: [Sword-TAP] My Thoughts

Richard Jones Wed, 16 Mar 2011 15:53:18 -0700

Hi Dave,

> app:accept vs sword:acceptPackaging
>
>
> * The SWORD server MUST specify the app:accept element. If the Collection can 
> take any format content type, it should specify */* as its value [AtomPub]. 
> It MUST also specify an app:accept element with an alternate attribute set to 
> multipart-related as required by [AtomMultipart]
> * The SWORD server MAY include zero or more sword:acceptPackaging elements 
> [SWORD002]. The value SHOULD be a URI for a known packaging format (where 
> such URIs exist)
>
> So there is an implied behaviour here? The second statement says I can do 
> something with this package, where the first says I accept this mime-type. 
> This is highly confusing in my view and you should either stick to one or the 
> other. What if the server only accepts PDF but in the packaging says it 
> accepts bagit? Should it say in the accept that it accepts zip? What if the 
> package and the accept are the same thing, e.g. a docx (which is also a 
> zip...?)


First of all, this is the behaviour from SWORD 1.3, and I'm not aware of 
any particular confusion regarding it.

Secondly, if a server can't announce consistent app:accept and 
sword:acceptPackaging elements, that's not really an issue with the 
profile but with the server implementation.  The same issues arise in 
content negotiation, as we have covered extensively on this list.

In the case of docx, since this has a mime-type, I don't see any problem 
with supplying this as an app:accept.  If you made up a URI for it you 
could put it in the sword:acceptPackaging element too.

> I can see people becoming very confused here!

And yet so far, since this has been in place since 2008, no such 
confusion has been indicated to me (perhaps I missed it?).

This is really no more confusing than the existing content negotiation 
headers in HTTP.  The answer - in the absence of a hard solution to that 
problem - is a bit of common sense on both sides.

> If we need both then they either both need to be mime-types or both need to 
> be URIs, or accept both.
> I feel this is a bodged solution to (1) keep to atom pub, (2) Extend 
> mime-types to URIs. Maybe what
> we are actually after is the server to accept */* then the client to use a 
> dc:Conforms to somehow
> to inform the server on the packaging they are sending (thus not having to 
> mint another header).
> In the deposit receipt the server could then inform the client on the 
> capabilities it has to do
> stuff with this format/package. This provides a much greater level of 
> flexibility moving forward,
> but are we ready to include this yet.

Unfortunately, we can't have mime-types for both, as there are simply 
not mime-types for the package formats (unless you happen to know of a 
mime-type which uniquely identifies a DSpace METS package?).  Meanwhile 
app:accept only takes mime-types, not URIs.

This is therefore a solution which a) avoids confusing pure AtomPub 
implementations and b) allows the server to accurately identify the 
packaging formats it supports.
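
To make the split concrete, here is a minimal Python sketch of how a client might read both kinds of element from a collection in a service document. The collection URL, the sword namespace URI and the packaging URI are illustrative assumptions, not values taken from this thread; the alternate="multipart-related" attribute follows the requirement quoted above.

```python
import xml.etree.ElementTree as ET

# "app" is the AtomPub namespace; the "sword" namespace URI is an assumption.
NS = {
    "app": "http://www.w3.org/2007/app",
    "sword": "http://purl.org/net/sword/terms/",
}

# Hypothetical collection as a server might announce it.
COLLECTION_XML = """\
<app:collection xmlns:app="http://www.w3.org/2007/app"
                xmlns:sword="http://purl.org/net/sword/terms/"
                href="http://example.org/sword/collection/1">
  <app:accept>*/*</app:accept>
  <app:accept alternate="multipart-related">*/*</app:accept>
  <sword:acceptPackaging>http://example.org/packaging/METSDSpaceSIP</sword:acceptPackaging>
</app:collection>
"""


def read_accepts(xml_text):
    """Return (mime_types, multipart_mime_types, packaging_uris)."""
    root = ET.fromstring(xml_text)
    mime_types, multipart = [], []
    for el in root.findall("app:accept", NS):
        value = (el.text or "").strip()
        # app:accept only ever carries mime-types, never URIs.
        if el.get("alternate") == "multipart-related":
            multipart.append(value)
        else:
            mime_types.append(value)
    # sword:acceptPackaging carries URIs identifying package formats.
    packaging = [(el.text or "").strip()
                 for el in root.findall("sword:acceptPackaging", NS)]
    return mime_types, multipart, packaging


mime_types, multipart, packaging = read_accepts(COLLECTION_XML)
print("mime-types:", mime_types)
print("multipart:", multipart)
print("packaging URIs:", packaging)
```

A pure AtomPub client would only look at the app:accept values and ignore the sword element, which is the point of keeping the two separate.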

The client cannot supply a dc:conforms to the server unless it also 
sends an Atom Entry with it, which is not a requirement, so that 
approach won't work.  This is the reason we have for pushing all of that 
information up into the HTTP headers.

The last part of your point above I can't make sense of.  In the deposit 
receipt (i.e. after deposit) the server tells the client what it can do 
with the package.  Doesn't that seem somewhat after-the-fact?

> To retrieve the content in the desired packaging format, the client makes an 
> HTTP GET
> request to the EM-URI and MAY supply an Accept-Packaging header [SWORD001] 
> with the URI
> from one of the sword:packaging elements.
>
> Same here, this is disgusting when we have the HTTP accept header which 
> should be the
> priority. I can see the point in using the accept-packaging header in order 
> to get
> round the mime-types vs URIs issue, however it would be better if the whole 
> lot was
> done using mime-types!

In which case, if you could supply us with the mime-types for every 
conceivable package format, that would be great.

Otherwise you may try to see this as a pragmatic solution to an actual 
existing problem.  SWORD is, of course, open to alternatives which meet 
the requirements.

> 6.4. Editing the Content of a Resource
>
> The client MAY provide an In-Progress header with a value of true or false 
> [SWORD001]
>
> Not regarding the content it shouldn't, surely the whole item (defined by the 
> edit-URI)
> is In-Progress or not, not the content. E.G. can my blog post (which 
> includes, title,
> tags, body text etc) be complete but the content still in progress. No in my 
> view, and
> this would be a bitch for the server to implement!
>
> In-Progress should only be used on the container (Edit) URI.

I can see what you're saying, the spec is not sufficiently clear on this 
point.

The intention is that the In-Progress header refers to the container, 
not the content, but when content is modified the server needs to know 
whether the addition of that content means that the object itself is 
"finished".

So, my use case is this:

I upload a package with some files in it to the Col-URI, and I specify 
In-Progress because tomorrow I plan to upload some more files.  The 
following day I realise that I uploaded the wrong thing, and so I PUT 
the correct package to the EM-URI, and I supply the In-Progress header 
because I still plan to add more content to the object.

If I can't supply the In-Progress header to the content PUT, what 
happens?  Does the server assume that I am still "In progress", or does 
it assume that I'm finished?  If it assumes that I'm still in progress, 
does it then wait until I send an "In-Progress: false" to the Edit-URI?

We could explore a variety of options along these lines, the key being 
for it to be quick and easy for the client/server to agree when an item 
is "finished".
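
As a rough illustration of the use case above, the two requests could be described like this in Python. Nothing is sent over the network; the URIs and the application/zip content type are hypothetical, and only the In-Progress header name and its true/false values come from the spec text quoted above.

```python
def build_request(method, url, in_progress, content_type="application/zip"):
    """Describe a deposit request as plain data; nothing is sent."""
    headers = {
        "Content-Type": content_type,  # assumed package mime-type
        # In-Progress describes the state of the container, not the content.
        "In-Progress": "true" if in_progress else "false",
    }
    return {"method": method, "url": url, "headers": headers}


# Hypothetical URIs standing in for the Col-URI and the EM-URI.
COL_URI = "http://example.org/sword/collection/1"
EM_URI = "http://example.org/sword/edit-media/42"

# Day 1: initial deposit, with more files to follow tomorrow.
day_one = build_request("POST", COL_URI, in_progress=True)
# Day 2: replace the wrong package; the object is still not finished.
day_two = build_request("PUT", EM_URI, in_progress=True)

for req in (day_one, day_two):
    print(req["method"], req["url"], "In-Progress:", req["headers"]["In-Progress"])
```

Without the header on the second request, the server would have to guess which of the two states the client intends.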

> 6.5. Deleting the Content of a Resource
>
> Should return the 200 header representing what has happened.
>
> I'm against the delete operation on the content URI returning the receipt of 
> the edit-URI.
> No one asked for this, the client asked for the thing to be deleted, not a 
> statement or
> receipt! I've made this point before... it got ignored.

Oh for god's sake Dave, grow up.  It did not get ignored because you and 
I had a long discussion about it in person.  I argued that since all the 
other operations can optionally supply a receipt in their response, it was 
inconsistent to not be able to supply one for DELETE in this one instance. 
Additionally, the deposit receipt is strictly optional in ALL cases as 
per the specification.  You argued that DELETE doesn't allow a response 
(and should return 204), and this turned out to be untrue when I pointed 
you to the HTTP spec.  I'd appreciate it if you'd actually remember the 
discussions we've had about this before having a strop.

> The client knows the edit-URI and, as you say in your video Richard, if you do 
> a get on the
> Edit-URI you can get the receipt again.  This is too overblown here.

It's strictly optional as per the document.  If you don't want to return 
a receipt in your implementation, then don't.
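
For what it's worth, a client can cope with either choice quite cheaply. The following Python sketch is only an assumption about how a client might classify the response to a DELETE; the function name and the returned strings are made up for illustration, while the 200/202/204 status codes are the ones HTTP/1.1 describes for a successful DELETE.

```python
def describe_delete_response(status, body=b""):
    """Classify a DELETE response from the client's point of view."""
    if status == 204:
        # Success with no entity: the server chose not to send a receipt.
        return "deleted, no receipt"
    if status == 200:
        # Success with an entity: the body may be a deposit receipt.
        return "deleted, receipt returned" if body else "deleted, empty body"
    if status == 202:
        # Accepted, but the deletion has not been enacted yet.
        return "accepted, deletion pending"
    return "not deleted"


print(describe_delete_response(204))
print(describe_delete_response(200, b"<entry/>"))
```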

> 1) Client asks to delete object at any URI
> 2) Server returns http error code
>
> If the server chooses to return anything else then the Content-Location 
> header MUST be
> used to define what it is returning. This is because the client SHOULDN'T be 
> able to call
> the same delete operation to get the same content as the object should 404 or 
> 410 at
> that point.

I will add the requirements on the Content-Location header to the spec 
if that's the appropriate thing to do.  I'm still a little unsure about 
how Location and Content-Location should be used ... have to go do some 
more reading of the HTTP spec.

> 6.6. Adding Content to a Resource
>
> Yuck! This section needs completely re-writing and/or removing. There is no 
> detail
> here on what the server should do with the random content it is given and 
> where it should go.
> If it is posted to the Edit-URI, is there a new EM-URI, replace everything at 
> the EM-URI or
> what. I think if you want to add content (Media) into the container then you 
> post it to the
> Edit-Media URI.

POST to the EM-URI implies something about the structure of the Media on 
the server.  POST to the Edit-URI is the appropriate RESTful way to add 
new content to a container.

There is no detail on what the server should do because SWORD is an 
interface definition not a set of implementation decisions.  How the 
server implements a POST of new content to the container is up to the 
precise mechanisms of the application ...

> I really don't like this and think we should more closely consider GDocs API 
> here for
> how to handle containers and the limitations.

We have tried to ensure that the GData spec is not ruled out by SWORD, 
but it brings with it its own set of idioms with regard to hierarchical 
file systems which may be appropriate for EPrints, but is not 
necessarily appropriate for all other scholarly information systems.  As 
such, I'm reluctant to propose it as /part/ of SWORD but I do want to 
ensure that you can implement it /as well as/ SWORD.

I spent some time working through the GData 3.0 spec and was unable to 
find any information about exactly how they think you should get hold of 
the feed representation of an entry.  I have assumed content negotiation 
on the Edit-URI (see Section 6.8), but if you could confirm for me how 
Google recommend this is done that would be useful.

> This whole fudge it and see it not going to help in the future.

This doesn't appear to be a sentence.  Can you reword/expand?

> Interestingly section 6.6.3 DOES use the EM-URI not the Edit-URI...? Any 
> reason?

Typo, by the looks of it, I will fix that.

> Other than that, this looks pretty good. I still believe it can be made much 
> better
> through better alignment with the GDocs API, this will also make it simpler 
> for both
> the server and client while enabling more functionality. I am going to change 
> a
> version of this spec to reflect this and then people can see the main 
> differences
> and capabilities. This also may rid the need for the statement URI and make it
> simpler again.

I strongly feel that you are too focussed on the EPrints implementation 
decisions where the GData API may be appropriate.  We have tried, with 
SWORD, to avoid getting bogged down in the details of the structure of 
information at either end of the protocol.  We are purely concerned with 
deposit, not with content management, as we discussed early on.

As detailed in the spec, the statement URI can indeed be the same as the 
URI which retrieves the feed representing the entry as employed by 
Google Docs.

Cheers,

Richard


