Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Bill Dueber
On Sun, Dec 1, 2013 at 7:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote:

 +1 to all of Richard's points here. Making something easier for you to
 develop is no justification for making it harder to consume or deviating
 from well supported standards.



I just want to point out that as much as we all really, *really* want
"easy to consume" and "following the standards" to be the same
thing...they're not. Correct content negotiation is one of those things
that often follows the phrase "all they have to do...", which is always a
red flag, as in "Why give the user different URLs when *all they have to
do is*..." Caching, JSON vs JavaScript vs JSONP, etc. all make this
harder. If *all I have to do* is know that all the consumers of my data
are going to do content negotiation right, and then I need to get deep into
the guts of my caching mechanism, then set up an environment where it's all
easy to test...well, it's harder.

And don't tell me how lazy I am until you invent a day with a lot more
hours. I'm sick of people telling me I'm lazy because I'm not pure. I
expose APIs (which have their own share of problems, of course) because I
want them to be *useful* and *used*.

  -Bill, apparently feeling a little bitter this morning -




-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Robert Sanderson
Hi Richard,

On Sun, Dec 1, 2013 at 4:25 PM, Richard Wallis 
richard.wal...@dataliberate.com wrote:

 It's harder to implement Content Negotiation than your own API, because
 you
 get to define your own API whereas you have to follow someone else's rules
 Don't wish your implementation problems on the consumers of your data.
 There are [you would hope] far more of them than of you ;-)

Content-negotiation is an already established mechanism - why invent a
 new, and different, one just for *your* data?


I should have been clearer here that I was responding to the original blog
post.  I'm not advocating arbitrary APIs, but rather just using Link
headers between the different representations.

The advantages are that the caching issues (both browser and intermediate
caches) go away as the content is static, you don't need to invent a way to
find out which formats are available (e.g. no arbitrary content in a 300
response), and you can simply publish the representations like any other
resource without server-side logic to deal with conneg.

The disadvantages are ... none.  There's no invention of APIs, it's just
following a simpler route within the HTTP spec.
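To make that concrete, the client can do a plain GET and follow the
advertised alternate; a rough sketch in Python with the requests library
(the URI and link relation here are invented for illustration):

    import requests  # third-party library; URI and rel below are invented

    # Plain GET on the canonical URI; no Accept header needed.
    resp = requests.get("http://example.org/record/1")

    # requests parses the Link header into resp.links, keyed by rel, e.g.
    #   Link: <http://example.org/record/1.json>; rel="alternate"; type="application/json"
    alternate = resp.links.get("alternate")
    if alternate:
        data = requests.get(alternate["url"]).json()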


 Put yourself in the place of your consumer having to get their head
  around yet another site-specific API pattern.


As a consumer of my own data, I would rather do a simple GET on a URI than
mess around constructing the correct Accept header.



 As to discovering then using the (currently implemented) URI returned from
 a content-negotiated call  - The standard http libraries take care of that,
 like any other http redirects (301,303, etc) plus you are protected from
 any future backend server implementation changes.


No they don't, as there's no way to know which representations are
available via conneg, and hence no automated way to construct the Accept
header.

Rob


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Robert Sanderson
On Sun, Dec 1, 2013 at 5:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote:

 +1 to all of Richard's points here. Making something easier for you to
 develop is no justification for making it harder to consume or deviating
 from well supported standards.


I'm not suggesting deviating from well supported standards; I'm suggesting
choosing a different approach within the well supported standard that makes
it easier for both consumer and producer.



 [Robert]
   You can't
  just put a file in the file system, unlike with separate URIs for
  distinct representations where it just works, instead you need server
  side processing.

 If we introduce languages into the negotiation, this won't scale.


Sure, there are situations where the number of variants is so large that
including them all would be a nuisance.  The number of times this actually
happens is (in my experience at least) vanishingly small.  Again, I'm not
suggesting an arbitrary API, I'm saying that there are easier ways to
handle 99% of the cases than conneg.



 [Robert]
  This also makes it much harder to cache the
  responses, as the cache needs to determine whether or not the
  representation has changed -- the cache also needs to parse the
  headers rather than just comparing URI and content.

 Don't know caches intimately, but I don't see why that's algorithmically
 difficult. Just look at the Content-Type of the response. Is it harder for
 caches to examine headers than content or URI? (That's an earnest, perhaps
 naïve, question.)

 If we are talking about caching on the client here (not caching proxies),
 I would think in most cases requests are issued with the same Accept-*
 headers, so caching will work as expected anyway.


I think Joe already discussed this one, but there's an outstanding conneg
caching bug in Firefox, and it took even Squid a long time to implement
content-negotiation-aware caching.  Also note: "much harder", not
"impossible" :)

No Conneg:
* Check if we have the URI. Done. O(1) as it's a hash.

Conneg:
* Check if we have the URI. Parse the Accept headers from the request.
 Check if they match the cached content and don't contain wildcards.
 O(quite a lot more than 1)
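Or, as an illustrative sketch only (not any real cache's code):

    cache = {}

    def lookup_no_conneg(uri):
        # The URI is the whole cache key: one hash lookup.
        return cache.get(uri)

    def lookup_with_conneg(uri, request_headers, vary=("Accept", "Accept-Language")):
        # Every header named by Vary has to be folded into the key, and a
        # reordered or wildcarded Accept value can still defeat the match.
        key = (uri,) + tuple(request_headers.get(h, "") for h in vary)
        return cache.get(key)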



 [Robert]
  Link headers
  can be added with a simple apache configuration rule, and as they're
  static are easy to cache. So the server side is easy, and the client
 side is trivial.

 Hadn't heard of these. (They are on Wikipedia so they must be real.) What
 do they offer over HTML link elements populated from the Dublin Core
 Element Set?


Nothing :) They're link elements in a header, so you can use them in
non-HTML representations.
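For example, a JSON representation can point at its HTML sibling purely via
a header; a minimal sketch using Python's standard WSGI module, with
invented URIs:

    import json
    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        body = json.dumps({"id": 1, "title": "Example record"}).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Link", '<http://example.org/record/1.html>; rel="alternate"; type="text/html"'),
        ])
        return [body]

    if __name__ == "__main__":
        make_server("localhost", 8000, app).serve_forever()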


My whatever it's worth . great topic, though, thanks Robert :)


Welcome :)

Rob


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Simeon Warner

On 12/2/13 10:50 AM, Robert Sanderson wrote:

On Sun, Dec 1, 2013 at 4:25 PM, Richard Wallis 
richard.wal...@dataliberate.com wrote:

As to discovering then using the (currently implemented) URI returned from
a content-negotiated call  - The standard http libraries take care of that,
like any other http redirects (301,303, etc) plus you are protected from
any future backend server implementation changes.


No they don't, as there's no way to know which representations are
available via conneg, and hence no automated way to construct the Accept
header.


To me this is the biggest issue with content negotiation for machine 
APIs. What you get may be influenced by the Accept headers you send, but 
without detailed knowledge of the particular system you are interacting 
with you can't predict what you'll actually get.
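For instance (a sketch against a hypothetical URI, using the Python
requests library):

    import requests  # third-party library; the URI is hypothetical

    resp = requests.get(
        "http://example.org/record/1",
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.8"},
    )
    # Could be Turtle, RDF/XML, HTML, or a 406 -- you only find out by looking.
    print(resp.status_code, resp.headers.get("Content-Type"))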


Cheers,
Simeon


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Jonathan Rochkind
Yeah, I'm going to disagree a bit with the original post in this thread, 
and with Richard's contribution too. Or at least qualify it.


My experience is that folks trying to be pure and avoid an API do _not_ 
make it easier for me to consume as a developer writing clients. It's 
just not true that one always leads to the other.


The easiest API's I have to deal with are those where the developers 
really understand the use cases clients are likely to have, and really 
make API's that conveniently serve those use cases.


The most difficult API's I have to deal with are those where the 
developers spent a lot of time thinking about very abstract and 
theoretical concerns of architectural purity, whether in terms of REST, 
linked data, HATEOAS, or, god forbid, all of those and more at once (and 
then realizing that sometimes they seem to conflict) -- and neglected to 
think about actual use cases and making them smooth.


Seriously, think about the most pleasant, efficient, and powerful API's 
you have used.  (GitHub's?  Something else?).  How many of them are 
'pure' non-API API's, how many of them are actually API's?


I'm going to call it an "API" even if it does what the original post 
says; I'm going to say "API" in the sense of how software is meant to 
deal with this -- in the base case, the so-called API can be 
screen-scraped HTML, okay.


I am going to agree that aligning the API with the user-visible web app 
as much as possible -- what the original post is saying you should 
always and only do -- does make sense.  But slavish devotion to avoiding 
any API as distinct from the human web UI at all leads to theoretically 
pure but difficult to use API's.


Sometimes the 'information architecture' that makes sense for humans 
differs from what makes sense for machine access. Sometimes the human UI 
needs lots of JS which complicates things.  Even without this, an API 
which lets me choose representations based on different URI's instead of 
_only_ conneg (say, /widget/18.json instead of only /widget/18 with 
conneg) ends up being significantly easier to develop against and debug.
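Something like the following sketch (Flask; the route and data are made up)
is what I mean by offering both, rather than only conneg:

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    WIDGETS = {18: {"id": 18, "name": "example widget"}}

    @app.route("/widget/<int:wid>.json")
    def widget_json(wid):
        # Explicit-format URL: trivial to hit in a browser, curl, or a test.
        return jsonify(WIDGETS[wid])

    @app.route("/widget/<int:wid>")
    def widget(wid):
        # Conneg on the extensionless URI, for clients that prefer it.
        best = request.accept_mimetypes.best_match(["application/json", "text/html"])
        if best == "application/json":
            return jsonify(WIDGETS[wid])
        return "<h1>Widget %d</h1>" % wid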


Spend a bit of time understanding what people consider theoretically 
pure, sure, because it can give you more tools in your toolbox.  But 
simply slavishly sticking to it does not, in my experience, result in a 
good 'developer experience' for your developer clients.  And when you 
start realizing that different people from different schools have 
different ideas of what 'theoretically pure' looks like, when you start 
spending many hours going over httpRange-14 and just getting more 
confused -- realize that what matters in the end is being easy to use 
for your developers' use cases, and just do it.


Personally, I'd spend more time making sure I understand my developers'
use cases and getting feedback from developers, and less time on 
architecting castles in the sky that are theoretically pure.


On 12/2/13 9:56 AM, Bill Dueber wrote:

On Sun, Dec 1, 2013 at 7:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote:


+1 to all of Richard's points here. Making something easier for you to
develop is no justification for making it harder to consume or deviating
from well supported standards.




I just want to point out that as much as we all really, *really* want
"easy to consume" and "following the standards" to be the same
thing...they're not. Correct content negotiation is one of those things
that often follows the phrase "all they have to do...", which is always a
red flag, as in "Why give the user different URLs when *all they have to
do is*..." Caching, JSON vs JavaScript vs JSONP, etc. all make this
harder. If *all I have to do* is know that all the consumers of my data
are going to do content negotiation right, and then I need to get deep into
the guts of my caching mechanism, then set up an environment where it's all
easy to test...well, it's harder.

And don't tell me how lazy I am until you invent a day with a lot more
hours. I'm sick of people telling me I'm lazy because I'm not pure. I
expose APIs (which have their own share of problems, of course) because I
want them to be *useful* and *used*.

   -Bill, apparently feeling a little bitter this morning -






Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Kevin Ford
Though I have some quibbles with Seth's post, I think it's worth 
drawing attention to his repeatedly calling out API keys as a very 
significant barrier to use, or at least entry.  Most of the posts here 
have given little attention to the issue API keys present.  I can say 
that I have quite often looked elsewhere or simply stopped pursuing my 
idea the moment I discovered an API key was mandatory.


As for the presumed difficulty with implementing content negotiation 
(and, especially, caching on top), it seems that if you can implement an 
entire system to manage assignment of and access by API key, then I do 
not understand how content negotiation and caching are significantly 
harder to implement.


In any event, APIs and content negotiation are not mutually exclusive. 
One should be able to use the HTTP URI to access multiple 
representations of the resource without recourse to a custom API.


Yours,
Kevin




On 11/29/2013 02:44 PM, Robert Sanderson wrote:

(posted in the comments on the blog and reposted here for further
discussion, if interested)


While I couldn't agree more with the post's starting point -- URIs identify
(concepts) and "use HTTP as your API" -- I couldn't disagree more with the
"use content negotiation" conclusion.

I'm with Dan Cohen in his comment regarding using different URIs for
different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because you
get to define your own API whereas you have to follow someone else's rules
when you implement conneg.  You can't get your own API wrong.  I agree with
Ruben that HTTP is better than rolling your own proprietary API; we
disagree that conneg is the correct solution.  The choice is between conneg
or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex
structure to determine what is being requested.  You can't just put a file
in the file system, unlike with separate URIs for distinct representations
where it just works; instead you need server-side processing.  This also
makes it much harder to cache the responses, as the cache needs to
determine whether or not the representation has changed -- the cache also
needs to parse the headers rather than just comparing URI and content.  For
large scale systems like DPLA and Europeana, caching is essential for
quality of service.

How do you find out which formats are supported by conneg? By reading the
documentation. Which could just say "add .json on the end". The Vary header
tells you that negotiation in the format dimension is possible, just not
what to do to actually get anything back. There isn't a way to find this
out from HTTP automatically, so now you need to read both the site's docs
AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
OAI-PMH's ListMetadataFormats and SRU's Explain response.

Instead you can have a separate URI for each representation and link them
with Link headers, or just a simple rule like "add '.json' on the end". No
need for complicated content negotiation at all.  Link headers can be added
with a simple Apache configuration rule, and as they're static, they are easy
to cache. So the server side is easy, and the client side is trivial.
Compared to being difficult at both ends with content negotiation.

It can be useful to make statements about the different representations,
and especially if you need to annotate the structure or content.  Or share
it -- you can't email someone a link that includes the right Accept headers
to send -- as in the post, you need to send them a command line like curl
with -H.

An experiment for fans of content negotiation: Have both .json and 302
style conneg from your original URI to that .json file. Advertise both. See
how many people do the conneg. If it's non-zero, I'll be extremely
surprised.

And a challenge: Even with libraries there's still complexity to figuring
out how and what to serve. Find me sites that correctly implement *-based
fallbacks. Or even process q values. I'll bet I can find 10 that do content
negotiation wrong, for every 1 that does it correctly.  I'll start:
dx.doi.org touts its content negotiation for metadata, yet doesn't
implement q values or *s. You have to go to the documentation to figure out
what Accept headers it will do string equality tests against.
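For reference, honouring q values and wildcards isn't rocket science; a
simplified Python sketch (not a full RFC-grade parser) of what "correct
enough" looks like:

    def parse_accept(header):
        # Very simplified: media type plus optional q parameter.
        prefs = []
        for part in header.split(","):
            pieces = [p.strip() for p in part.split(";")]
            mtype, q = pieces[0], 1.0
            for p in pieces[1:]:
                if p.startswith("q="):
                    q = float(p[2:])
            prefs.append((mtype, q))
        return sorted(prefs, key=lambda pair: pair[1], reverse=True)

    def choose(header, available):
        for mtype, q in parse_accept(header):
            if q <= 0:
                continue
            for candidate in available:
                major = candidate.split("/")[0]
                if mtype in ("*/*", candidate, major + "/*"):
                    return candidate
        return None

    # choose("application/*;q=0.5, text/turtle", ["application/rdf+xml", "text/turtle"])
    # returns "text/turtle": the explicit type outranks the 0.5 wildcard.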

Rob



On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be
wrote:


Dear all,

I guess some of you will be interested in the blogpost of my colleague

and co-author Ruben regarding the misunderstandings on the use and abuse of
APIs in a digital libraries context, including a description of both good
and bad practices from Europeana, DPLA and the Cooper Hewitt museum:


http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/

Kind regards,

Seth van Hooland
Président du Master en Sciences et Technologies de l'Information et de la

Communication (MaSTIC)

Université Libre de Bruxelles
Av. F.D. 

Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Ross Singer
I'm not going to defend API keys, but not all APIs are open or free.  You
need to have *some* way to track usage.

There may be alternative ways to implement that, but you can't just hand
wave away the rather large use case for API keys.

-Ross.


On Mon, Dec 2, 2013 at 12:15 PM, Kevin Ford k...@3windmills.com wrote:

 Though I have some quibbles with Seth's post, I think it's worth drawing
 attention to his repeatedly calling out API keys as a very significant
 barrier to use, or at least entry.  Most of the posts here have given
 little attention to the issue API keys present.  I can say that I have
 quite often looked elsewhere or simply stopped pursuing my idea the moment
 I discovered an API key was mandatory.

 As for the presumed difficulty with implementing content negotiation (and,
 especially, caching on top), it seems that if you can implement an entire
 system to manage assignment of and access by API key, then I do not
 understand how content negotiation and caching are significantly harder to
 implement.

 In any event, APIs and content negotiation are not mutually exclusive. One
 should be able to use the HTTP URI to access multiple representations of
 the resource without recourse to a custom API.

 Yours,
 Kevin





 On 11/29/2013 02:44 PM, Robert Sanderson wrote:

 (posted in the comments on the blog and reposted here for further
 discussion, if interest)


 While I couldn't agree more with the post's starting point -- URIs
 identify
 (concepts) and use HTTP as your API -- I couldn't disagree more with the
 use content negotiation conclusion.

 I'm with Dan Cohen in his comment regarding using different URIs for
 different representations for several reasons below.

 It's harder to implement Content Negotiation than your own API, because
 you
 get to define your own API whereas you have to follow someone else's rules
 when you implement conneg.  You can't get your own API wrong.  I agree
 with
 Ruben that HTTP is better than rolling your own proprietary API, we
 disagree that conneg is the correct solution.  The choice is between
 conneg
 or regular HTTP, not conneg or a proprietary API.

 Secondly, you need to look at the HTTP headers and parse quite a complex
 structure to determine what is being requested.  You can't just put a file
 in the file system, unlike with separate URIs for distinct representations
 where it just works, instead you need server side processing.  This also
 makes it much harder to cache the responses, as the cache needs to
 determine whether or not the representation has changed -- the cache also
 needs to parse the headers rather than just comparing URI and content.
  For
 large scale systems like DPLA and Europeana, caching is essential for
 quality of service.

 How do you find out which formats are supported by conneg? By reading the
 documentation. Which could just say add .json on the end. The Vary
 header
 tells you that negotiation in the format dimension is possible, just not
 what to do to actually get anything back. There isn't a way to find this
 out from HTTP automatically, so now you need to read both the site's docs
 AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
 OAI-PMH's ListMetadataFormats and SRU's Explain response.

 Instead you can have a separate URI for each representation and link them
 with Link headers, or just a simple rule like add '.json' on the end. No
 need for complicated content negotiation at all.  Link headers can be
 added
 with a simple apache configuration rule, and as they're static are easy to
 cache. So the server side is easy, and the client side is trivial.
   Compared to being difficult at both ends with content negotiation.

 It can be useful to make statements about the different representations,
 and especially if you need to annotate the structure or content.  Or share
 it -- you can't email someone a link that includes the right Accept
 headers
 to send -- as in the post, you need to send them a command line like curl
 with -H.

 An experiment for fans of content negotiation: Have both .json and 302
 style conneg from your original URI to that .json file. Advertise both.
 See
 how many people do the conneg. If it's non-zero, I'll be extremely
 surprised.

 And a challenge: Even with libraries there's still complexity to figuring
 out how and what to serve. Find me sites that correctly implement * based
 fallbacks. Or even process q values. I'll bet I can find 10 that do
 content
 negotiation wrong, for every 1 that does it correctly.  I'll start:
 dx.doi.org touts its content negotiation for metadata, yet doesn't
 implement q values or *s. You have to go to the documentation to figure
 out
 what Accept headers it will do string equality tests against.

 Rob



 On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be
 wrote:


 Dear all,

 I guess some of you will be interested in the blogpost of my colleague

 and co-author Ruben regarding the misunderstandings on the use and 

Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Jonathan Rochkind
There are plenty of non-free API's that need some kind of access 
control. A different side discussion is what forms of access control are 
the least barrier to developers while still being secure (a lot of 
services mess this up in both directions!).


However, there are also some free API's which still require API keys, 
perhaps because the owners want to track usage or throttle usage or what 
have you.


Sometimes you need to do that too, and you need to restrict access, so 
be it. But it is probably worth recognizing that you are sometimes 
adding barriers to successful client development here -- it seems like a 
trivial barrier from the perspective of the developers of the service, 
because they use the service so often. But to a client developer working 
with a dozen different API's, the extra burden to get and deal with the 
API key and the access control mechanism can be non-trivial.


I think the best compromise is what Google ends up doing with many of 
their APIs. Allow access without an API key, but with a fairly minimal 
number of accesses-per-time-period allowed (a couple hundred a day is 
what I think Google often does). This allows the developer to evaluate 
the API, explore/debug the API in the browser, and write automated tests 
against the API, without worrying about API keys. But still requires an 
API key for 'real' use, so the host can do what tracking or throttling 
they want.
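Mechanically that's not much more than a quota keyed on either the API key
or the client IP; an illustrative Python sketch with made-up limits and a
hypothetical X-API-Key header:

    import time
    from collections import defaultdict

    DAILY_LIMIT = {"keyed": 100000, "anonymous": 200}
    usage = defaultdict(int)  # (identity, day) -> request count

    def allow(request_headers, client_ip):
        api_key = request_headers.get("X-API-Key")  # hypothetical header name
        identity = ("key", api_key) if api_key else ("ip", client_ip)
        day = int(time.time() // 86400)
        usage[(identity, day)] += 1
        limit = DAILY_LIMIT["keyed" if api_key else "anonymous"]
        return usage[(identity, day)] <= limit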


Jonathan

On 12/2/13 12:18 PM, Ross Singer wrote:

I'm not going to defend API keys, but not all APIs are open or free.  You
need to have *some* way to track usage.

There may be alternative ways to implement that, but you can't just hand
wave away the rather large use case for API keys.

-Ross.


On Mon, Dec 2, 2013 at 12:15 PM, Kevin Ford k...@3windmills.com wrote:


Though I have some quibbles with Seth's post, I think it's worth drawing
attention to his repeatedly calling out API keys as a very significant
barrier to use, or at least entry.  Most of the posts here have given
little attention to the issue API keys present.  I can say that I have
quite often looked elsewhere or simply stopped pursuing my idea the moment
I discovered an API key was mandatory.

As for the presumed difficulty with implementing content negotiation (and,
especially, caching on top), it seems that if you can implement an entire
system to manage assignment of and access by API key, then I do not
understand how content negotiation and caching are significantly harder to
implement.

In any event, APIs and content negotiation are not mutually exclusive. One
should be able to use the HTTP URI to access multiple representations of
the resource without recourse to a custom API.

Yours,
Kevin





On 11/29/2013 02:44 PM, Robert Sanderson wrote:


(posted in the comments on the blog and reposted here for further
discussion, if interest)


While I couldn't agree more with the post's starting point -- URIs
identify
(concepts) and use HTTP as your API -- I couldn't disagree more with the
use content negotiation conclusion.

I'm with Dan Cohen in his comment regarding using different URIs for
different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because
you
get to define your own API whereas you have to follow someone else's rules
when you implement conneg.  You can't get your own API wrong.  I agree
with
Ruben that HTTP is better than rolling your own proprietary API, we
disagree that conneg is the correct solution.  The choice is between
conneg
or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex
structure to determine what is being requested.  You can't just put a file
in the file system, unlike with separate URIs for distinct representations
where it just works, instead you need server side processing.  This also
makes it much harder to cache the responses, as the cache needs to
determine whether or not the representation has changed -- the cache also
needs to parse the headers rather than just comparing URI and content.
  For
large scale systems like DPLA and Europeana, caching is essential for
quality of service.

How do you find out which formats are supported by conneg? By reading the
documentation. Which could just say add .json on the end. The Vary
header
tells you that negotiation in the format dimension is possible, just not
what to do to actually get anything back. There isn't a way to find this
out from HTTP automatically, so now you need to read both the site's docs
AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
OAI-PMH's ListMetadataFormats and SRU's Explain response.

Instead you can have a separate URI for each representation and link them
with Link headers, or just a simple rule like add '.json' on the end. No
need for complicated content negotiation at all.  Link headers can be
added
with a simple apache configuration rule, and as they're static are easy to
cache. So the 

Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Kevin Ford

 I think the best compromise is what Google ends up doing with many of
 their APIs. Allow access without an API key, but with a fairly minimal
 number of accesses-per-time-period allowed (couple hundred a day, is
 what I think google often does).
-- Agreed.

I certainly didn't mean to suggest that there were not legitimate use 
cases for API keys.  That said, my gut (plus experience sitting in 
multiple meetings during which the need for an access mechanism landed 
on the table as a primary requirement) says people believe they need an 
API key before alternatives have been fully considered and even before 
there is an actual, defined need for one.  Server logs often reveal most 
types of usage statistics service operators are interested in and 
there are ways to throttle traffic at the caching level (the latter can 
be a little tricky to implement, however).


Yours,
Kevin


On 12/02/2013 12:38 PM, Jonathan Rochkind wrote:

There are plenty of non-free API's, that need some kind of access
control. A different side discussion is what forms of access control are
the least barrier to developers while still being secure (a lot of
services mess this up in both directions!).

However, there are also some free API's which still require API keys,
perhaps because the owners want to track usage or throttle usage or what
have you.

Sometimes you need to do that too, and you need to restrict access, so
be it. But it is probably worth recognizing that you are sometimes
adding barriers to successful client development here -- it seems like a
trivial barrier from the perspective of the developers of the service,
because they use the service so often. But to a client developer working
with a dozen different API's, the extra burden to get and deal with the
API key and the access control mechanism can be non-trivial.

I think the best compromise is what Google ends up doing with many of
their APIs. Allow access without an API key, but with a fairly minimal
number of accesses-per-time-period allowed (couple hundred a day, is
what I think google often does). This allows the developer to evaluate
the api, explore/debug the api in the browser, and write automated tests
against the api, without worrying about api keys. But still requires an
api key for 'real' use, so the host can do what tracking or throttling
they want.

Jonathan

On 12/2/13 12:18 PM, Ross Singer wrote:

I'm not going to defend API keys, but not all APIs are open or free.  You
need to have *some* way to track usage.

There may be alternative ways to implement that, but you can't just hand
wave away the rather large use case for API keys.

-Ross.


On Mon, Dec 2, 2013 at 12:15 PM, Kevin Ford k...@3windmills.com wrote:


Though I have some quibbles with Seth's post, I think it's worth drawing
attention to his repeatedly calling out API keys as a very significant
barrier to use, or at least entry.  Most of the posts here have given
little attention to the issue API keys present.  I can say that I have
quite often looked elsewhere or simply stopped pursuing my idea the
moment
I discovered an API key was mandatory.

As for the presumed difficulty with implementing content negotiation
(and,
especially, caching on top), it seems that if you can implement an
entire
system to manage assignment of and access by API key, then I do not
understand how content negotiation and caching are significantly
harder to
implement.

In any event, APIs and content negotiation are not mutually
exclusive. One
should be able to use the HTTP URI to access multiple representations of
the resource without recourse to a custom API.

Yours,
Kevin





On 11/29/2013 02:44 PM, Robert Sanderson wrote:


(posted in the comments on the blog and reposted here for further
discussion, if interest)


While I couldn't agree more with the post's starting point -- URIs
identify
(concepts) and use HTTP as your API -- I couldn't disagree more with
the
use content negotiation conclusion.

I'm with Dan Cohen in his comment regarding using different URIs for
different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because
you
get to define your own API whereas you have to follow someone else's
rules
when you implement conneg.  You can't get your own API wrong.  I agree
with
Ruben that HTTP is better than rolling your own proprietary API, we
disagree that conneg is the correct solution.  The choice is between
conneg
or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a
complex
structure to determine what is being requested.  You can't just put
a file
in the file system, unlike with separate URIs for distinct
representations
where it just works, instead you need server side processing.  This
also
makes it much harder to cache the responses, as the cache needs to
determine whether or not the representation has changed -- the cache
also
needs to parse the headers rather than just comparing 

Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Robert Sanderson
To be (more) controversial...

If it's okay to require headers, why can't API keys go in a header rather
than the URL?
Then it's just the same as content negotiation, it seems to me. You send a
header and get a different response from the same URI.
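From the client's side the two are literally the same shape of request; a
sketch with the Python requests library (the URI, header name, and key are
invented):

    import requests  # third-party library

    uri = "http://example.org/record/1"

    # Content negotiation: same URI, different response depending on a header.
    requests.get(uri, headers={"Accept": "application/json"})

    # Key in a header: exactly the same shape of request.
    requests.get(uri, headers={"X-API-Key": "abc123"})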

Rob



On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote:

 On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:
  I'm not going to defend API keys, but not all APIs are open or free.  You
  need to have *some* way to track usage.

 A key (haha) thing that keys also provide is an opportunity to have a
 conversation with the user of your API: who are they, how could you get in
 touch with them, what are they doing with the API, what would they like to
 do with the API, what doesn’t work? These questions are difficult to ask if
 they are just an IP address in your access log.

 //Ed



Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Jonathan Rochkind

I do frequently see API keys in headers; it is a frequent pattern.

Anything that requires things in the header, in my experience, makes the 
API more 'expensive' to develop against. I'm not sure it is okay to 
require headers.


Which is why I suggested allowing format specification in the URL, not 
just conneg headers. And is also, actually, why I expressed admiration 
for Google's pattern of allowing X requests a day without an API key. 
Both things allow you to play with the API in a browser without headers.


If you are requiring a cryptographic signature (a la HMAC) for your 
access control, you can't feasibly play with it in a browser anyway; it 
doesn't matter whether it's supplied in headers or query params. And 
(inconvenient) HMAC probably is the only actually secure way to do API 
access control, depending on what level of security is called for.
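For what it's worth, a minimal sketch of what HMAC signing asks of the
client (the header names, string-to-sign, and secret are all invented;
real schemes such as AWS's request signing are considerably more involved):

    import hashlib
    import hmac
    import time

    def sign_request(method, path, secret, key_id):
        timestamp = str(int(time.time()))
        string_to_sign = "\n".join([method, path, timestamp])
        signature = hmac.new(secret.encode("utf-8"),
                             string_to_sign.encode("utf-8"),
                             hashlib.sha256).hexdigest()
        return {"X-Key-Id": key_id, "X-Timestamp": timestamp, "X-Signature": signature}

    headers = sign_request("GET", "/widget/18.json", secret="s3cr3t", key_id="demo")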


On 12/2/13 1:03 PM, Robert Sanderson wrote:

To be (more) controversial...

If it's okay to require headers, why can't API keys go in a header rather
than the URL.
Then it's just the same as content negotiation, it seems to me. You send a
header and get a different response from the same URI.

Rob



On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote:


On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:

I'm not going to defend API keys, but not all APIs are open or free.  You
need to have *some* way to track usage.


A key (haha) thing that keys also provide is an opportunity to have a
conversation with the user of your api: who are they, how could you get in
touch with them, what are they doing with the API, what would they like to
do with the API, what doesn’t work? These questions are difficult to ask if
they are just a IP address in your access log.

//Ed






Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Edward Summers
Amazon Web Services (which is probably the most heavily used API on the Web) 
uses HTTP headers for authentication. But I guess developers typically use 
software libraries to access AWS rather than making the HTTP calls directly.

//Ed


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Kevin Ford

 A key (haha) thing that keys also provide is an opportunity
 to have a conversation with the user of your api: who are they,
 how could you get in touch with them, what are they doing with
 the API, what would they like to do with the API, what doesn’t
 work? These questions are difficult to ask if they are just a
 IP address in your access log.
-- True, but, again, there are other ways to go about this.

I've baulked at doing just this in the past because it reveals the raw 
and primary purpose behind an API key: to track individual user 
usage/access.  I would feel a little awkward writing (and receiving, 
incidentally) a message that began:



--

Hello,

I saw you using our service.  What are you doing with our data?

Cordially,
Data service team

---


And, if you cringe a little at the ramifications of the above, then why 
do you need user-specific granularity?   (That's really not meant to be 
a rhetorical question - I would genuinely be interested in whether my 
notions of "open" and "free" are outmoded and based too much in a 
theoretical purity in which unnecessary tracking is a violation of privacy).


Unless the API key exists to control specific, user-level access 
precisely because this is a facet of the underlying service, I feel 
somewhere in all of this the service has violated, in some way, the 
notion that it is open and/or free, assuming it has billed itself as 
such.  Otherwise, it's "free" and "open" as in Google or Facebook.


All that said, I think a data service can smooth things over greatly by 
not insisting on a developer signing a EULA (which is essentially what 
happens when one requests an API key) before even trying the service or 
desiring the most basic of data access.  There are middle ground solutions.


Yours,
Kevin





On 12/02/2013 12:57 PM, Edward Summers wrote:

On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:

I'm not going to defend API keys, but not all APIs are open or free.  You
need to have *some* way to track usage.


A key (haha) thing that keys also provide is an opportunity to have a 
conversation with the user of your api: who are they, how could you get in 
touch with them, what are they doing with the API, what would they like to do 
with the API, what doesn’t work? These questions are difficult to ask if they 
are just a IP address in your access log.

//Ed



Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Miles Fidelman
umm... it's called HTTP-AUTH, and if you really want to be cool, use an 
X.509 client cert for authorization (see GeoServer as an example that 
works very cleanly - 
http://docs.geoserver.org/latest/en/user/security/tutorials/cert/index.html; 
the freebXML registry-repository also uses X.509-based authentication in 
a reasonably clean manner)


Robert Sanderson wrote:

To be (more) controversial...

If it's okay to require headers, why can't API keys go in a header rather
than the URL.
Then it's just the same as content negotiation, it seems to me. You send a
header and get a different response from the same URI.

Rob



On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote:


On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:

I'm not going to defend API keys, but not all APIs are open or free.  You
need to have *some* way to track usage.

A key (haha) thing that keys also provide is an opportunity to have a
conversation with the user of your api: who are they, how could you get in
touch with them, what are they doing with the API, what would they like to
do with the API, what doesn’t work? These questions are difficult to ask if
they are just a IP address in your access log.

//Ed



Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Joe Hourcle
On Dec 2, 2013, at 1:25 PM, Kevin Ford wrote:

  A key (haha) thing that keys also provide is an opportunity
  to have a conversation with the user of your api: who are they,
  how could you get in touch with them, what are they doing with
  the API, what would they like to do with the API, what doesn’t
  work? These questions are difficult to ask if they are just a
  IP address in your access log.
 -- True, but, again, there are other ways to go about this.
 
 I've baulked at doing just this in the past because it reveals the raw and 
 primary purpose behind an API key: to track individual user usage/access.  I 
 would feel a little awkward writing (and receiving, incidentally) a message 
 that began:
 
 --
 Hello,
 
 I saw you using our service.  What are you doing with our data?
 
 Cordially,
 Data service team
 --

It's better than posting to a website:

We can't justify keeping this API maintained / available,
because we have no idea who's using it, or what they're
using it for.

Or:

We've had to shut down the API because we'd had people
abusing the API and we can't easily single them out as
it's not just coming from a single IP range.

We don't require API keys here, but we *do* send out messages
to our designated community every couple of years with:

If you use our APIs, please send a letter of support
that we can include in our upcoming Senior Review.

(Senior Review is NASA's peer-review of operating projects,
where they bring in outsiders to judge if it's justifiable to
continue funding them, and if so, at what level)


Personally, I like the idea of allowing limited use without
a key (be it number of accesses per day, number of concurrent
accesses, or some other rate limiting), but as someone who has
been operating APIs for years and is *not* *allowed* to track
users, I've seen quite a few times when it would've made my
life so much easier.



 And, if you cringe a little at the ramifications of the above, then why do 
 you need user-specific granularity?   (That's really not meant to be a 
 rhetorical question - I would genuinely be interested in whether my notions 
 of open and free are outmoded and based too much in a theoretical purity 
 that unnecessary tracking is a violation of privacy).

You're assuming that you're actually correlating API calls
to the users ... it may just be an authentication system
and nothing past that.


 Unless the API key exists to control specific, user-level access precisely 
 because this is a facet of the underlying service, I feel somewhere in all of 
 this the service has violated, in some way, the notion that it is open 
 and/or free, assuming it has billed itself as such.  Otherwise, it's free 
 and open as in Google or Facebook.

You're also assuming that we've claimed that our services
are 'open'.  (mine are, but I know of plenty of them that
have to deal with authorization, as they manage embargoed
or otherwise restricted items).

Of course, you can also set up some sort of 'guest'
privileges for non-authenticated users so they just wouldn't
see the restricted content.


 All that said, I think a data service can smooth things over greatly by not 
 insisting on a developer signing a EULA (which is essentially what happens 
 when one requests an API key) before even trying the service or desiring the 
 most basic of data access.  There are middle ground solutions.

I do have problems with EULAs ... one in that we have to
get things approved by our legal department, second in that
they're often written completely one-sided and third in
that they're often written assuming personal use.

Twitter and Facebook had to make available alternate EULAs
so that governments could use them ... because you can't
hold the person who signed up for the account responsible
for it.  (and they don't want it 'owned' by that person
should they be fired, etc.)

... but sometimes they're less restrictive ... more TOS
than EULA.  Without it, you've got absolutely no sort of
SLA ... if they want to take down their API, or block you,
you've got no recourse at all.

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-02 Thread Fitchett, Deborah
Environment Canterbury has a click-through screen making you accept their terms 
and conditions before you get access to the API, and they use that as an 
opportunity to ask some questions about your intended use. Then once you've 
answered those you get direct access to the API as beautiful plain XML. (Okay, 
XML which possibly overuses attributes to carry data instead of tags, but I 
eventually figured out how to make my server's version of PHP happy with that.) 
It's glorious.  It made me so happy that I went back to their click-through 
screen to give them some more information about what I was doing.

When I had to try and navigate Twitter's API and authentication models, 
however... Well, I absolutely understand the need for it, but it'll be a long 
time before I ever try that again.

Deborah

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward 
Summers
Sent: Tuesday, 3 December 2013 6:57 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] The lie of the API

On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:
 I'm not going to defend API keys, but not all APIs are open or free.  
 You need to have *some* way to track usage.

A key (haha) thing that keys also provide is an opportunity to have a 
conversation with the user of your api: who are they, how could you get in 
touch with them, what are they doing with the API, what would they like to do 
with the API, what doesn't work? These questions are difficult to ask if they 
are just a IP address in your access log.

//Ed




Re: [CODE4LIB] The lie of the API

2013-12-01 Thread LeVan,Ralph
I'm confused about the supposed distinction between content negotiation and 
explicit content request in a URL.  The reason I'm confused is that the 
response to content negotiation is supposed to be a Content-Location header 
with a URL that is guaranteed to return the negotiated content.  In other 
words, there *must* be a form of the URL that bypasses content negotiation.  If 
you can do content negotiation, then you should have a URL form that doesn't 
require content negotiation.
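In other words, something like this ought to work (a Python sketch with a
hypothetical URI; whether a given server actually sends Content-Location is
another matter):

    import requests  # third-party library; the URI is hypothetical

    resp = requests.get("http://example.org/record/1",
                        headers={"Accept": "application/json"})
    direct_url = resp.headers.get("Content-Location")  # may be relative to the URI
    if direct_url:
        # A URL you can bookmark, email, or fetch again without any Accept header.
        print(direct_url)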

Ralph

From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert 
Sanderson azarot...@gmail.com
Sent: Friday, November 29, 2013 2:44 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: The lie of the API

(posted in the comments on the blog and reposted here for further
discussion, if interested)


While I couldn't agree more with the post's starting point -- URIs identify
(concepts) and use HTTP as your API -- I couldn't disagree more with the
use content negotiation conclusion.

I'm with Dan Cohen in his comment regarding using different URIs for
different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because you
get to define your own API whereas you have to follow someone else's rules
when you implement conneg.  You can't get your own API wrong.  I agree with
Ruben that HTTP is better than rolling your own proprietary API, we
disagree that conneg is the correct solution.  The choice is between conneg
or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex
structure to determine what is being requested.  You can't just put a file
in the file system, unlike with separate URIs for distinct representations
where it just works, instead you need server side processing.  This also
makes it much harder to cache the responses, as the cache needs to
determine whether or not the representation has changed -- the cache also
needs to parse the headers rather than just comparing URI and content.  For
large scale systems like DPLA and Europeana, caching is essential for
quality of service.

How do you find out which formats are supported by conneg? By reading the
documentation. Which could just say add .json on the end. The Vary header
tells you that negotiation in the format dimension is possible, just not
what to do to actually get anything back. There isn't a way to find this
out from HTTP automatically, so now you need to read both the site's docs
AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
OAI-PMH's ListMetadataFormats and SRU's Explain response.

Instead you can have a separate URI for each representation and link them
with Link headers, or just a simple rule like add '.json' on the end. No
need for complicated content negotiation at all.  Link headers can be added
with a simple apache configuration rule, and as they're static are easy to
cache. So the server side is easy, and the client side is trivial.
 Compared to being difficult at both ends with content negotiation.

It can be useful to make statements about the different representations,
and especially if you need to annotate the structure or content.  Or share
it -- you can't email someone a link that includes the right Accept headers
to send -- as in the post, you need to send them a command line like curl
with -H.

An experiment for fans of content negotiation: Have both .json and 302
style conneg from your original URI to that .json file. Advertise both. See
how many people do the conneg. If it's non-zero, I'll be extremely
surprised.

And a challenge: Even with libraries there's still complexity to figuring
out how and what to serve. Find me sites that correctly implement * based
fallbacks. Or even process q values. I'll bet I can find 10 that do content
negotiation wrong, for every 1 that does it correctly.  I'll start:
dx.doi.org touts its content negotiation for metadata, yet doesn't
implement q values or *s. You have to go to the documentation to figure out
what Accept headers it will do string equality tests against.

Rob



On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be
wrote:

 Dear all,

 I guess some of you will be interested in the blogpost of my colleague
and co-author Ruben regarding the misunderstandings on the use and abuse of
APIs in a digital libraries context, including a description of both good
and bad practices from Europeana, DPLA and the Cooper Hewitt museum:

 http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/

 Kind regards,

 Seth van Hooland
 Président du Master en Sciences et Technologies de l'Information et de la
Communication (MaSTIC)
 Université Libre de Bruxelles
 Av. F.D. Roosevelt, 50 CP 123  | 1050 Bruxelles
 http://homepages.ulb.ac.be/~svhoolan/
 http://twitter.com/#!/sethvanhooland
 http://mastic.ulb.ac.be
 0032 2 650 4765
 Office: DC11.102


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Richard Wallis
It's harder to implement Content Negotiation than your own API, because you
get to define your own API whereas you have to follow someone else's rules

Don't wish your implementation problems on the consumers of your data.
There are [you would hope] far more of them than of you ;-)

Content-negotiation is an already established mechanism - why invent a new,
and different, one just for *your* data?

Put yourself in the place of your consumer having to get their head around
yet another site-specific API pattern.

As to discovering then using the (currently implemented) URI returned from
a content-negotiated call - the standard HTTP libraries take care of that,
like any other HTTP redirects (301, 303, etc.), plus you are protected from
any future backend server implementation changes.


~Richard


On 1 December 2013 20:51, LeVan,Ralph le...@oclc.org wrote:

 I'm confused about the supposed distinction between content negotiation
 and explicit content request in a URL.  The reason I'm confused is that the
 response to content negotiation is supposed to be a content location header
 with a URL that is guaranteed to return the negotiated content.  In other
 words, there *must* be a form of the URL that bypasses content negotiation.
  If you can do content negotiation, then you should have a URL form that
 doesn't require content negotiation.

 Ralph
 
 From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert
 Sanderson azarot...@gmail.com
 Sent: Friday, November 29, 2013 2:44 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: The lie of the API

 (posted in the comments on the blog and reposted here for further
 discussion, if interest)


 While I couldn't agree more with the post's starting point -- URIs identify
 (concepts) and use HTTP as your API -- I couldn't disagree more with the
 use content negotiation conclusion.

 I'm with Dan Cohen in his comment regarding using different URIs for
 different representations for several reasons below.

 It's harder to implement Content Negotiation than your own API, because you
 get to define your own API whereas you have to follow someone else's rules
 when you implement conneg.  You can't get your own API wrong.  I agree with
 Ruben that HTTP is better than rolling your own proprietary API, we
 disagree that conneg is the correct solution.  The choice is between conneg
 or regular HTTP, not conneg or a proprietary API.

 Secondly, you need to look at the HTTP headers and parse quite a complex
 structure to determine what is being requested.  You can't just put a file
 in the file system, unlike with separate URIs for distinct representations
 where it just works, instead you need server side processing.  This also
 makes it much harder to cache the responses, as the cache needs to
 determine whether or not the representation has changed -- the cache also
 needs to parse the headers rather than just comparing URI and content.  For
 large scale systems like DPLA and Europeana, caching is essential for
 quality of service.

 How do you find out which formats are supported by conneg? By reading the
 documentation. Which could just say add .json on the end. The Vary header
 tells you that negotiation in the format dimension is possible, just not
 what to do to actually get anything back. There isn't a way to find this
 out from HTTP automatically, so now you need to read both the site's docs
 AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
 OAI-PMH's ListMetadataFormats and SRU's Explain response.

 Instead you can have a separate URI for each representation and link them
 with Link headers, or just a simple rule like add '.json' on the end. No
 need for complicated content negotiation at all.  Link headers can be added
 with a simple apache configuration rule, and as they're static are easy to
 cache. So the server side is easy, and the client side is trivial.
  Compared to being difficult at both ends with content negotiation.

 It can be useful to make statements about the different representations,
 and especially if you need to annotate the structure or content.  Or share
 it -- you can't email someone a link that includes the right Accept headers
 to send -- as in the post, you need to send them a command line like curl
 with -H.

 An experiment for fans of content negotiation: Have both .json and 302
 style conneg from your original URI to that .json file. Advertise both. See
 how many people do the conneg. If it's non-zero, I'll be extremely
 surprised.

 And a challenge: Even with libraries there's still complexity to figuring
 out how and what to serve. Find me sites that correctly implement * based
 fallbacks. Or even process q values. I'll bet I can find 10 that do content
 negotiation wrong, for every 1 that does it correctly.  I'll start:
 dx.doi.org touts its content negotiation for metadata, yet doesn't
 implement q values or *s. You have to go to the documentation to figure out
 what 

Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 3:51 PM, LeVan,Ralph wrote:

 I'm confused about the supposed distinction between content negotiation and 
 explicit content request in a URL.  The reason I'm confused is that the 
 response to content negotiation is supposed to be a content location header 
 with a URL that is guaranteed to return the negotiated content.  In other 
 words, there *must* be a form of the URL that bypasses content negotiation.  
 If you can do content negotiation, then you should have a URL form that 
 doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1.  The
one that most gets used is 'transparent negotiation' which results in
there being different content served under a single URL.

Transparent negotiation schemes do *not* redirect to a new URL to allow
the cache or browser to identify the specific content returned.  (this
would require an extra round trip, as you'd have to send a Location:
header to redirect, then have the browser request the new page)

So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it
knows what is or isn't cacheable.  So if you might serve different
content based on the Accept and Accept-Encoding headers, you would return:

Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because of some browsers
that pack in every module + the version in there, making there be so
many permutations that many proxies will refuse to cache it)
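Roughly, this is the contract the cache has to honour (a simplified sketch;
real proxies follow the full HTTP caching rules for normalization and the
uncacheable "Vary: *" case):

    def reusable_for(new_request_headers, stored_request_headers, vary_header):
        # Can a stored response answer a new request, per its Vary header?
        if vary_header.strip() == "*":
            return False  # never reusable without going back to the origin
        for name in (v.strip() for v in vary_header.split(",")):
            if new_request_headers.get(name) != stored_request_headers.get(name):
                return False
        return True

    # With "Vary: Accept, Accept-Encoding", only those two request headers
    # decide whether a stored response can be reused.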

-Joe

(who has been managing web servers since HTTP/0.9, and gets 
annoyed when I have to explain to our security folks each year
why I don't reject pre-HTTP/1.1 requests or follow the rest of
the CIS benchmark recommendations that cause our web services to
fail horribly)


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Barnes, Hugh
+1 to all of Richard's points here. Making something easier for you to develop 
is no justification for making it harder to consume or deviating from well 
supported standards.

[Robert]
  You can't 
 just put a file in the file system, unlike with separate URIs for 
 distinct representations where it just works, instead you need server 
 side processing.

If we introduce languages into the negotiation, this won't scale.

[Robert]
 This also makes it much harder to cache the 
 responses, as the cache needs to determine whether or not the 
 representation has changed -- the cache also needs to parse the 
 headers rather than just comparing URI and content.  

Don't know caches intimately, but I don't see why that's algorithmically 
difficult. Just look at the Content-type of the response. Is it harder for 
caches to examine headers than content or URI? (That's an earnest, perhaps 
naïve, question.)

If we are talking about caching on the client here (not caching proxies), I 
would think in most cases requests are issued with the same Accept-* headers, 
so caching will work as expected anyway.

[Robert]
 Link headers 
 can be added with a simple apache configuration rule, and as they're 
 static are easy to cache. So the server side is easy, and the client side is 
 trivial.

Hadn't heard of these. (They are on Wikipedia so they must be real.) What do 
they offer over HTML link elements populated from the Dublin Core Element Set?

---

My ideal setup would be to maintain a canonical URL that always serves the 
clients' flavour of representation (format/language), which could vary, but 
points to other representations (and versions for that matter) at separate URLs 
through a mechanism like HTML link elements.
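
A rough sketch of that setup (hypothetical WSGI snippet; the URLs are
placeholders, and it serves a single flavour here for brevity), with Link
headers advertising the format-specific URLs:

    # Hypothetical sketch: canonical URL plus rel="alternate" Link headers.
    ALTERNATES = ", ".join([
        '</record/1.json>; rel="alternate"; type="application/json"',
        '</record/1.xml>; rel="alternate"; type="application/xml"',
    ])

    def app(environ, start_response):
        body = b"<html><body>canonical record page</body></html>"
        start_response("200 OK", [
            ("Content-Type", "text/html"),
            ("Link", ALTERNATES),                  # point at other representations
            ("Content-Length", str(len(body))),
        ])
        return [body]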

My two cents' worth, anyway ... great topic, though, thanks Robert :)

Cheers

Hugh Barnes
Digital Access Coordinator
Library, Teaching and Learning
Lincoln University
Christchurch
New Zealand
p +64 3 423 0357

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Richard 
Wallis
Sent: Monday, 2 December 2013 12:26 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] The lie of the API

It's harder to implement Content Negotiation than your own API, because you 
get to define your own API whereas you have to follow someone else's rules

Don't wish your implementation problems on the consumers of your data.
There are [you would hope] far more of them than of you ;-)

Content-negotiation is an already established mechanism - why invent a new, and 
different, one just for *your* data?

Put your self in the place of your consumer having to get their head around yet 
another site specific API pattern.

As to discovering then using the (currently implemented) URI returned from a 
content-negotiated call  - The standard http libraries take care of that, like 
any other http redirects (301,303, etc) plus you are protected from any future 
backend server implementation changes.


~Richard


On 1 December 2013 20:51, LeVan,Ralph le...@oclc.org wrote:

 I'm confused about the supposed distinction between content 
 negotiation and explicit content request in a URL.  The reason I'm 
 confused is that the response to content negotiation is supposed to be 
 a content location header with a URL that is guaranteed to return the 
 negotiated content.  In other words, there *must* be a form of the URL that 
 bypasses content negotiation.
  If you can do content negotiation, then you should have a URL form 
 that doesn't require content negotiation.

 Ralph
 
 From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of 
 Robert Sanderson azarot...@gmail.com
 Sent: Friday, November 29, 2013 2:44 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: The lie of the API

 (posted in the comments on the blog and reposted here for further 
 discussion, if interest)


 While I couldn't agree more with the post's starting point -- URIs 
 identify
 (concepts) and use HTTP as your API -- I couldn't disagree more with 
 the use content negotiation conclusion.

 I'm with Dan Cohen in his comment regarding using different URIs for 
 different representations for several reasons below.

 It's harder to implement Content Negotiation than your own API, 
 because you get to define your own API whereas you have to follow 
 someone else's rules when you implement conneg.  You can't get your 
 own API wrong.  I agree with Ruben that HTTP is better than rolling 
 your own proprietary API, we disagree that conneg is the correct 
 solution.  The choice is between conneg or regular HTTP, not conneg or a 
 proprietary API.

 Secondly, you need to look at the HTTP headers and parse quite a 
 complex structure to determine what is being requested.  You can't 
 just put a file in the file system, unlike with separate URIs for 
 distinct representations where it just works, instead you need server 
 side processing.  This also makes it much harder to cache the 
 responses

Re: [CODE4LIB] The lie of the API

2013-12-01 Thread LeVan,Ralph
Returning a Content-Location header does not require a redirect.  You can 
return the negotiated content with the first response and still tell the 
client how it could have asked for that same content without negotiation.  
That's what the Content-Location header means in the absence of a redirect 
status code.
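
A hypothetical client-side sketch (using the requests library; the URI is a
placeholder) of acting on that header:

    # Negotiate once, then remember the negotiation-free URL from
    # Content-Location, if the server sends one.
    import requests

    resp = requests.get("https://example.org/record/1",
                        headers={"Accept": "application/json"})
    plain = resp.headers.get("Content-Location")
    if plain:
        # This URL should return the same representation with no Accept header.
        plain_url = requests.compat.urljoin(resp.url, plain)
        same = requests.get(plain_url)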

Ralph

From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Joe Hourcle 
onei...@grace.nascom.nasa.gov
Sent: Sunday, December 01, 2013 6:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: The lie of the API

On Dec 1, 2013, at 3:51 PM, LeVan,Ralph wrote:

 I'm confused about the supposed distinction between content negotiation and 
 explicit content request in a URL.  The reason I'm confused is that the 
 response to content negotiation is supposed to be a content location header 
 with a URL that is guaranteed to return the negotiated content.  In other 
 words, there *must* be a form of the URL that bypasses content negotiation.  
 If you can do content negotiation, then you should have a URL form that 
 doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1.  The
one that most gets used is 'transparent negotiation' which results in
there being different content served under a single URL.

Transparent negotiation schemes do *not* redirect to a new URL to allow
the cache or browser to identify the specific content returned.  (this
would require an extra round trip, as you'd have to send a Location:
header to redirect, then have the browser request the new page)

So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it
knows what is or isn't cacheable.  So if you might serve different
content based on the Accept and Accept-Encoding headers, you would return:

Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because some browsers pack in
every module + its version, making for so many permutations that many
proxies will refuse to cache it)

-Joe

(who has been managing web servers since HTTP/0.9, and gets
annoyed when I have to explain to our security folks each year
why I don't reject pre-HTTP/1.1 requests or follow the rest of
the CIS benchmark recommendations that cause our web services to
fail horribly)


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 7:57 PM, Barnes, Hugh wrote:

 +1 to all of Richard's points here. Making something easier for you to 
 develop is no justification for making it harder to consume or deviating from 
 well supported standards.
 
 [Robert]
 You can't 
 just put a file in the file system, unlike with separate URIs for 
 distinct representations where it just works, instead you need server 
 side processing.
 
 If we introduce languages into the negotiation, this won't scale.

It depends on what you qualify as 'scaling'.  You can configure
Apache and some other servers so that you pre-generate files such
as:

index.en.html
index.de.html
index.es.html
index.fr.html

... It's even the default for some distributions.

Then, depending on which Accept-Language header is sent,
the server returns the appropriate response.  The only issue
is that the server assumes that the 'quality' of all of the
translations is equivalent.

You know that 'q=0.9' stuff?  There's actually a scale in
RFC 2295 that equates the different qualities to how much
content is lost in that particular version:

  Servers should use the following table as a guide when assigning source
  quality values:

 1.000  perfect representation
 0.900  threshold of noticeable loss of quality
 0.800  noticeable, but acceptable quality reduction
 0.500  barely acceptable quality
 0.300  severely degraded quality
 0.000  completely degraded quality
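
A hypothetical helper that picks one of those pre-generated files from an
Accept-Language header, q values included, might look something like:

    # Hypothetical sketch: choose a pre-generated translation file.
    AVAILABLE = {"en": "index.en.html", "de": "index.de.html",
                 "es": "index.es.html", "fr": "index.fr.html"}

    def pick_language(accept_language, default="en"):
        prefs = []
        for part in accept_language.split(","):
            fields = [f.strip() for f in part.split(";")]
            lang, q = fields[0].lower(), 1.0
            for f in fields[1:]:                   # look for a q parameter
                if f.startswith("q="):
                    try:
                        q = float(f[2:])
                    except ValueError:
                        q = 0.0
            if lang:
                prefs.append((q, lang))
        for q, lang in sorted(prefs, key=lambda p: p[0], reverse=True):
            base = lang.split("-")[0]              # en-US falls back to en
            if q > 0 and base in AVAILABLE:
                return AVAILABLE[base]
        return AVAILABLE[default]

    print(pick_language("de, en;q=0.7, *;q=0.1"))  # -> index.de.html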





 [Robert]
 This also makes it much harder to cache the 
 responses, as the cache needs to determine whether or not the 
 representation has changed -- the cache also needs to parse the 
 headers rather than just comparing URI and content.  
 
 Don't know caches intimately, but I don't see why that's algorithmically 
 difficult. Just look at the Content-type of the response. Is it harder for 
 caches to examine headers than content or URI? (That's an earnest, perhaps 
 naïve, question.)

See my earlier response.  The problem is that without a 'Vary' header or
other cache-control headers, caches may assume that a URL is a fixed
resource.

If a cache were to assume the resource was static, then it wouldn't matter what
was sent for the Accept, Accept-Encoding or Accept-Language ... and
so the first request proxied gets cached, and then subsequent
requests get the cached copy, even if that's not what the server
would have sent.


 If we are talking about caching on the client here (not caching proxies), I 
 would think in most cases requests are issued with the same Accept-* headers, 
 so caching will work as expected anyway.

I assume he's talking about caching proxies, where it's a real
problem.


 [Robert]
 Link headers 
 can be added with a simple apache configuration rule, and as they're 
 static are easy to cache. So the server side is easy, and the client side is 
 trivial.
 
 Hadn't heard of these. (They are on Wikipedia so they must be real.) What do 
 they offer over HTML link elements populated from the Dublin Core Element 
 Set?

Wikipedia was the first place you looked?  Not IETF or W3C?
No wonder people say libraries are doomed, if even people who work
in libraries go straight to Wikipedia.


...


oh, and I should follow up to my posting from earlier tonight --
upon re-reading the HTTP/1.1 spec, it seems that there *is* a way to
specify the authoritative URL returned without an HTTP round-trip,
Content-Location :

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14

Of course, it doesn't look like my web browser does anything with
it:

http://www.w3.org/Protocols/rfc2616/rfc2616
http://www.w3.org/Protocols/rfc2616/rfc2616.html
http://www.w3.org/Protocols/rfc2616/rfc2616.txt

... so you'd still have to use Location: if you wanted it to 
show up to the general public.

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Barnes, Hugh
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
Hourcle

 (They are on Wikipedia so they must be real.)

 Wikipedia was the first place you looked?  Not IETF or W3C?
 No wonder people say libraries are doomed, if even people who work in 
 libraries go straight to Wikipedia.

It was a humorous aside, regrettably lacking a smiley.

I think that comment would be better saved to pitch at folks who cite and link 
to w3schools as if authoritative. Some of them are even in libraries.

Your other comments were informative, though. Thank you :)

Cheers
Hugh





Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 9:36 PM, Barnes, Hugh wrote:

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe 
 Hourcle
 
 (They are on Wikipedia so they must be real.)
 
 Wikipedia was the first place you looked?  Not IETF or W3C?
 No wonder people say libraries are doomed, if even people who work in 
 libraries go straight to Wikipedia.
 
 It was a humorous aside, regrettably lacking a smiley.

Yes, a smiley would have helped.

It also doesn't help that there used to be a website out there
named 'ScoopThis'.  They started as a wrestling parody site, but
my favorite part was their advice column from 'Dusty the Fat,
Bitter Cat'.

I bring this up because their slogan was cuz if it’s on the net,
it’s got to be true ... so I twitch a little whenever someone
says something similar to that phrase.

(unfortunately, the site's gone, and archive.org didn't cache
them, so you can't see the photoshopped pictures of Dusty
at Woodstock '99 or the Rock's cooking show.  They started up
a separate website for Dusty, but when they closed that one
down, they put up a parody of a porn site, so you probably
don't want to go looking for it)


 I think that comment would be better saved to pitch at folks who cite and 
 link to w3schools as if authoritative. Some of them are even in libraries.

Although I wish that w3schools would stop showing up so highly
in searches for javascript methods and css attributes, they
did have a time when they were some of the best tutorials out
there on web-related topics.  I don't know if I can claim that
to be true today, though.


 Your other comments were informative, though. Thank you :)

I try ... especially when I'm procrastinating on doing posters
that I need to have printed by Friday.

(but if anyone has any complaints about data.gov or other
federal data dissemination efforts, I'll be happy to work
them in)

-Joe


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Simon Spero
On Dec 1, 2013 6:42 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote:

 So that you don't screw up web proxies, you have to specify the 'Vary'
header to tell which parameters you consider significant so that it knows
what is or isn't cacheable.

I believe that if a Vary isn't specified, and the content is not marked as
non-cacheable, a cache must assume Vary:*, but I might be misremembering.

 (who has been managing web servers since HTTP/0.9, and gets annoyed when
I have to explain to our security folks each year  why I don't reject
pre-HTTP/1.1 requests or follow the rest of  the CIS benchmark
recommendations that cause our web services to fail horribly)

Old school represent (0.9 could outperform 1.0 if the request headers were
more than 1 MTU or the first line was sent in a separate packet with Nagle
enabled). [Accept was a major cause of header bloat].


Re: [CODE4LIB] The lie of the API

2013-12-01 Thread Joe Hourcle
On Dec 1, 2013, at 11:12 PM, Simon Spero wrote:

 On Dec 1, 2013 6:42 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote:
 
 So that you don't screw up web proxies, you have to specify the 'Vary'
 header to tell which parameters you consider significant so that it knows
 what is or isn't cacheable.
 
 I believe that if a Vary isn't specified, and the content is not marked as
 non cachable,  a cache must assume Vary:*, but I might be misremembering

That would be horrible for caching proxies to assume that nothing's
cacheable unless it said it was.  (as typically only the really big
websites or those that have seen some obvious problems bother with
setting cache control headers.)

I haven't done any exhaustive tests in many years, but I was noticing
that proxies were starting to cache GET requests with query strings,
which bothered me -- it used to be that anything that was an obvious
CGI wasn't cached.  (I guess that since enough sites use them, a proxy has
to assume that the sites aren't stateful, and that the parameters
in the URL are enough information for hashing)



 (who has been managing web servers since HTTP/0.9, and gets annoyed when
 I have to explain to our security folks each year  why I don't reject
 pre-HTTP/1.1 requests or follow the rest of  the CIS benchmark
 recommendations that cause our web services to fail horribly)
 
 Old school represent (0.9 could out perform 1.0 if the request headers were
 more than 1 MTU or the first line was sent in a separate packet with nagle
 enabled). [Accept was a major cause of header bloat].

Don't even get me started on header bloat ... 

My main complaint about HTTP/1.1 is that it requires clients to support
chunked encoding, and I've got to support a client that's got a buggy
implementation.  (and then my CGIs that serve 2GB tarballs start
failing, and it's calling a program that's not smart enough to look
for SIGPIPE, so I end up with a dozen of 'em going all stupid and
sucking down CPU on one of my servers)

Most people don't have to support a community-written HTTP client,
though.  (and the one alternative HTTP client in IDL doesn't let me
interact with the HTTP headers directly, so I can't put a wrapper
around it to extract the tarball's filename from the Content-Disposition
header)

-Joe

ps.  yep, still having writer's block on posters.


Re: [CODE4LIB] The lie of the API

2013-11-29 Thread Robert Sanderson
(posted in the comments on the blog and reposted here for further
discussion, if there's interest)


While I couldn't agree more with the post's starting point -- URIs identify
(concepts) and use HTTP as your API -- I couldn't disagree more with the
use content negotiation conclusion.

I'm with Dan Cohen in his comment regarding using different URIs for
different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because you
get to define your own API whereas you have to follow someone else's rules
when you implement conneg.  You can't get your own API wrong.  I agree with
Ruben that HTTP is better than rolling your own proprietary API, but we
disagree that conneg is the correct solution.  The choice is between conneg
and regular HTTP, not between conneg and a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex
structure to determine what is being requested.  Unlike with separate URIs
for distinct representations, where you can just put a file in the file
system and it just works, conneg needs server-side processing.  This also
makes it much harder to cache the responses, as the cache needs to
determine whether or not the representation has changed -- the cache also
needs to parse the headers rather than just comparing URI and content.  For
large scale systems like DPLA and Europeana, caching is essential for
quality of service.

How do you find out which formats are supported by conneg? By reading the
documentation. Which could just say add .json on the end. The Vary header
tells you that negotiation in the format dimension is possible, just not
what to do to actually get anything back. There isn't a way to find this
out from HTTP automatically, so now you need to read both the site's docs
AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
OAI-PMH's ListMetadataFormats and SRU's Explain response.
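
For example, a hypothetical sketch against an imaginary OAI-PMH base URL,
asking the repository which metadata formats it supports:

    # Hypothetical sketch: list the formats an OAI-PMH repository offers.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    BASE = "https://example.org/oai"               # placeholder base URL
    NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

    with urlopen(BASE + "?verb=ListMetadataFormats") as resp:
        tree = ET.parse(resp)

    for fmt in tree.findall(".//oai:metadataFormat", NS):
        print(fmt.findtext("oai:metadataPrefix", namespaces=NS))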

Instead you can have a separate URI for each representation and link them
with Link headers, or just a simple rule like add '.json' on the end. No
need for complicated content negotiation at all.  Link headers can be added
with a simple Apache configuration rule, and as they're static they are easy
to cache. So the server side is easy and the client side is trivial, compared
to being difficult at both ends with content negotiation.
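
Client side, a hypothetical sketch (the URI is a placeholder; requests
exposes parsed Link headers as resp.links) that follows an advertised
alternate link and otherwise falls back to the .json rule:

    # Hypothetical sketch: find the JSON representation via a Link header,
    # or fall back to the "add .json" convention.
    import requests

    resp = requests.get("https://example.org/record/1")
    alternate = resp.links.get("alternate", {}).get("url")
    json_url = alternate if alternate else resp.url + ".json"
    data = requests.get(json_url).json()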

It can also be useful to make statements about the different representations,
especially if you need to annotate the structure or content, or to share
them -- you can't email someone a link that includes the right Accept headers
to send; as in the post, you have to send them a command line like curl
with -H.

An experiment for fans of content negotiation: Have both .json and 302
style conneg from your original URI to that .json file. Advertise both. See
how many people do the conneg. If it's non-zero, I'll be extremely
surprised.

And a challenge: Even with libraries there's still complexity to figuring
out how and what to serve. Find me sites that correctly implement *-based
fallbacks. Or even process q values. I'll bet I can find 10 that do content
negotiation wrong, for every 1 that does it correctly.  I'll start:
dx.doi.org touts its content negotiation for metadata, yet doesn't
implement q values or *s. You have to go to the documentation to figure out
what Accept headers it will do string equality tests against.
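
For comparison, a hypothetical sketch of what q value and wildcard handling
looks like on the server side (simplified: media type parameters other than
q are ignored):

    # Hypothetical sketch: pick the best offered media type for an Accept
    # header, honouring q values and */* and type/* wildcard fallbacks.
    def parse_accept(header):
        ranges = []
        for part in header.split(","):
            fields = [f.strip() for f in part.split(";")]
            mtype, q = fields[0].lower(), 1.0
            for f in fields[1:]:
                if f.startswith("q="):
                    try:
                        q = float(f[2:])
                    except ValueError:
                        q = 0.0
            if mtype:
                ranges.append((mtype, q))
        return ranges

    def negotiate(accept_header, offered):
        best, best_q = None, 0.0
        ranges = parse_accept(accept_header)
        for offer in offered:
            o_type, o_sub = offer.lower().split("/")
            q, rank = 0.0, -1
            for mtype, mq in ranges:
                r_type, _, r_sub = mtype.partition("/")
                if mtype == "*/*":
                    r = 0
                elif r_type == o_type and r_sub == "*":
                    r = 1
                elif r_type == o_type and r_sub == o_sub:
                    r = 2
                else:
                    continue
                if r > rank:                       # most specific range wins
                    rank, q = r, mq
            if q > best_q:
                best, best_q = offer, q
        return best                                # None would mean a 406

    print(negotiate("application/rdf+xml;q=0.5, */*;q=0.1",
                    ["text/html", "application/rdf+xml"]))
    # -> application/rdf+xml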

Rob



On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be
wrote:

 Dear all,

 I guess some of you will be interested in the blogpost of my colleague
and co-author Ruben regarding the misunderstandings on the use and abuse of
APIs in a digital libraries context, including a description of both good
and bad practices from Europeana, DPLA and the Cooper Hewitt museum:

 http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/

 Kind regards,

 Seth van Hooland
 Président du Master en Sciences et Technologies de l'Information et de la
Communication (MaSTIC)
 Université Libre de Bruxelles
 Av. F.D. Roosevelt, 50 CP 123  | 1050 Bruxelles
 http://homepages.ulb.ac.be/~svhoolan/
 http://twitter.com/#!/sethvanhooland
 http://mastic.ulb.ac.be
 0032 2 650 4765
 Office: DC11.102


Re: [CODE4LIB] The lie of the API

2013-11-29 Thread Péter Király
Hi,

I was happy to read this blog post, because it contains lots of very
important statements, but as one of the developers of the Europeana API I
would like to mention some points.

The idea of content negotiation is nice, but it also adds some
additional burden for the API users. In some tools and programming
languages it is easy to modify HTTP headers, in others it is not that
trivial. For non-tech people it is a burden. In an environment such as
Europeana it is not only tech people who want to see and check the
non-HTML output; it also has meaning for metadata experts, marketing
people, ingestion team members and so on.

Europeana has a history, and even the API and the metadata model
behind it have their own history. When we released the new API, which
reflects the new metadata structure, it was evident that we did not
want to break existing client-side applications. So we had to
introduce versioning. With versioning we had the same choices as with
content type: make it transparent in the URL or use hypermedia
versioning via HTTP headers. This led to the same problem as before,
so we chose the URL approach.

Finally, when creating an API there are lots of different aspects we
should consider. Besides technological, scientific or aesthetic aspects
there are lots of other ones as well. We follow a way which has good
and bad points, but as I see it the same is true for Ruben's suggestions.
It is not true that our way is driven by simple ignorance. We never
claimed that we created a RESTful and pedantic API. We made a practical
one, and we keep improving it gradually, taking into account feedback
such as this post.

Regards,
Péter


2013/11/29 Robert Sanderson azarot...@gmail.com:
 (posted in the comments on the blog and reposted here for further
 discussion, if interest)


 While I couldn't agree more with the post's starting point -- URIs identify
 (concepts) and use HTTP as your API -- I couldn't disagree more with the
 use content negotiation conclusion.

 I'm with Dan Cohen in his comment regarding using different URIs for
 different representations for several reasons below.

 It's harder to implement Content Negotiation than your own API, because you
 get to define your own API whereas you have to follow someone else's rules
 when you implement conneg.  You can't get your own API wrong.  I agree with
 Ruben that HTTP is better than rolling your own proprietary API, we
 disagree that conneg is the correct solution.  The choice is between conneg
 or regular HTTP, not conneg or a proprietary API.

 Secondly, you need to look at the HTTP headers and parse quite a complex
 structure to determine what is being requested.  You can't just put a file
 in the file system, unlike with separate URIs for distinct representations
 where it just works, instead you need server side processing.  This also
 makes it much harder to cache the responses, as the cache needs to
 determine whether or not the representation has changed -- the cache also
 needs to parse the headers rather than just comparing URI and content.  For
 large scale systems like DPLA and Europeana, caching is essential for
 quality of service.

 How do you find out which formats are supported by conneg? By reading the
 documentation. Which could just say add .json on the end. The Vary header
 tells you that negotiation in the format dimension is possible, just not
 what to do to actually get anything back. There isn't a way to find this
 out from HTTP automatically, so now you need to read both the site's docs
 AND the HTTP docs.  APIs can, on the other hand, do this.  Consider
 OAI-PMH's ListMetadataFormats and SRU's Explain response.

 Instead you can have a separate URI for each representation and link them
 with Link headers, or just a simple rule like add '.json' on the end. No
 need for complicated content negotiation at all.  Link headers can be added
 with a simple apache configuration rule, and as they're static are easy to
 cache. So the server side is easy, and the client side is trivial.
  Compared to being difficult at both ends with content negotiation.

 It can be useful to make statements about the different representations,
 and especially if you need to annotate the structure or content.  Or share
 it -- you can't email someone a link that includes the right Accept headers
 to send -- as in the post, you need to send them a command line like curl
 with -H.

 An experiment for fans of content negotiation: Have both .json and 302
 style conneg from your original URI to that .json file. Advertise both. See
 how many people do the conneg. If it's non-zero, I'll be extremely
 surprised.

 And a challenge: Even with libraries there's still complexity to figuring
 out how and what to serve. Find me sites that correctly implement * based
 fallbacks. Or even process q values. I'll bet I can find 10 that do content
 negotiation wrong, for every 1 that does it correctly.  I'll start:
 dx.doi.org touts its content negotiation for metadata, yet doesn't
 

Re: [CODE4LIB] The lie of the API

2013-11-29 Thread Simon Spero
Seth (and commenters) -

   The basic point is sound, but there are some important issues that are
averted or elided in the original article in order to make the
underlying point more clearly.

1:  It should be quite clear that there is no need to develop an API for
the sole purpose of generating an alternate representation of a [document]
in a form that is intended to be machine actionable as opposed to one that
is intended to be rendered for human consumption, where the referent is the
same.  This is precisely what the content negotiation mechanism was designed
for.

2: It is less clear, but still reasonable, to use content negotiation to
treat content types for the same URI polysemously (having related, but
slightly different senses).  For example, the HTML rendering of a URI may
carry slightly different propositional content than is carried in a set of
RDF assertions*.

3: For stative actions not related to content, a formally defined API is
required.

4: Since there is no intrinsic relationship between two objects with
different URIs, breaking the connection for items which are identical** may
require extra work to repair.

5: Cacheable content negotiation in HTTP has been around since the mid-late
nineties. It's retro-chic.

6: API keys that protect information extractable from non-API-protected
sources were created to encourage people to learn how to implement
screen-scrapers and finite state transducers.

7: The commenter who brought up the issue of the same URI denoting
different FRBR entities must make a number of  metaphysical commitments.
Resulting models are FRBR-like, but are not pure FRBR.  If the 1:1
principle were real, any of these approaches would present insuperable
difficulties.

Simon

* Under a documentationalist interpretation, the propositional content must
be different, so allowing  at least some degree of polysemy is hard to
avoid.

** absolute identity cannot apply, but most forms of relative identity have
obvious interpretations.


On Fri, Nov 29, 2013 at 8:24 AM, Seth van Hooland svhoo...@ulb.ac.bewrote:

 Dear all,

 I guess some of you will be interested in the blogpost of my colleague and
 co-author Ruben regarding the misunderstandings on the use and abuse of
 APIs in a digital libraries context, including a description of both good
 and bad practices from Europeana, DPLA and the Cooper Hewitt museum:

 http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/

 Kind regards,

 Seth van Hooland
 Président du Master en Sciences et Technologies de l'Information et de la
 Communication (MaSTIC)
 Université Libre de Bruxelles
 Av. F.D. Roosevelt, 50 CP 123  | 1050 Bruxelles
 http://homepages.ulb.ac.be/~svhoolan/
 http://twitter.com/#!/sethvanhooland
 http://mastic.ulb.ac.be
 0032 2 650 4765
 Office: DC11.102