Re: [CODE4LIB] The lie of the API
On Sun, Dec 1, 2013 at 7:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote:

> +1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well supported standards.

I just want to point out that as much as we all really, *really* want "easy to consume" and "following the standards" to be the same thing... they're not.

Correct content negotiation is one of those things that often follows the phrase "all they have to do...", which is always a red flag, as in "Why give the user different URLs when *all they have to do is*..." Caching, JSON vs. JavaScript vs. JSONP, etc. all make this harder. If *all I have to do* is know that all the consumers of my data are going to do content negotiation right, and then I need to get deep into the guts of my caching mechanism, and then set up an environment where it's all easy to test... well, it's harder. And don't tell me how lazy I am until you invent a day with a lot more hours. I'm sick of people telling me I'm lazy because I'm not pure. I expose APIs (which have their own share of problems, of course) because I want them to be *useful* and *used*.

-Bill, apparently feeling a little bitter this morning

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Re: [CODE4LIB] The lie of the API
Hi Richard,

On Sun, Dec 1, 2013 at 4:25 PM, Richard Wallis richard.wal...@dataliberate.com wrote:

> > It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules
>
> Don't wish your implementation problems on the consumers of your data. There are [you would hope] far more of them than of you ;-) Content negotiation is an already established mechanism - why invent a new, and different, one just for *your* data?

I should have been clearer here that I was responding to the original blog post. I'm not advocating arbitrary APIs, but instead just using Link headers between the different representations. The advantages are that the caching issues (both browser and intermediate caches) go away because the content is static, you don't need to invent a way to find out which formats are available (e.g. no arbitrary content in a 300 response), and you can simply publish the representations like any other resource, without server-side logic to deal with conneg. The disadvantages are ... none. There's no invention of APIs; it's just following a simpler route within the HTTP spec.

> Put yourself in the place of your consumer having to get their head around yet another site-specific API pattern.

As a consumer of my own data, I would rather do a simple GET on a URI than mess around constructing the correct Accept header.

> As to discovering then using the (currently implemented) URI returned from a content-negotiated call - the standard HTTP libraries take care of that, like any other HTTP redirects (301, 303, etc.), plus you are protected from any future backend server implementation changes.

No they don't, as there's no way to know which representations are available via conneg, and hence no automated way to construct the Accept header.

Rob
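To make the contrast concrete -- a minimal sketch, assuming a hypothetical resource at example.org and using Python's requests library; neither URI comes from the thread:

    import requests

    # Separate URI per representation: a plain GET, easy to share, bookmark and cache.
    direct = requests.get("https://example.org/widget/18.json")

    # Content negotiation: the client has to know, out of band, which formats the
    # server offers before it can build a sensible Accept header for the same URI.
    negotiated = requests.get("https://example.org/widget/18",
                              headers={"Accept": "application/json"})

    print(direct.status_code, negotiated.headers.get("Content-Type"))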
Re: [CODE4LIB] The lie of the API
On Sun, Dec 1, 2013 at 5:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote:

> +1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well supported standards.

I'm not suggesting deviating from well supported standards; I'm suggesting choosing a different approach within the well supported standard that makes it easier for both consumer and producer.

> [Robert] You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing.
>
> If we introduce languages into the negotiation, this won't scale.

Sure, there are situations where the number of variants is so large that including them all would be a nuisance. The number of times this actually happens is (in my experience at least) vanishingly small. Again, I'm not suggesting an arbitrary API; I'm saying that there are easier ways to accomplish 99% of cases than conneg.

> [Robert] This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content.
>
> Don't know caches intimately, but I don't see why that's algorithmically difficult. Just look at the Content-Type of the response. Is it harder for caches to examine headers than content or URI? (That's an earnest, perhaps naïve, question.) If we are talking about caching on the client here (not caching proxies), I would think in most cases requests are issued with the same Accept-* headers, so caching will work as expected anyway.

I think Joe already discussed this one, but there's an outstanding conneg caching bug in Firefox, and it took even Squid a long time to implement content-negotiation-aware caching. Also note, "much harder", not "impossible" :)

No conneg:
* Check if we have the URI. Done. O(1) as it's a hash.

Conneg:
* Check if we have the URI. Parse the Accept headers from the request. Check if they match the cached content and don't contain wildcards. O(quite a lot more than 1)

> [Robert] Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial.
>
> Hadn't heard of these. (They are on Wikipedia so they must be real.) What do they offer over HTML link elements populated from the Dublin Core Element Set?

Nothing :) They're link elements in a header, so you can use them in non-HTML representations.

> My "whatever it's worth". Great topic, though, thanks Robert :)

Welcome :)

Rob
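For what it's worth, one way to write the Apache rule Rob describes -- a minimal sketch that assumes mod_headers is enabled; the file name and target URI are made up for illustration:

    # httpd.conf / .htaccess fragment: attach a static Link header pointing at the
    # JSON representation of one HTML page (hypothetical paths).
    <Files "widget18.html">
      Header set Link "<https://example.org/widget/18.json>; rel=\"alternate\"; type=\"application/json\""
    </Files>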
Re: [CODE4LIB] The lie of the API
On 12/2/13 10:50 AM, Robert Sanderson wrote:

> On Sun, Dec 1, 2013 at 4:25 PM, Richard Wallis richard.wal...@dataliberate.com wrote:
>
> > As to discovering then using the (currently implemented) URI returned from a content-negotiated call - the standard HTTP libraries take care of that, like any other HTTP redirects (301, 303, etc.), plus you are protected from any future backend server implementation changes.
>
> No they don't, as there's no way to know which representations are available via conneg, and hence no automated way to construct the Accept header.

To me this is the biggest issue with content negotiation for machine APIs. What you get may be influenced by the Accept headers you send, but without detailed knowledge of the particular system you are interacting with, you can't predict what you'll actually get.

Cheers,
Simeon
Re: [CODE4LIB] The lie of the API
Yeah, I'm going to disagree a bit with the original post in this thread, and with Richard's contribution too. Or at least qualify it.

My experience is that folks trying to be "pure" and avoid an API do _not_ make it easier for me to consume as a developer writing clients. It's just not true that one always leads to the other.

The easiest APIs I have to deal with are those where the developers really understand the use cases clients are likely to have, and really make APIs that conveniently serve those use cases. The most difficult APIs I have to deal with are those where the developers spent a lot of time thinking about very abstract and theoretical concerns of architectural purity, whether in terms of REST, linked data, HATEOAS, or, god forbid, all of those and more at once (and then realizing that sometimes they seem to conflict) -- and neglected to think about actual use cases and making them smooth.

Seriously, think about the most pleasant, efficient, and powerful APIs you have used. (GitHub's? Something else?) How many of them are "pure" non-API APIs, and how many of them are actually APIs? I'm going to call it an API even if it does what the original post says -- I'm going to say API in the sense of how software is meant to deal with this; in the base case, the so-called API can be screen-scraped HTML, okay.

I am going to agree that aligning the API with the user-visible web app as much as possible -- what the original post is saying you should always and only do -- does make sense. But slavish devotion to avoiding any API as distinct from the human web UI at all leads to theoretically pure but difficult-to-use APIs. Sometimes the "information architecture" that makes sense for humans differs from what makes sense for machine access. Sometimes the human UI needs lots of JS, which complicates things. Even without this, an API which lets me choose representations based on different URIs instead of _only_ conneg (say, /widget/18.json instead of only /widget/18 with conneg) ends up being significantly easier to develop against and debug.

Spend a bit of time understanding what people consider theoretically pure, sure, because it can give you more tools in your toolbox. But simply slavishly sticking to it does not, in my experience, result in a good "developer experience" for your developer clients. And when you start realizing that different people from different schools have different ideas of what "theoretically pure" looks like, when you start spending many hours going over httpRange-14 and just getting more confused -- realize that what matters in the end is being easy to use for your developers' use cases, and just do it.

Personally, I'd spend more time making sure I understand my developers' use cases and getting feedback from developers, and less time on architecting castles in the sky that are theoretically pure.

On 12/2/13 9:56 AM, Bill Dueber wrote:

> On Sun, Dec 1, 2013 at 7:57 PM, Barnes, Hugh hugh.bar...@lincoln.ac.nz wrote:
>
> > +1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well supported standards.
>
> I just want to point out that as much as we all really, *really* want "easy to consume" and "following the standards" to be the same thing... they're not.
Re: [CODE4LIB] The lie of the API
Though I have some quibbles with Seth's post, I think it's worth drawing attention to his repeatedly calling out API keys as a very significant barrier to use, or at least entry. Most of the posts here have given little attention to the issue API keys present. I can say that I have quite often looked elsewhere or simply stopped pursuing my idea the moment I discovered an API key was mandatory. As for the presumed difficulty with implementing content negotiation (and, especially, caching on top), it seems that if you can implement an entire system to manage assignment of and access by API key, then I do not understand how content negotiation and caching are significantly harder to implement. In any event, APIs and content negotiation are not mutually exclusive. One should be able to use the HTTP URI to access multiple representations of the resource without recourse to a custom API. Yours, Kevin On 11/29/2013 02:44 PM, Robert Sanderson wrote: (posted in the comments on the blog and reposted here for further discussion, if interest) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the use content negotiation conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below. It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API, we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API. Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large scale systems like DPLA and Europeana, caching is essential for quality of service. How do you find our which formats are supported by conneg? By reading the documentation. Which could just say add .json on the end. The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically,so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this. Consider OAI-PMH's ListMetadataFormats and SRU's Explain response. Instead you can have a separate URI for each representation and link them with Link headers, or just a simple rule like add '.json' on the end. No need for complicated content negotiation at all. Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial. Compared to being difficult at both ends with content negotiation. It can be useful to make statements about the different representations, and especially if you need to annotate the structure or content. 
Or share it -- you can't email someone a link that includes the right Accept headers to send -- as in the post, you need to send them a command line like curl with -H. An experiment for fans of content negotiation: Have both .json and 302 style conneg from your original URI to that .json file. Advertise both. See how many people do the conneg. If it's non-zero, I'll be extremely surprised. And a challenge: Even with libraries there's still complexity to figuring out how and what to serve. Find me sites that correctly implement * based fallbacks. Or even process q values. I'll bet I can find 10 that do content negotiation wrong, for every 1 that does it correctly. I'll start: dx.doi.org touts its content negotiation for metadata, yet doesn't implement q values or *s. You have to go to the documentation to figure out what Accept headers it will do string equality tests against. Rob On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be wrote: Dear all, I guess some of you will be interested in the blogpost of my colleague and co-author Ruben regarding the misunderstandings on the use and abuse of APIs in a digital libraries context, including a description of both good and bad practices from Europeana, DPLA and the Cooper Hewitt museum: http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/ Kind regards, Seth van Hooland Président du Master en Sciences et Technologies de l'Information et de la Communication (MaSTIC) Université Libre de Bruxelles Av. F.D.
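To give a sense of why q values and * fallbacks are so often skipped, here is a rough sketch (not any site's actual code) of the minimum a server has to do to honour them; real implementations also need to handle media-type parameters, malformed input, and tie-breaking:

    def preferred(accept_header, offered):
        # Parse "type/subtype;q=0.8, */*;q=0.1" into (media type, q) pairs.
        prefs = []
        for item in accept_header.split(","):
            parts = [p.strip() for p in item.split(";")]
            q = 1.0
            for p in parts[1:]:
                if p.startswith("q="):
                    q = float(p[2:])
            prefs.append((parts[0], q))
        # Pick the offered type with the highest q, honouring */* and type/* wildcards.
        best, best_q = None, 0.0
        for offer in offered:
            for mtype, q in prefs:
                if mtype in (offer, "*/*", offer.split("/")[0] + "/*") and q > best_q:
                    best, best_q = offer, q
        return best

    # preferred("text/html;q=0.8, application/*;q=0.9", ["application/json", "text/html"])
    # -> "application/json"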
Re: [CODE4LIB] The lie of the API
I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage. There may be alternative ways to implement that, but you can't just hand wave away the rather large use case for API keys. -Ross. On Mon, Dec 2, 2013 at 12:15 PM, Kevin Ford k...@3windmills.com wrote: Though I have some quibbles with Seth's post, I think it's worth drawing attention to his repeatedly calling out API keys as a very significant barrier to use, or at least entry. Most of the posts here have given little attention to the issue API keys present. I can say that I have quite often looked elsewhere or simply stopped pursuing my idea the moment I discovered an API key was mandatory. As for the presumed difficulty with implementing content negotiation (and, especially, caching on top), it seems that if you can implement an entire system to manage assignment of and access by API key, then I do not understand how content negotiation and caching are significantly harder to implement. In any event, APIs and content negotiation are not mutually exclusive. One should be able to use the HTTP URI to access multiple representations of the resource without recourse to a custom API. Yours, Kevin On 11/29/2013 02:44 PM, Robert Sanderson wrote: (posted in the comments on the blog and reposted here for further discussion, if interest) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the use content negotiation conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below. It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API, we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API. Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large scale systems like DPLA and Europeana, caching is essential for quality of service. How do you find our which formats are supported by conneg? By reading the documentation. Which could just say add .json on the end. The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically,so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this. Consider OAI-PMH's ListMetadataFormats and SRU's Explain response. Instead you can have a separate URI for each representation and link them with Link headers, or just a simple rule like add '.json' on the end. No need for complicated content negotiation at all. Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial. 
Re: [CODE4LIB] The lie of the API
There are plenty of non-free APIs that need some kind of access control. A different side discussion is what forms of access control are the least barrier to developers while still being secure (a lot of services mess this up in both directions!).

However, there are also some free APIs which still require API keys, perhaps because the owners want to track usage or throttle usage or what have you. Sometimes you need to do that too, and you need to restrict access, so be it. But it is probably worth recognizing that you are sometimes adding barriers to successful client development here -- it seems like a trivial barrier from the perspective of the developers of the service, because they use the service so often. But to a client developer working with a dozen different APIs, the extra burden to get and deal with the API key and the access control mechanism can be non-trivial.

I think the best compromise is what Google ends up doing with many of their APIs. Allow access without an API key, but with a fairly minimal number of accesses-per-time-period allowed (a couple hundred a day is what I think Google often does). This allows the developer to evaluate the API, explore/debug the API in the browser, and write automated tests against the API, without worrying about API keys. But it still requires an API key for 'real' use, so the host can do whatever tracking or throttling they want.

Jonathan

On 12/2/13 12:18 PM, Ross Singer wrote:

> I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage. There may be alternative ways to implement that, but you can't just hand wave away the rather large use case for API keys.
>
> -Ross.
>
> On Mon, Dec 2, 2013 at 12:15 PM, Kevin Ford k...@3windmills.com wrote:
>
> > Though I have some quibbles with Seth's post, I think it's worth drawing attention to his repeatedly calling out API keys as a very significant barrier to use, or at least entry. Most of the posts here have given little attention to the issue API keys present. I can say that I have quite often looked elsewhere or simply stopped pursuing my idea the moment I discovered an API key was mandatory.
> >
> > As for the presumed difficulty with implementing content negotiation (and, especially, caching on top), it seems that if you can implement an entire system to manage assignment of and access by API key, then I do not understand how content negotiation and caching are significantly harder to implement. In any event, APIs and content negotiation are not mutually exclusive. One should be able to use the HTTP URI to access multiple representations of the resource without recourse to a custom API.
> >
> > Yours,
> > Kevin
> >
> > On 11/29/2013 02:44 PM, Robert Sanderson wrote:
> >
> > > (posted in the comments on the blog and reposted here for further discussion, if interest) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the use content negotiation conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below. It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API, we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API.
Re: [CODE4LIB] The lie of the API
I think the best compromise is what Google ends up doing with many of their APIs. Allow access without an API key, but with a fairly minimal number of accesses-per-time-period allowed (couple hundred a day, is what I think google often does). -- Agreed. I certainly didn't mean to suggest that there were not legitimate use cases for API keys. That said, my gut (plus experience sitting in multiple meetings during which the need for an access mechanism landed on the table as a primary requirement) says people believe they need an API key before alternatives have been fully considered and even before there is an actual, defined need for one. Server logs often reveal most types of usage statistics service operators are interested in and there are ways to throttle traffic at the caching level (the latter can be a little tricky to implement, however). Yours, Kevin On 12/02/2013 12:38 PM, Jonathan Rochkind wrote: There are plenty of non-free API's, that need some kind of access control. A different side discussion is what forms of access control are the least barrier to developers while still being secure (a lot of services mess this up in both directions!). However, there are also some free API's whcih still require API keys, perhaps because the owners want to track usage or throttle usage or what have you. Sometimes you need to do that too, and you need to restrict access, so be it. But it is probably worth recognizing that you are sometimes adding barriers to succesful client development here -- it seems like a trivial barrier from the perspective of the developers of the service, because they use the service so often. But to a client developer working with a dozen different API's, the extra burden to get and deal with the API key and the access control mechanism can be non-trivial. I think the best compromise is what Google ends up doing with many of their APIs. Allow access without an API key, but with a fairly minimal number of accesses-per-time-period allowed (couple hundred a day, is what I think google often does). This allows the developer to evaluate the api, explore/debug the api in the browser, and write automated tests against the api, without worrying about api keys. But still requires an api key for 'real' use, so the host can do what tracking or throttling they want. Jonathan On 12/2/13 12:18 PM, Ross Singer wrote: I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage. There may be alternative ways to implement that, but you can't just hand wave away the rather large use case for API keys. -Ross. On Mon, Dec 2, 2013 at 12:15 PM, Kevin Ford k...@3windmills.com wrote: Though I have some quibbles with Seth's post, I think it's worth drawing attention to his repeatedly calling out API keys as a very significant barrier to use, or at least entry. Most of the posts here have given little attention to the issue API keys present. I can say that I have quite often looked elsewhere or simply stopped pursuing my idea the moment I discovered an API key was mandatory. As for the presumed difficulty with implementing content negotiation (and, especially, caching on top), it seems that if you can implement an entire system to manage assignment of and access by API key, then I do not understand how content negotiation and caching are significantly harder to implement. In any event, APIs and content negotiation are not mutually exclusive. 
Re: [CODE4LIB] The lie of the API
To be (more) controversial... if it's okay to require headers, why can't API keys go in a header rather than the URL? Then it's just the same as content negotiation, it seems to me. You send a header and get a different response from the same URI.

Rob

On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote:

> On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:
>
> > I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage.
>
> A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your API: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn't work? These questions are difficult to ask if they are just an IP address in your access log.
>
> //Ed
Re: [CODE4LIB] The lie of the API
I do frequently see API keys in a header; it is a common pattern.

Anything that requires things in the header, in my experience, makes the API more 'expensive' to develop against. I'm not sure it is okay to require headers -- which is why I suggested allowing format specification in the URL, not just conneg headers. And it is also, actually, why I expressed admiration for Google's pattern of allowing X requests a day without an API key. Both things allow you to play with the API in a browser without headers.

If you are requiring a cryptographic signature (a la HMAC) for your access control, you can't feasibly play with it in a browser anyway, so it doesn't matter whether it's supplied in headers or query params. And (inconvenient) HMAC probably is the only actually secure way to do API access control, depending on what level of security is called for.

On 12/2/13 1:03 PM, Robert Sanderson wrote:

> To be (more) controversial... if it's okay to require headers, why can't API keys go in a header rather than the URL? Then it's just the same as content negotiation, it seems to me. You send a header and get a different response from the same URI.
>
> Rob
>
> On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote:
>
> > On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:
> >
> > > I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage.
> >
> > A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your API: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn't work? These questions are difficult to ask if they are just an IP address in your access log.
> >
> > //Ed
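For comparison, the two styles look like this from the client side -- a minimal sketch with a made-up endpoint and header name (X-API-Key is a common convention, not anything from this thread):

    import requests

    key = "my-secret-key"

    # API key in a request header: invisible in the URL, but you can no longer
    # just paste the link into a browser to try it.
    in_header = requests.get("https://api.example.org/widgets/18",
                             headers={"X-API-Key": key})

    # API key as a query parameter: trivially explorable in a browser, but the
    # key ends up in logs, bookmarks and shared links.
    in_url = requests.get("https://api.example.org/widgets/18",
                          params={"api_key": key})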
Re: [CODE4LIB] The lie of the API
Amazon Web Services (which is probably the most heavily used API on the Web) use HTTP headers for authentication. But I guess developers typically use software libraries to access AWS rather than making the HTTP calls directly. //Ed
Re: [CODE4LIB] The lie of the API
A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your api: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn’t work? These questions are difficult to ask if they are just a IP address in your access log. -- True, but, again, there are other ways to go about this. I've baulked at doing just this in the past because it reveals the raw and primary purpose behind an API key: to track individual user usage/access. I would feel a little awkward writing (and receiving, incidentally) a message that began: -- Hello, I saw you using our service. What are you doing with our data? Cordially, Data service team --- And, if you cringe a little at the ramifications of the above, then why do you need user-specific granularity? (That's really not meant to be a rhetorical question - I would genuinely be interested in whether my notions of open and free are outmoded and based too much in a theoretical purity that unnecessary tracking is a violation of privacy). Unless the API key exists to control specific, user-level access precisely because this is a facet of the underlying service, I feel somewhere in all of this the service has violated, in some way, the notion that it is open and/or free, assuming it has billed itself as such. Otherwise, it's free and open as in Google or Facebook. All that said, I think a data service can smooth things over greatly by not insisting on a developer signing a EULA (which is essentially what happens when one requests an API key) before even trying the service or desiring the most basic of data access. There are middle ground solutions. Yours, Kevin On 12/02/2013 12:57 PM, Edward Summers wrote: On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote: I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage. A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your api: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn’t work? These questions are difficult to ask if they are just a IP address in your access log. //Ed
Re: [CODE4LIB] The lie of the API
umm... it's called HTTP-AUTH, and if you really want to be cool, use an X.509 client cert for authorization (see geoserver as an example that works very cleanly - http://docs.geoserver.org/latest/en/user/security/tutorials/cert/index.html; the freebxml registry-repository also uses X.509 based authentication in a reasonably clean manner) Robert Sanderson wrote: To be (more) controversial... If it's okay to require headers, why can't API keys go in a header rather than the URL. Then it's just the same as content negotiation, it seems to me. You send a header and get a different response from the same URI. Rob On Mon, Dec 2, 2013 at 10:57 AM, Edward Summers e...@pobox.com wrote: On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote: I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage. A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your api: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn’t work? These questions are difficult to ask if they are just a IP address in your access log. //Ed
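From the client side, X.509 client-certificate authentication can be exercised with any stock HTTP library; a minimal sketch in Python (file names and URL are illustrative, and the server has to be configured to request and verify client certs, as in the geoserver tutorial above):

    import requests

    resp = requests.get(
        "https://registry.example.org/api/records",
        cert=("client.crt", "client.key"),  # client certificate and private key
        verify="ca-bundle.crt",             # CA bundle used to verify the server
    )
    print(resp.status_code)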
Re: [CODE4LIB] The lie of the API
On Dec 2, 2013, at 1:25 PM, Kevin Ford wrote: A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your api: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn’t work? These questions are difficult to ask if they are just a IP address in your access log. -- True, but, again, there are other ways to go about this. I've baulked at doing just this in the past because it reveals the raw and primary purpose behind an API key: to track individual user usage/access. I would feel a little awkward writing (and receiving, incidentally) a message that began: -- Hello, I saw you using our service. What are you doing with our data? Cordially, Data service team -- It's better than posting to a website: We can't justify keeping this API maintained / available, because we have no idea who's using it, or what they're using it for. Or: We've had to shut down the API because we'd had people abusing the API and we can't easily single them out as it's not just coming from a single IP range. We don't require API keys here, but we *do* send out messages to our designated community every couple of years with: If you use our APIs, please send a letter of support that we can include in our upcoming Senior Review. (Senior Review is NASA's peer-review of operating projects, where they bring in outsiders to judge if it's justifiable to continue funding them, and if so, at what level) Personally, I like the idea of allowing limited use without a key (be it number of accesses per day, number of concurrent accesses, or some other rate limiting), but as someone who has been operating APIs for years and is *not* *allowed* to track users, I've seen quite a few times when it would've made my life so much easier. And, if you cringe a little at the ramifications of the above, then why do you need user-specific granularity? (That's really not meant to be a rhetorical question - I would genuinely be interested in whether my notions of open and free are outmoded and based too much in a theoretical purity that unnecessary tracking is a violation of privacy). You're assuming that you're actually correlating API calls to the users ... it may just be an authentication system and nothing past that. Unless the API key exists to control specific, user-level access precisely because this is a facet of the underlying service, I feel somewhere in all of this the service has violated, in some way, the notion that it is open and/or free, assuming it has billed itself as such. Otherwise, it's free and open as in Google or Facebook. You're also assuming that we've claimed that our services are 'open'. (mine are, but I know of plenty of them that have to deal with authorization, as they manage embargoed or otherwise restricted items). Of course, you can also set up some sort of 'guest' privileges for non-authenticated users so they just wouldn't see the restricted content. All that said, I think a data service can smooth things over greatly by not insisting on a developer signing a EULA (which is essentially what happens when one requests an API key) before even trying the service or desiring the most basic of data access. There are middle ground solutions. I do have problems with EULAs ... one in that we have to get things approved by our legal department, second in that they're often written completely one-sided and third in that they're often written assuming personal use. 
Twitter and Facebook had to make available alternate EULAs so that governments could use them ... because you can't hold the person who signed up for the account responsible for it. (and they don't want it 'owned' by that person should they be fired, etc.) ... but sometimes they're less restrictive ... more TOS than EULA. Without it, you've got absolutely no sort of SLA ... if they want to take down their API, or block you, you've got no recourse at all. -Joe
Re: [CODE4LIB] The lie of the API
Environment Canterbury has a click-through screen making you accept their terms and conditions before you get access to the API, and they use that as an opportunity to ask some questions about your intended use. Then once you've answered those you get direct access to the API as beautiful plain XML. (Okay, XML which possibly overuses attributes to carry data instead of tags, but I eventually figured out how to make my server's version of PHP happy with that.) It's glorious. It made me so happy that I went back to their click-through screen to give them some more information about what I was doing.

When I had to try and navigate Twitter's API and authentication models, however... Well, I absolutely understand the need for it, but it'll be a long time before I ever try that again.

Deborah

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward Summers
Sent: Tuesday, 3 December 2013 6:57 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] The lie of the API

On Dec 3, 2013, at 4:18 AM, Ross Singer rossfsin...@gmail.com wrote:

> I'm not going to defend API keys, but not all APIs are open or free. You need to have *some* way to track usage.

A key (haha) thing that keys also provide is an opportunity to have a conversation with the user of your api: who are they, how could you get in touch with them, what are they doing with the API, what would they like to do with the API, what doesn't work? These questions are difficult to ask if they are just an IP address in your access log.

//Ed
Re: [CODE4LIB] The lie of the API
I'm confused about the supposed distinction between content negotiation and explicit content request in a URL. The reason I'm confused is that the response to content negotiation is supposed to be a content location header with a URL that is guaranteed to return the negotiated content. In other words, there *must* be a form of the URL that bypasses content negotiation. If you can do content negotiation, then you should have a URL form that doesn't require content negotiation. Ralph From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert Sanderson azarot...@gmail.com Sent: Friday, November 29, 2013 2:44 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: The lie of the API (posted in the comments on the blog and reposted here for further discussion, if interest) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the use content negotiation conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below. It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API, we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API. Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large scale systems like DPLA and Europeana, caching is essential for quality of service. How do you find our which formats are supported by conneg? By reading the documentation. Which could just say add .json on the end. The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically,so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this. Consider OAI-PMH's ListMetadataFormats and SRU's Explain response. Instead you can have a separate URI for each representation and link them with Link headers, or just a simple rule like add '.json' on the end. No need for complicated content negotiation at all. Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial. Compared to being difficult at both ends with content negotiation. It can be useful to make statements about the different representations, and especially if you need to annotate the structure or content. Or share it -- you can't email someone a link that includes the right Accept headers to send -- as in the post, you need to send them a command line like curl with -H. An experiment for fans of content negotiation: Have both .json and 302 style conneg from your original URI to that .json file. Advertise both. See how many people do the conneg. 
Re: [CODE4LIB] The lie of the API
It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules Don't wish your implementation problems on the consumers of your data. There are [you would hope] far more of them than of you ;-) Content-negotiation is an already established mechanism - why invent a new, and different, one just for *your* data? Put your self in the place of your consumer having to get their head around yet another site specific API pattern. As to discovering then using the (currently implemented) URI returned from a content-negotiated call - The standard http libraries take care of that, like any other http redirects (301,303, etc) plus you are protected from any future backend server implementation changes. ~Richard On 1 December 2013 20:51, LeVan,Ralph le...@oclc.org wrote: I'm confused about the supposed distinction between content negotiation and explicit content request in a URL. The reason I'm confused is that the response to content negotiation is supposed to be a content location header with a URL that is guaranteed to return the negotiated content. In other words, there *must* be a form of the URL that bypasses content negotiation. If you can do content negotiation, then you should have a URL form that doesn't require content negotiation. Ralph From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert Sanderson azarot...@gmail.com Sent: Friday, November 29, 2013 2:44 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: The lie of the API (posted in the comments on the blog and reposted here for further discussion, if interest) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the use content negotiation conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below. It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API, we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API. Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large scale systems like DPLA and Europeana, caching is essential for quality of service. How do you find our which formats are supported by conneg? By reading the documentation. Which could just say add .json on the end. The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically,so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this. Consider OAI-PMH's ListMetadataFormats and SRU's Explain response. 
Re: [CODE4LIB] The lie of the API
On Dec 1, 2013, at 3:51 PM, LeVan, Ralph wrote:

> I'm confused about the supposed distinction between content negotiation and explicit content request in a URL. The reason I'm confused is that the response to content negotiation is supposed to be a content location header with a URL that is guaranteed to return the negotiated content. In other words, there *must* be a form of the URL that bypasses content negotiation. If you can do content negotiation, then you should have a URL form that doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1. The one that most gets used is 'transparent negotiation', which results in different content being served under a single URL. Transparent negotiation schemes do *not* redirect to a new URL to allow the cache or browser to identify the specific content returned. (This would require an extra round trip, as you'd have to send a Location: header to redirect, then have the browser request the new page.)

So that you don't screw up web proxies, you have to specify the 'Vary' header to tell them which request parameters you consider significant, so a proxy knows what is or isn't cacheable. So if you might serve different content based on the Accept and Accept-Encoding headers, you would return:

  Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because some browsers pack every module plus its version in there, creating so many permutations that many proxies will refuse to cache the response.)

-Joe

(who has been managing web servers since HTTP/0.9, and gets annoyed when I have to explain to our security folks each year why I don't reject pre-HTTP/1.1 requests or follow the rest of the CIS benchmark recommendations that cause our web services to fail horribly)
Re: [CODE4LIB] The lie of the API
+1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well supported standards. [Robert] You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. If we introduce languages into the negotiation, this won't scale. [Robert] This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. Don't know caches intimately, but I don't see why that's algorithmically difficult. Just look at the Content-type of the response. Is it harder for caches to examine headers than content or URI? (That's an earnest, perhaps naïve, question.) If we are talking about caching on the client here (not caching proxies), I would think in most cases requests are issued with the same Accept-* headers, so caching will work as expected anyway. [Robert] Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial. Hadn't heard of these. (They are on Wikipedia so they must be real.) What do they offer over HTML link elements populated from the Dublin Core Element Set? --- My ideal setup would be to maintain a canonical URL that always serves the clients' flavour of representation (format/language), which could vary, but points to other representations (and versions for that matter) at separate URLs through a mechanism like HTML link elements. My whatever it's worth . great topic, though, thanks Robert :) Cheers Hugh Barnes Digital Access Coordinator Library, Teaching and Learning Lincoln University Christchurch New Zealand p +64 3 423 0357 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Richard Wallis Sent: Monday, 2 December 2013 12:26 p.m. To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] The lie of the API It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules Don't wish your implementation problems on the consumers of your data. There are [you would hope] far more of them than of you ;-) Content-negotiation is an already established mechanism - why invent a new, and different, one just for *your* data? Put your self in the place of your consumer having to get their head around yet another site specific API pattern. As to discovering then using the (currently implemented) URI returned from a content-negotiated call - The standard http libraries take care of that, like any other http redirects (301,303, etc) plus you are protected from any future backend server implementation changes. ~Richard On 1 December 2013 20:51, LeVan,Ralph le...@oclc.org wrote: I'm confused about the supposed distinction between content negotiation and explicit content request in a URL. The reason I'm confused is that the response to content negotiation is supposed to be a content location header with a URL that is guaranteed to return the negotiated content. In other words, there *must* be a form of the URL that bypasses content negotiation. If you can do content negotiation, then you should have a URL form that doesn't require content negotiation. 
Ralph

From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Robert Sanderson azarot...@gmail.com Sent: Friday, November 29, 2013 2:44 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: The lie of the API

(posted in the comments on the blog and reposted here for further discussion, if interested)

While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the 'use content negotiation' conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API; we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses
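(A rough sketch of the consumer side of Hugh's ideal setup above, assuming the canonical URL advertises its other representations as HTTP Link headers rather than HTML link elements; the URI is hypothetical, and requests exposes parsed Link headers as resp.links:)

    import requests

    # GET the canonical URL, then follow its advertised alternate representation.
    resp = requests.get('http://example.org/record/1')
    alternate = resp.links.get('alternate')   # requests keeps one link per rel value
    if alternate:
        other = requests.get(alternate['url'])
        print(other.headers.get('Content-Type'))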
Re: [CODE4LIB] The lie of the API
Returning a content location header does not require a redirect. You can return the negotiated content with the first response and still tell the client how it could have asked for that same content without negotiation. That's what the content location header means in the absence of a redirect status code.

Ralph

From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Joe Hourcle onei...@grace.nascom.nasa.gov Sent: Sunday, December 01, 2013 6:39 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: The lie of the API

On Dec 1, 2013, at 3:51 PM, LeVan,Ralph wrote: I'm confused about the supposed distinction between content negotiation and explicit content request in a URL. The reason I'm confused is that the response to content negotiation is supposed to be a content location header with a URL that is guaranteed to return the negotiated content. In other words, there *must* be a form of the URL that bypasses content negotiation. If you can do content negotiation, then you should have a URL form that doesn't require content negotiation.

There are three types of content negotiation discussed in HTTP/1.1. The one most commonly used is 'transparent negotiation', which results in different content being served under a single URL. Transparent negotiation schemes do *not* redirect to a new URL to allow the cache or browser to identify the specific content returned. (This would require an extra round trip, as you'd have to send a Location: header to redirect, then have the browser request the new page.)

So that you don't screw up web proxies, you have to specify the 'Vary' header to tell which parameters you consider significant so that it knows what is or isn't cacheable. So if you might serve different content based on the Accept and Accept-Encoding headers, you would return:

Vary: Accept, Accept-Encoding

(Including 'User-Agent' is problematic because some browsers pack in every module + the version in there, making for so many permutations that many proxies will refuse to cache it.)

-Joe (who has been managing web servers since HTTP/0.9, and gets annoyed when I have to explain to our security folks each year why I don't reject pre-HTTP/1.1 requests or follow the rest of the CIS benchmark recommendations that cause our web services to fail horribly)
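(Ralph's point, seen from a client; the record URI is hypothetical. The negotiated representation arrives with the first 200 response, and Content-Location names a URL that returns the same thing without any negotiation:)

    import requests

    resp = requests.get('http://example.org/record/1',
                        headers={'Accept': 'application/json'})
    direct_url = resp.headers.get('Content-Location')
    if resp.status_code == 200 and direct_url:
        # e.g. http://example.org/record/1.json -- shareable without Accept headers
        print(direct_url)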
Re: [CODE4LIB] The lie of the API
On Dec 1, 2013, at 7:57 PM, Barnes, Hugh wrote: +1 to all of Richard's points here. Making something easier for you to develop is no justification for making it harder to consume or deviating from well supported standards.

[Robert] You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. If we introduce languages into the negotiation, this won't scale.

It depends on what you qualify as 'scaling'. You can configure Apache and some other servers so that you pre-generate files such as:

index.en.html
index.de.html
index.es.html
index.fr.html
...

It's even the default for some distributions. Then, depending on what Accept-Language header is sent, the server returns the appropriate response. The only issue is that the server assumes that the 'quality' of all of the translations is equivalent. You know that 'q=0.9' stuff? There's actually a scale in RFC 2295 that equates the different qualities to how much content is lost in that particular version:

Servers should use the following table as a guide when assigning source quality values:

1.000 perfect representation
0.900 threshold of noticeable loss of quality
0.800 noticeable, but acceptable quality reduction
0.500 barely acceptable quality
0.300 severely degraded quality
0.000 completely degraded quality

[Robert] This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content.

Don't know caches intimately, but I don't see why that's algorithmically difficult. Just look at the Content-Type of the response. Is it harder for caches to examine headers than content or URI? (That's an earnest, perhaps naïve, question.)

See my earlier response. The problem is that without a 'Vary' header or other cache-control headers, caches may assume that a URL is a fixed resource. If a cache were to assume the URL was static, then it wouldn't matter what was sent for Accept, Accept-Encoding or Accept-Language ... and so the first request proxied gets cached, and then subsequent requests get the cached copy, even if that's not what the server would have sent.

If we are talking about caching on the client here (not caching proxies), I would think in most cases requests are issued with the same Accept-* headers, so caching will work as expected anyway.

I assume he's talking about caching proxies, where it's a real problem.

[Robert] Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial.

Hadn't heard of these. (They are on Wikipedia so they must be real.) What do they offer over HTML link elements populated from the Dublin Core Element Set?

Wikipedia was the first place you looked? Not IETF or W3C? No wonder people say libraries are doomed, if even people who work in libraries go straight to Wikipedia.

... oh, and I should follow up to my posting from earlier tonight -- upon re-reading the HTTP/1.1 spec, it seems that there *is* a way to specify the authoritative URL returned without an HTTP round-trip, Content-Location: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14

Of course, it doesn't look like my web browser does anything with it: http://www.w3.org/Protocols/rfc2616/rfc2616 http://www.w3.org/Protocols/rfc2616/rfc2616.html http://www.w3.org/Protocols/rfc2616/rfc2616.txt ...
so you'd still have to use Location: if you wanted it to show up to the general public. -Joe
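(To make the pre-generated-variants idea concrete, a rough Python sketch of the selection step Apache does for you; the directory and file names are hypothetical, and real Accept-Language handling -- language ranges, wildcards -- is more involved:)

    import os

    def pick_language_variant(accept_language, directory='/var/www/docroot'):
        # Parse "en;q=0.9,de;q=0.8" into [('en', 0.9), ('de', 0.8)], best first.
        prefs = []
        for item in accept_language.split(','):
            parts = item.strip().split(';q=')
            lang = parts[0].strip().lower()
            q = float(parts[1]) if len(parts) > 1 else 1.0
            prefs.append((lang, q))
        prefs.sort(key=lambda p: p[1], reverse=True)
        # Return the first pre-generated file (index.en.html, index.de.html, ...)
        # that matches an acceptable language.
        for lang, q in prefs:
            candidate = os.path.join(directory, 'index.%s.html' % lang)
            if q > 0 and os.path.exists(candidate):
                return candidate
        return os.path.join(directory, 'index.en.html')  # arbitrary fallback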
Re: [CODE4LIB] The lie of the API
-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe Hourcle (They are on Wikipedia so they must be real.) Wikipedia was the first place you looked? Not IETF or W3C? No wonder people say libraries are doomed, if even people who work in libraries go straight to Wikipedia. It was a humorous aside, regrettably lacking a smiley. I think that comment would be better saved to pitch at folks who cite and link to w3schools as if authoritative. Some of them are even in libraries. Your other comments were informative, though. Thank you :) Cheers Hugh
Re: [CODE4LIB] The lie of the API
On Dec 1, 2013, at 9:36 PM, Barnes, Hugh wrote: -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Joe Hourcle (They are on Wikipedia so they must be real.) Wikipedia was the first place you looked? Not IETF or W3C? No wonder people say libraries are doomed, if even people who work in libraries go straight to Wikipedia. It was a humorous aside, regrettably lacking a smiley. Yes, a smiley would have helped. It also doesn't help that there used to be a website out there named 'ScoopThis'. They started as a wrestling parody site, but my favorite part was their advice column from 'Dusty the Fat, Bitter Cat'. I bring this up because their slogan was cuz if it’s on the net, it’s got to be true ... so I twitch a little whenever someone says something similar to that phrase. (unfortunately, the site's gone, and archive.org didn't cache them, so you can't see the photoshopped pictures of Dusty at Woodstock '99 or the Rock's cooking show. They started up a separate website for Dusty, but when they closed that one down, they put up a parody of a porn site, so you probably don't want to go looking for it) I think that comment would be better saved to pitch at folks who cite and link to w3schools as if authoritative. Some of them are even in libraries. Although I wish that w3schools would stop showing up so highly in searches for javascript methods and css attributes, they did have a time when they were some of the best tutorials out there on web-related topics. I don't know if I can claim that to be true today, though. Your other comments were informative, though. Thank you :) I try ... especially when I'm procrastinating on doing posters that I need to have printed by Friday. (but if anyone has any complaints about data.gov or other federal data dissemination efforts, I'll be happy to work them in) -Joe
Re: [CODE4LIB] The lie of the API
On Dec 1, 2013 6:42 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: So that you don't screw up web proxies, you have to specify the 'Vary' header to tell which parameters you consider significant so that it knows what is or isn't cacheable.

I believe that if a Vary isn't specified, and the content is not marked as non-cacheable, a cache must assume Vary: *, but I might be misremembering.

(who has been managing web servers since HTTP/0.9, and gets annoyed when I have to explain to our security folks each year why I don't reject pre-HTTP/1.1 requests or follow the rest of the CIS benchmark recommendations that cause our web services to fail horribly)

Old school represent (0.9 could outperform 1.0 if the request headers were more than 1 MTU or the first line was sent in a separate packet with Nagle enabled). [Accept was a major cause of header bloat].
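(A rough sketch of what that means for a shared cache -- not how any particular proxy is implemented: the cache key has to cover every request header named in Vary, and Vary: * leaves nothing safe to reuse. Header names in request_headers are assumed lower-case:)

    import hashlib

    def cache_key(url, request_headers, vary_header):
        if vary_header is None:
            return url                    # no Vary: the URL alone is the key
        if vary_header.strip() == '*':
            return None                   # Vary: * -- effectively uncacheable
        parts = [url]
        for name in vary_header.split(','):
            name = name.strip().lower()
            parts.append('%s=%s' % (name, request_headers.get(name, '')))
        return hashlib.sha1('|'.join(parts).encode('utf-8')).hexdigest()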
Re: [CODE4LIB] The lie of the API
On Dec 1, 2013, at 11:12 PM, Simon Spero wrote: On Dec 1, 2013 6:42 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote: So that you don't screw up web proxies, you have to specify the 'Vary' header to tell which parameters you consider significant so that it knows what is or isn't cacheable.

I believe that if a Vary isn't specified, and the content is not marked as non-cacheable, a cache must assume Vary: *, but I might be misremembering.

It would be horrible for caching proxies to assume that nothing's cacheable unless it said it was. (as typically only the really big websites, or those that have seen some obvious problems, bother with setting cache control headers.)

I haven't done any exhaustive tests in many years, but I was noticing that proxies were starting to cache GET requests with query strings, which bothered me -- it used to be that anything that was an obvious CGI wasn't cached. (I guess that with enough sites using them, a cache has to assume that the sites aren't stateful, and that the parameters in the URL are enough information for hashing.)

(who has been managing web servers since HTTP/0.9, and gets annoyed when I have to explain to our security folks each year why I don't reject pre-HTTP/1.1 requests or follow the rest of the CIS benchmark recommendations that cause our web services to fail horribly)

Old school represent (0.9 could outperform 1.0 if the request headers were more than 1 MTU or the first line was sent in a separate packet with Nagle enabled). [Accept was a major cause of header bloat].

Don't even get me started on header bloat ... My main complaint about HTTP/1.1 is that it requires clients to support chunked encoding, and I've got to support a client that's got a buggy implementation. (and then my CGIs that serve 2GB tarballs start failing, and it's calling a program that's not smart enough to look for SIGPIPE, so I end up with a dozen of 'em going all stupid and sucking down CPU on one of my servers)

Most people don't have to support a community-written HTTP client, though. (and the one alternative HTTP client in IDL doesn't let me interact with the HTTP headers directly, so I can't put a wrapper around it to extract the tarball's filename from the Content-Disposition header)

-Joe

ps. yep, still having writer's block on posters.
Re: [CODE4LIB] The lie of the API
(posted in the comments on the blog and reposted here for further discussion, if interested)

While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the 'use content negotiation' conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below.

It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API; we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API.

Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large scale systems like DPLA and Europeana, caching is essential for quality of service.

How do you find out which formats are supported by conneg? By reading the documentation. Which could just say add .json on the end. The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically, so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this. Consider OAI-PMH's ListMetadataFormats and SRU's Explain response.

Instead you can have a separate URI for each representation and link them with Link headers, or just a simple rule like add '.json' on the end. No need for complicated content negotiation at all. Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial. Compared to being difficult at both ends with content negotiation.

It can be useful to make statements about the different representations, and especially if you need to annotate the structure or content. Or share it -- you can't email someone a link that includes the right Accept headers to send -- as in the post, you need to send them a command line like curl with -H.

An experiment for fans of content negotiation: Have both .json and 302 style conneg from your original URI to that .json file. Advertise both. See how many people do the conneg. If it's non-zero, I'll be extremely surprised.

And a challenge: Even with libraries there's still complexity to figuring out how and what to serve. Find me sites that correctly implement * based fallbacks. Or even process q values. I'll bet I can find 10 that do content negotiation wrong, for every 1 that does it correctly. I'll start: dx.doi.org touts its content negotiation for metadata, yet doesn't implement q values or *s. You have to go to the documentation to figure out what Accept headers it will do string equality tests against.
Rob On Fri, Nov 29, 2013 at 6:24 AM, Seth van Hooland svhoo...@ulb.ac.be wrote: Dear all, I guess some of you will be interested in the blogpost of my colleague and co-author Ruben regarding the misunderstandings on the use and abuse of APIs in a digital libraries context, including a description of both good and bad practices from Europeana, DPLA and the Cooper Hewitt museum: http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/ Kind regards, Seth van Hooland Président du Master en Sciences et Technologies de l'Information et de la Communication (MaSTIC) Université Libre de Bruxelles Av. F.D. Roosevelt, 50 CP 123 | 1050 Bruxelles http://homepages.ulb.ac.be/~svhoolan/ http://twitter.com/#!/sethvanhooland http://mastic.ulb.ac.be 0032 2 650 4765 Office: DC11.102
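(A sketch of the experiment Robert proposes above, from the consumer's side; both URLs are hypothetical, and requests records any redirect hops in resp.history:)

    import requests

    # Option 1: the explicit representation URL -- no negotiation involved.
    plain = requests.get('http://example.org/record/1.json')

    # Option 2: conneg on the original URI, expected to 302 to the .json file.
    negotiated = requests.get('http://example.org/record/1',
                              headers={'Accept': 'application/json'})
    print([r.status_code for r in negotiated.history], negotiated.url)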
Re: [CODE4LIB] The lie of the API
Hi, I was happy to read this blog post, because it contains lots of very important statements, but as one of the developers of the Europeana API I would like to mention some points.

The idea of content negotiation is nice, but it also adds some additional burden for the API users. In some tools and programming languages it is easy to modify HTTP headers, in others it is not that trivial. For non-tech people it is a burden. In an environment such as Europeana it is not only tech people who would like to see and check the non-HTML output; it also has meaning for metadata experts, marketing people, ingestion team members and so on.

Europeana has a history, and even the API and the metadata model behind it have their own histories. When we released the new API which reflects the new metadata structure, it was evident that we did not want to break existing client-side applications. So we had to introduce versioning. With versioning we had the same choices as with content type: we could make it explicit in the URL or use hypermedia versioning via HTTP headers. This led to the same problem as we had before, so we chose the URL approach.

Finally, when creating an API there are lots of different aspects we should consider. Besides technological, scientific or aesthetic aspects there are lots of other ones as well. We follow a way which has good and bad points, but as I see it, the same is true for Ruben's suggestions. It is not true that our way is driven by simple ignorance. We never claimed that we created a RESTful and pedantic API. We did a practical one, and we keep improving it gradually, considering such feedback as this post.

Regards, Péter

2013/11/29 Robert Sanderson azarot...@gmail.com: (posted in the comments on the blog and reposted here for further discussion, if interested) While I couldn't agree more with the post's starting point -- URIs identify (concepts) and use HTTP as your API -- I couldn't disagree more with the 'use content negotiation' conclusion. I'm with Dan Cohen in his comment regarding using different URIs for different representations for several reasons below. It's harder to implement Content Negotiation than your own API, because you get to define your own API whereas you have to follow someone else's rules when you implement conneg. You can't get your own API wrong. I agree with Ruben that HTTP is better than rolling your own proprietary API; we disagree that conneg is the correct solution. The choice is between conneg or regular HTTP, not conneg or a proprietary API. Secondly, you need to look at the HTTP headers and parse quite a complex structure to determine what is being requested. You can't just put a file in the file system, unlike with separate URIs for distinct representations where it just works, instead you need server side processing. This also makes it much harder to cache the responses, as the cache needs to determine whether or not the representation has changed -- the cache also needs to parse the headers rather than just comparing URI and content. For large scale systems like DPLA and Europeana, caching is essential for quality of service. How do you find out which formats are supported by conneg? By reading the documentation. Which could just say add .json on the end. The Vary header tells you that negotiation in the format dimension is possible, just not what to do to actually get anything back. There isn't a way to find this out from HTTP automatically, so now you need to read both the site's docs AND the HTTP docs. APIs can, on the other hand, do this.
Consider OAI-PMH's ListMetadataFormats and SRU's Explain response. Instead you can have a separate URI for each representation and link them with Link headers, or just a simple rule like add '.json' on the end. No need for complicated content negotiation at all. Link headers can be added with a simple apache configuration rule, and as they're static are easy to cache. So the server side is easy, and the client side is trivial. Compared to being difficult at both ends with content negotiation. It can be useful to make statements about the different representations, and especially if you need to annotate the structure or content. Or share it -- you can't email someone a link that includes the right Accept headers to send -- as in the post, you need to send them a command line like curl with -H. An experiment for fans of content negotiation: Have both .json and 302 style conneg from your original URI to that .json file. Advertise both. See how many people do the conneg. If it's non-zero, I'll be extremely surprised. And a challenge: Even with libraries there's still complexity to figuring out how and what to serve. Find me sites that correctly implement * based fallbacks. Or even process q values. I'll bet I can find 10 that do content negotiation wrong, for every 1 that does it correctly. I'll start: dx.doi.org touts its content negotiation for metadata, yet doesn't
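(A small illustration of the two versioning options Péter describes, seen from the client; the endpoint and the version parameter on the media type are hypothetical placeholders, not the actual Europeana API:)

    import requests

    # Version in the URL: visible, bookmarkable, easy to try in a browser.
    v2 = requests.get('http://example.org/api/v2/record/1')

    # Version negotiated via headers: the URL stays stable across versions.
    v2_conneg = requests.get('http://example.org/api/record/1',
                             headers={'Accept': 'application/json; version=2'})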
Re: [CODE4LIB] The lie of the API
Seth (and commenters) -

The basic point is sound, but there are some important issues that are averted or elided in the original article in order to make the underlying point more clearly.

1: It should be quite clear that there is no need to develop an API for the sole purpose of generating an alternate representation of a [document] in a form that is intended to be machine actionable, as opposed to one that is intended to be rendered for human consumption, where both share the same referent. This is precisely what the content negotiation mechanism was designed for.

2: It is less clear, but still reasonable, to use content negotiation to treat content types for the same URI polysemously (having related, but slightly different senses). For example, the HTML rendering of a URI may carry slightly different propositional content than is carried in a set of RDF assertions*.

3: For stative actions not related to content, a formally defined API is required.

4: Since there is no intrinsic relationship between two objects with different URIs, breaking the connection for items which are identical** may require extra work to repair.

5: Cacheable content negotiation in HTTP has been around since the mid-late nineties. It's retro-chic.

6: API keys that protect information extractable from non-API protected sources were created to encourage people to learn how to implement screen-scrapers and finite state transducers.

7: The commenter who brought up the issue of the same URI denoting different FRBR entities must make a number of metaphysical commitments. Resulting models are FRBR-like, but are not pure FRBR. If the 1:1 principle were real, any of these approaches would present insuperable difficulties.

Simon

* Under a documentationalist interpretation, the propositional content must be different, so allowing at least some degree of polysemy is hard to avoid.

** absolute identity cannot apply, but most forms of relative identity have obvious interpretations.

On Fri, Nov 29, 2013 at 8:24 AM, Seth van Hooland svhoo...@ulb.ac.be wrote: Dear all, I guess some of you will be interested in the blogpost of my colleague and co-author Ruben regarding the misunderstandings on the use and abuse of APIs in a digital libraries context, including a description of both good and bad practices from Europeana, DPLA and the Cooper Hewitt museum: http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/ Kind regards, Seth van Hooland Président du Master en Sciences et Technologies de l'Information et de la Communication (MaSTIC) Université Libre de Bruxelles Av. F.D. Roosevelt, 50 CP 123 | 1050 Bruxelles http://homepages.ulb.ac.be/~svhoolan/ http://twitter.com/#!/sethvanhooland http://mastic.ulb.ac.be 0032 2 650 4765 Office: DC11.102