Hi all, So far we've been thinking of capabilities as equivalent to a set of endpoints.
That's a rather technical definition. It also brings one important limitation: one endpoint can only be "governed" by one capability. Granted, most capabilities do require implementing specific endpoints. But I wonder if, for the sake of being future-proof, we shouldn't broaden the meaning of that term to embrace *logical* or *behavioral* concepts as well. One example that comes to mind: a REST catalog implementor may choose to implement the transactions-commit endpoint to fully comply with the "tables" capability; but for performance reasons, or simply because it's too complex, they could opt for rejecting multi-table commits (iow, if a CommitTransactionRequest contains one single CommitTableRequest, that's fine, otherwise, the endpoint would return an error). It would be nice to express that as a capability: this way the client knows that it is safe to call the transactions-commit endpoint, but with one CommitTableRequest at a time. Such a capability would not be defined by a specific endpoint, but rather, would influence the behavior exhibited by certain endpoints. Thanks, Alex On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi Jack > > I like Robert's proposal. Back to the topics, I think grouping with > tags is more "flexible" (it was what we included in the REST spec > proposal as well). > > Regards > JB > > On Wed, Jun 26, 2024 at 6:26 PM Jack Ye <yezhao...@gmail.com> wrote: > > > > It seems like there are 2 sub-topics here: > > 1. should we group operations with tags, or should we do this > per-operation/endpoint? > > 2. how should we do the capability/versioning for each unit (either per > tag or per operation) > > > > Shall we first conclude on 1? > > > > For 1, my take is that we will need to do it per operation, for 2 > reasons: > > > > (1) There are many REST services that would only implement a very small > set of APIs, such as just loadTable and loadView. Some will choose to not > implement very specific endpoints, such as renameTable. Tags seems > convenient but it is mandating people to implement a specific group of APIs > together, which is a lot of burdens for especially small organizations, if > they just want to support very specific goals like reading through IRC. > > > > (2) Suppose a new tag is added in the future, the server returns that > tag, but an older client does not understand it, it might cause mistakes in > the client's understanding of what is supported and what is not, when a tag > contains both features in existing APIs and also new APIs. If we define > that tags do not overlap with each other, this is probably not a concern. > However, (1) still is a problem from a usability perspective. > > > > Best, > > Jack Ye > > > > > > > > > > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks <dwe...@apache.org> wrote: > >> > >> I think Robert's approach is a reasonable compromise here. > >> > >> If we wanted a "per operation/endpoint" versioning, I think I'd prefer > Micah's OpenAPI spec based approach because it's more standardized, but I > feel adds a lot of client complexity. > >> > >> -Dan > >> > >> > >> > >> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp <sn...@snazy.de> wrote: > >>> > >>> (I think, compatibility deserves a separate thread - it's a "huge" > topic) > >>> > >>> Based on experience, we decided on the following with Nessie: > >>> > >>> Unknown fields/attributes in a structure _DO_ cause (de)serialization > failures. > >>> "Stable API versions" - endpoint additions and/or added query > parameters and/or enhanced structures do _NOT_ require a new API version > (as in the endpoint's route/path). > >>> "Flexible spec versions" - new and updated "capabilities" however > might cause a bump in the "spec version" that the server announces in its > `getConfig` result. > >>> > >>> Adding new routes/paths may require new endpoint implementations on > the server side, which can easily lead to a lot of (unnecessarily > boilerplate) code. Using different routes/paths is justified if the API is > changed "fundamentally". We call the "path component" (api/v1/..., > api/v2/...) API version - the server indicates the minimum and maximum > supported API version, in case a client wants to "upgrade". I recommend to > _not_ bump the API version in the route/path if it's not really necessary. > >>> > >>> Regarding the requirement to fail on unknown attributes: Unknown > attributes may contain important information. A client may send a newer > version of a request object with an important new field, but the (older) > server discards the new attribute. Think of an attribute that for example > defines a "commit condition" that the client expects to be respected. "New" > attributes must be omittable (e.g. don't serialize if null/default) - > clients indicate the "usage" of an added attribute using some request > attribute (for example: "boolean returnExtendedInformation"). > >>> > >>> The list of capabilities can be indicated with included "spec > versions", to tell clients which features/functionalities a server > supports."Production" spec versions could start with 1, and "reserve" 0 for > experimental/unsupported/poc kind of implementation. It could look like > this: > >>> capabilities: [ > >>> "table-spec/2,3", // but not table-spec v1 here > >>> "view-spec/1", > >>> "table-api/1", > >>> "view-api/1", > >>> "udf-api/1", > >>> "super-feature/2,4,6", // but not spec versions 0,1,3,5,7+ > >>> ... > >>> ] > >>> Incrementing a spec version in the list of capabilities doesn't break > any client. We could also define a structure to describe each capability: > >>> components: > >>> schemas: > >>> Capability: > >>> name: > >>> type: string > >>> description: Name of the capability > >>> versions: > >>> type: array: > >>> description: List of supported spec versions of this > capability. 0 means experimental (non-production) without any guarantees > about the stability of schema for request and response parameters. > >>> items: > >>> type: integer > >>> format: int32 > >>> > >>> In Nessie, we ensure backwards and forwards compatibility using a > specialized test suite that runs the "in tree" client against older server > versions and older client versions against the "in tree" server version. It > works fine for us for a few years now - and it did help preventing > compatibility issues. > >>> > >>> > >>> On 26.06.24 07:44, Péter Váry wrote: > >>> > >>> Hi everyone, > >>> > >>> A few considerations: > >>> - I think we should explicitly state which client/service > interoperability we are aiming for. I expect that we want to support both > old client -> new server, and new client -> old server communications. > >>> - I agree with Jack, that we should think about versions in advance - > HMS tried to be backwards compatible for everything, and that made it hard > to move forward / deprecate things. > >>> - Still we should try to keep the backwards incompatible changes > minimal. (All clients should be able to ignore unknown incoming fields / > New optional input parameter should drive new features / Try to avoid enums > in responses where we expect changes (?)) > >>> - OTOH, it could be important for clients to know which of the > backwards compatible changes are implemented for the given server - so I > would decouple the URI from the versioning. Maybe major version change > should (could) change the URI, but backwards compatible changes should be > served on the same URI, but could be identified by different minor versions. > >>> > >>> This is exciting stuff! > >>> Thanks for pushing this forward! > >>> > >>> Peter > >>> > >>> > >>> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote: > >>>> > >>>> Hi everyone, > >>>> > >>>> I feel I do not see a good answer to why not just simply version each > API? When using tag, it means I have to offer capabilities per-tagged > group. However, I could for example just offer loadTable and nothing else > in a catalog, and that should still be Iceberg REST compliant. And I think > we need a versioning story anyway, there is no way around it. > >>>> > >>>> Here is the workflow in my mind with versioning: > >>>> > >>>> 1. Going forward, every time the REST catalog spec introduces any new > API endpoints or backwards incompatible changes to the existing APIs, the > version of the specific API is incremented. So suppose the PlanTable API is > added, this API will be at version v1. Suppose UpdateTable is updated with > a new update type, that API will be at version v2, but PlanTable will > remain at v1. > >>>> > >>>> 2. a catalog must implement getConfig. This API is the only one that > is required. > >>>> > >>>> 3. in getConfig, in the defaults map (it could be in some new > metadata structure, but since we want strong backwards compatibility > guarantee, reusing string maps seems to be the best way), server returns > key-value pairs of: > >>>> - key: operation:<operationName> > >>>> - value: version number > >>>> > >>>> 4. the client assumes that the map is ordered, and resolves API > versions sequentially. For example, suppose I have the following map: > >>>> > >>>> { "operation:planTable": "1", "operation:loadTable": "2" } > >>>> > >>>> Note that by "supporting", it means to return a response in a > predictable way that is compliant with the spec. It can also return 406 > UnsupportedOperation as a way to support it. > >>>> > >>>> There is also a special version *, that means any version can work. > >>>> > >>>> 5. Backwards compatibility: suppose the client is at a higher version > than the server, then the client should always be able to understand the > server's full list of capabilities. > >>>> > >>>> 6. Forward compatibility: suppose the client is at a lower version > than the server, then the client should parse whatever operation it > understands, and use the highest version it could support to execute the > operation. Suppose the client only supports loadTable v1, then it will > continue to hit the GET v1/namespaces/{ns}/tables/{table} route, instead of > GET v2/namespaces/{ns}/tables/{table}. The v1 route could continue to > support the client, or it could throw 406 to indicate that this route is > deprecated and the client needs to upgrade. > >>>> > >>>> For initial backwards compatibility, I think not returning anything > should mean that all API that the client understands are having version *. > >>>> > >>>> What do people think of it, compared to the tag approach? > >>>> > >>>> Best, > >>>> Jack Ye > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield < > emkornfi...@gmail.com> wrote: > >>>>> > >>>>> I don't have strong opinions either way here, just thought it was > worth raising some concerns over possible evolution here. Some responses > inline, but if capabilities seem to meet the requirement at hand, then it > does potentially seem the simplest mechanism. > >>>>> > >>>>> > >>>>>> I think we also want to avoid relyance on server specific published > OpenAPI as they may leak other options/parameters/etc. This may lead to > confusion around what the canonical spec is and make clients incompatible > if they're generated off of a non-standard spec document. > >>>>> > >>>>> > >>>>> Yeah, I wasn't proposing necessarily using built in functionality > but a pre-scrubbed document. Since there is no reference service > implementation for REST it seems like each implementor would need to > describe the best way of scrubbing there description. > >>>>> > >>>>> > >>>>>> > >>>>>> @Micah this sounds to me as if the client would then have to parse > a bunch of endpoints to figure out whether it's safe to e.g. call loading a > view or dropping a table on the given REST server. Rather than having a > dedicated endpoint we're just using the /config endpoint to provide > information about what a server supports. > >>>>> > >>>>> > >>>>> I was not suggesting multiple endpoints here, simply different > contents for /config I agree in the short term this does add complexity on > the clients. But given that the canonical REST API clients are being > developed into the standard library, I'm not sure how much toil this would > cause in general. This also does not necessarily need to called up-front > but could be called to verify existence vs a permission issue after an > error was received. > >>>>> > >>>>> What round-trips did you have in mind here? > >>>>> > >>>>> > >>>>>> All good points though, but I'm not aware of a standard way to > handle this. > >>>>> > >>>>> > >>>>> IIUC, this sounds like a standard service description problem to me, > the solution with capabilities appears to be one level abstraction on top > of this. Service discovery seems like it has been reimplemented a few > different times depending on the technology [1][2][3] > >>>>> > >>>>> > >>>>>> I think versioning adds another level of complexity, but might be > necessary since I expect these will evolve to some extent and may even > require hitting versioned urls. > >>>>> > >>>>> > >>>>> If there is no concrete proposal on versioning, I agree it probably > pays to side step this. The endpoint transitioning from list of strings to > list of objects, would be an obvious sign to clients that they are out of > date. I think serving a service description(s), despite its complexity, is > likely the most principled way of versioning items appropriately, but this > definitely requires more in depth thought/design. > >>>>> > >>>>> > >>>>> Thanks, > >>>>> Micah > >>>>> > >>>>> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language > >>>>> [2] > https://en.wikipedia.org/wiki/Web_Application_Description_Language > >>>>> [3] https://developers.google.com/discovery/v1/reference/apis > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org> > wrote: > >>>>>> > >>>>>> Hey Micah, > >>>>>> > >>>>>> I think what we're trying to achieve is strike a balance between > client complexity and ability to support multiple server-side > capabilities. One challenge we've run into is if a client performs an > operation (e.g. listViews), but receives a 403 code, it's not clear whether > the client doesn't have access or the server doesn't support an endpoint > but isn't sending a 404 for security reasons. This is a simple way for the > client to understand what it should expect from the server. > >>>>>> > >>>>>> > Another option would be just list all endpoints . . . and let > clients take appropriate actions > >>>>>> > This could be done by vending the OpenAPI spec the server > supports at its own endpoint. I think this avoids the future problem of > having to classify new endpoints into a specific capability. > >>>>>> > >>>>>> You're right that this would be the most complete way to handle > this, but it's really complicated and may require additional "handshake" > calls even for small interactions with the catalog service. I think this > puts a lot of onus on the client, when what we're describing is a set of > endpoints that correspond to a capability. > >>>>>> > >>>>>> I think we also want to avoid relyance on server specific published > OpenAPI as they may leak other options/parameters/etc. This may lead to > confusion around what the canonical spec is and make clients incompatible > if they're generated off of a non-standard spec document. > >>>>>> > >>>>>> All good points though, but I'm not aware of a standard way to > handle this. > >>>>>> > >>>>>> I think versioning adds another level of complexity, but might be > necessary since I expect these will evolve to some extent and may even > require hitting versioned urls. > >>>>>> > >>>>>> -Dan > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >>>>>>> > >>>>>>> We had a separate discussion with Dan on the oauth2 flag last week > and came to the same conclusion that removing the oauth2 capability is > probably the best for now. > >>>>>>> This is mainly because we can't really act on the oauth2 > capability right now, because the /tokens endpoint is called before we hit > the /config endpoint. > >>>>>>> > >>>>>>> > Another option would be just list all endpoints (and maybe even > further which operations are supported) the server actually supports and > let clients take appropriate actions (i.e. grouping could happen on the > client side). This could be done by vending the OpenAPI spec the server > supports at its own endpoint. I think this avoids the future problem of > having to classify new endpoints into a specific capability. > >>>>>>> > >>>>>>> @Micah this sounds to me as if the client would then have to parse > a bunch of endpoints to figure out whether it's safe to e.g. call loading a > view or dropping a table on the given REST server. Rather than having a > dedicated endpoint we're just using the /config endpoint to provide > information about what a server supports. > >>>>>>> > >>>>>>> Thanks > >>>>>>> Eduard > >>>>>>> > >>>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue > <b...@databricks.com.invalid> wrote: > >>>>>>>> > >>>>>>>> Let's remove the oauth2 tag for now until we figure out how to > move forward there. That makes sense to me. > >>>>>>>> > >>>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov > <dmitri.bourlatch...@dremio.com.invalid> wrote: > >>>>>>>>> > >>>>>>>>> Hi Eduard, > >>>>>>>>> > >>>>>>>>> The capabilities PR looks good to me overall. I have a concern > with the "oauth2" tag name though. > >>>>>>>>> > >>>>>>>>> I also commented [1] in GH but the comment appears to be closed > by default :) > >>>>>>>>> > >>>>>>>>> I believe the term "oauth2" is confusing in this context with > respect to RFC 6749 [2] as discussed in depth on another thread [3] > >>>>>>>>> > >>>>>>>>> The functionality behind the /tokens endpoint is quite specific > to the Iceberg REST spec and as the other discussion highlights, there are > concerns with respect to OAuth2 interoperability with other OAuth2 servers. > >>>>>>>>> > >>>>>>>>> What do you think about using a different tag name for it, for > example "local-tokens" or "auth-tokens"? > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Dmitri. > >>>>>>>>> > >>>>>>>>> [1] > https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934 > >>>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749 > >>>>>>>>> [3] > https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50 > >>>>>>>>> > >>>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner < > etudenhoef...@apache.org> wrote: > >>>>>>>>>> > >>>>>>>>>> Hey everyone, > >>>>>>>>>> > >>>>>>>>>> I'd like to bring up the discussion around describing REST > server capabilities via the /config endpoint. > >>>>>>>>>> There is PR #9940 that describes the OpenAPI spec changes. > >>>>>>>>>> > >>>>>>>>>> Mainly we'd like to have a capabilities field in the > ConfigResponse that allows servers to indicate to clients which > capabilities are being supported. > >>>>>>>>>> > >>>>>>>>>> So far we have the following capabilities: > >>>>>>>>>> > >>>>>>>>>> tables > >>>>>>>>>> views > >>>>>>>>>> remote-signing > >>>>>>>>>> vended-credentials > >>>>>>>>>> multi-table-commit > >>>>>>>>>> register-table > >>>>>>>>>> table-metrics > >>>>>>>>>> oauth2 > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> The general idea behind a capability is that if e.g. a server > supports views, then that server must implement all endpoints grouped under > that capability. > >>>>>>>>>> It's worth noting that the /config endpoint is currently being > implicit (meaning that every REST server would have to implement it). > >>>>>>>>>> > >>>>>>>>>> One discussion point that came up during review is how we want > to handle capabilities and backwards compatibility and what the default > capability would be, since older servers don't know anything about > capabilities (in such a case we could assume that the default capabilities > would be oauth2 / tables). > >>>>>>>>>> > >>>>>>>>>> Are there any other capabilities that we'd like to include in > the list? > >>>>>>>>>> > >>>>>>>>>> Eduard > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Ryan Blue > >>>>>>>> Databricks > >>> > >>> -- > >>> Robert Stupp > >>> @snazy >