Hi all,

So far we've been thinking of capabilities as equivalent to a set of
endpoints.

That's a rather technical definition. It also brings one important
limitation: one endpoint can only be "governed" by one capability.

Granted, most capabilities do require implementing specific endpoints. But
I wonder if, for the sake of being future-proof, we shouldn't broaden the
meaning of that term to embrace *logical* or *behavioral* concepts as well.

One example that comes to mind: a REST catalog implementor may choose to
implement the transactions-commit endpoint to fully comply with the
"tables" capability; but for performance reasons, or simply because it's
too complex, they could opt to reject multi-table commits (in other words,
if a CommitTransactionRequest contains a single CommitTableRequest, that's
fine; otherwise, the endpoint would return an error). It would be nice to
express that as a capability: this way the client knows that it is safe to
call the transactions-commit endpoint, but with one CommitTableRequest at a
time.

Such a capability would not be defined by a specific endpoint, but rather,
would influence the behavior exhibited by certain endpoints.
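
To make that concrete, here is a rough Python sketch of how a client might
act on such a behavioral capability. It is only an illustration under
assumptions: the capability names are borrowed from the list proposed earlier
in this thread, while the function name and the shape of the "table changes"
are made up for the example and are not part of any spec or client library.

```python
from typing import Iterable, List

# Capability names borrowed from the list proposed earlier in this thread.
# How a server would actually advertise them is still under discussion.
TABLES = "tables"
MULTI_TABLE_COMMIT = "multi-table-commit"


def plan_commit_batches(
    capabilities: Iterable[str], table_changes: List[dict]
) -> List[List[dict]]:
    """Group table changes into batches, one transactions-commit call each.

    Hypothetical helper: each inner list stands for the CommitTableRequests
    that would go into a single CommitTransactionRequest.
    """
    caps = set(capabilities)
    if TABLES not in caps:
        raise ValueError("server does not advertise the 'tables' capability")
    if not table_changes:
        return []
    if MULTI_TABLE_COMMIT in caps:
        # The server accepts several table changes in one transaction.
        return [list(table_changes)]
    # Behavioral restriction: the endpoint exists, but only one
    # CommitTableRequest per call is accepted.
    return [[change] for change in table_changes]
```

Note that splitting the changes into single-table calls gives up atomicity
across tables, so a real client might prefer to fail instead of falling back.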

Thanks,

Alex

On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Jack
>
> I like Robert's proposal. Back to the topics, I think grouping with
> tags is more "flexible" (it was what we included in the REST spec
> proposal as well).
>
> Regards
> JB
>
> On Wed, Jun 26, 2024 at 6:26 PM Jack Ye <yezhao...@gmail.com> wrote:
> >
> > It seems like there are 2 sub-topics here:
> > 1. should we group operations with tags, or should we do this
> per-operation/endpoint?
> > 2. how should we do the capability/versioning for each unit (either per
> tag or per operation)
> >
> > Shall we first conclude on 1?
> >
> > For 1, my take is that we will need to do it per operation, for 2
> reasons:
> >
> > (1) There are many REST services that would only implement a very small
> set of APIs, such as just loadTable and loadView. Some will choose to not
> implement very specific endpoints, such as renameTable. Tags seem
> convenient, but they mandate that people implement a specific group of APIs
> together, which is a heavy burden, especially for small organizations, if
> they just want to support very specific goals like reading through IRC.
> >
> > (2) Suppose a new tag is added in the future and the server returns that
> tag, but an older client does not understand it. This might cause mistakes in
> the client's understanding of what is supported and what is not, when a tag
> contains both features in existing APIs and also new APIs. If we define
> that tags do not overlap with each other, this is probably not a concern.
> However, (1) still is a problem from a usability perspective.
> >
> > Best,
> > Jack Ye
> >
> >
> >
> >
> > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks <dwe...@apache.org> wrote:
> >>
> >> I think Robert's approach is a reasonable compromise here.
> >>
> >> If we wanted a "per operation/endpoint" versioning, I think I'd prefer
> Micah's OpenAPI spec based approach because it's more standardized, but I
> feel it adds a lot of client complexity.
> >>
> >> -Dan
> >>
> >>
> >>
> >> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp <sn...@snazy.de> wrote:
> >>>
> >>> (I think, compatibility deserves a separate thread - it's a "huge"
> topic)
> >>>
> >>> Based on experience, we decided on the following with Nessie:
> >>>
> >>> Unknown fields/attributes in a structure _DO_ cause (de)serialization
> failures.
> >>> "Stable API versions" - endpoint additions and/or added query
> parameters and/or enhanced structures do _NOT_ require a new API version
> (as in the endpoint's route/path).
> >>> "Flexible spec versions" - new and updated "capabilities" however
> might cause a bump in the "spec version" that the server announces in its
> `getConfig` result.
> >>>
> >>> Adding new routes/paths may require new endpoint implementations on
> the server side, which can easily lead to a lot of (unnecessary
> boilerplate) code. Using different routes/paths is justified if the API is
> changed "fundamentally". We call the "path component" (api/v1/...,
> api/v2/...) API version - the server indicates the minimum and maximum
> supported API version, in case a client wants to "upgrade". I recommend
> _not_ bumping the API version in the route/path if it's not really necessary.
> >>>
> >>> Regarding the requirement to fail on unknown attributes: Unknown
> attributes may contain important information. A client may send a newer
> version of a request object with an important new field, but the (older)
> server discards the new attribute. Think of an attribute that for example
> defines a "commit condition" that the client expects to be respected. "New"
> attributes must be omittable (e.g. don't serialize if null/default) -
> clients indicate the "usage" of an added attribute using some request
> attribute (for example: "boolean returnExtendedInformation").
> >>>
> >>> The list of capabilities can be indicated with included "spec
> versions", to tell clients which features/functionalities a server
> supports. "Production" spec versions could start with 1, and "reserve" 0 for
> experimental/unsupported/poc kind of implementation. It could look like
> this:
> >>>   capabilities: [
> >>>     "table-spec/2,3",   // but not table-spec v1 here
> >>>     "view-spec/1",
> >>>     "table-api/1",
> >>>     "view-api/1",
> >>>     "udf-api/1",
> >>>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
> >>>     ...
> >>>   ]
> >>> Incrementing a spec version in the list of capabilities doesn't break
> any client. We could also define a structure to describe each capability:
> >>>   components:
> >>>     schemas:
> >>>       Capability:
> >>>         name:
> >>>           type: string
> >>>           description: Name of the capability
> >>>         versions:
> >>>           type: array
> >>>           description: List of supported spec versions of this
> capability. 0 means experimental (non-production) without any guarantees
> about the stability of schema for request and response parameters.
> >>>           items:
> >>>             type: integer
> >>>             format: int32
> >>>
> >>> In Nessie, we ensure backwards and forwards compatibility using a
> specialized test suite that runs the "in tree" client against older server
> versions and older client versions against the "in tree" server version. It
> has worked fine for us for a few years now - and it has helped prevent
> compatibility issues.
> >>>
> >>>
> >>> On 26.06.24 07:44, Péter Váry wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> A few considerations:
> >>> - I think we should explicitly state which client/service
> interoperability we are aiming for. I expect that we want to support both
> old client -> new server, and new client -> old server communications.
> >>> - I agree with Jack, that we should think about versions in advance -
> HMS tried to be backwards compatible for everything, and that made it hard
> to move forward / deprecate things.
> >>> - Still we should try to keep the backwards incompatible changes
> minimal. (All clients should be able to ignore unknown incoming fields /
> New optional input parameter should drive new features / Try to avoid enums
> in responses where we expect changes (?))
> >>> - OTOH, it could be important for clients to know which of the
> backwards compatible changes are implemented for the given server - so I
> would decouple the URI from the versioning. Maybe major version change
> should (could) change the URI, but backwards compatible changes should be
> served on the same URI, but could be identified by different minor versions.
> >>>
> >>> This is exciting stuff!
> >>> Thanks for pushing this forward!
> >>>
> >>> Peter
> >>>
> >>>
> >>> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I feel I have not seen a good answer to why we should not simply version each
> API. When using tags, it means I have to offer capabilities per-tagged
> group. However, I could for example just offer loadTable and nothing else
> in a catalog, and that should still be Iceberg REST compliant. And I think
> we need a versioning story anyway, there is no way around it.
> >>>>
> >>>> Here is the workflow in my mind with versioning:
> >>>>
> >>>> 1. Going forward, every time the REST catalog spec introduces any new
> API endpoints or backwards incompatible changes to the existing APIs, the
> version of the specific API is incremented. So suppose the PlanTable API is
> added, this API will be at version v1. Suppose UpdateTable is updated with
> a new update type, that API will be at version v2, but PlanTable will
> remain at v1.
> >>>>
> >>>> 2. a catalog must implement getConfig. This API is the only one that
> is required.
> >>>>
> >>>> 3. in getConfig, in the defaults map (it could be in some new
> metadata structure, but since we want strong backwards compatibility
> guarantee, reusing string maps seems to be the best way), server returns
> key-value pairs of:
> >>>> - key: operation:<operationName>
> >>>> - value: version number
> >>>>
> >>>> 4. the client assumes that the map is ordered, and resolves API
> versions sequentially. For example, suppose I have the following map:
> >>>>
> >>>> { "operation:planTable": "1", "operation:loadTable": "2" }
> >>>>
> >>>> Note that by "supporting", it means to return a response in a
> predictable way that is compliant with the spec. It can also return 406
> UnsupportedOperation as a way to support it.
> >>>>
> >>>> There is also a special version *, that means any version can work.
> >>>>
> >>>> 5. Backwards compatibility: suppose the client is at a higher version
> than the server, then the client should always be able to understand the
> server's full list of capabilities.
> >>>>
> >>>> 6. Forward compatibility: suppose the client is at a lower version
> than the server, then the client should parse whatever operation it
> understands, and use the highest version it could support to execute the
> operation. Suppose the client only supports loadTable v1, then it will
> continue to hit the GET v1/namespaces/{ns}/tables/{table} route, instead of
> GET v2/namespaces/{ns}/tables/{table}. The v1 route could continue to
> support the client, or it could throw 406 to indicate that this route is
> deprecated and the client needs to upgrade.
> >>>>
> >>>> For initial backwards compatibility, I think not returning anything
> should mean that all APIs that the client understands have version *.
> >>>>
> >>>> What do people think of it, compared to the tag approach?
> >>>>
> >>>> Best,
> >>>> Jack Ye
> >>>>
> >>>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield <
> emkornfi...@gmail.com> wrote:
> >>>>>
> >>>>> I don't have strong opinions either way here, just thought it was
> worth raising some concerns over possible evolution here.  Some responses
> inline, but if capabilities seem to meet the requirement at hand, then it
> does potentially seem the simplest mechanism.
> >>>>>
> >>>>>
> >>>>>> I think we also want to avoid reliance on server specific published
> OpenAPI as they may leak other options/parameters/etc.  This may lead to
> confusion around what the canonical spec is and make clients incompatible
> if they're generated off of a non-standard spec document.
> >>>>>
> >>>>>
> >>>>> Yeah, I wasn't proposing necessarily using built in functionality
> but a pre-scrubbed document.  Since there is no reference service
> implementation for REST it seems like each implementor would need to
> describe the best way of scrubbing their description.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> @Micah this sounds to me as if the client would then have to parse
> a bunch of endpoints to figure out whether it's safe to e.g. call loading a
> view or dropping a table on the given REST server. Rather than having a
> dedicated endpoint we're just using the /config endpoint to provide
> information about what a server supports.
> >>>>>
> >>>>>
> >>>>> I was not suggesting multiple endpoints here, simply different
> contents for /config. I agree in the short term this does add complexity on
> the clients. But given that the canonical REST API clients are being
> developed into the standard library, I'm not sure how much toil this would
> cause in general. This also does not necessarily need to be called up-front
> but could be called to verify existence vs a permission issue after an
> error was received.
> >>>>>
> >>>>> What round-trips did you have in mind here?
> >>>>>
> >>>>>
> >>>>>> All good points though, but I'm not aware of a standard way to
> handle this.
> >>>>>
> >>>>>
> >>>>> IIUC, this sounds like a standard service description problem to me,
> the solution with capabilities appears to be one level of abstraction on top
> of this.  Service discovery seems like it has been reimplemented a few
> different times depending on the technology [1][2][3]
> >>>>>
> >>>>>
> >>>>>> I think versioning adds another level of complexity, but might be
> necessary since I expect these will evolve to some extent and may even
> require hitting versioned urls.
> >>>>>
> >>>>>
> >>>>> If there is no concrete proposal on versioning, I agree it probably
> pays to side step this.  The endpoint transitioning from list of strings to
> list of objects, would be an obvious sign to clients that they are out of
> date.  I think serving a service description(s), despite its complexity, is
> likely the most principled way of versioning items appropriately, but this
> definitely requires more in depth thought/design.
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Micah
> >>>>>
> >>>>> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language
> >>>>> [2]
> https://en.wikipedia.org/wiki/Web_Application_Description_Language
> >>>>> [3] https://developers.google.com/discovery/v1/reference/apis
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org>
> wrote:
> >>>>>>
> >>>>>> Hey Micah,
> >>>>>>
> >>>>>> I think what we're trying to achieve is to strike a balance between
> client complexity and ability to support multiple server-side
> capabilities.  One challenge we've run into is if a client performs an
> operation (e.g. listViews), but receives a 403 code, it's not clear whether
> the client doesn't have access or the server doesn't support an endpoint
> but isn't sending a 404 for security reasons.  This is a simple way for the
> client to understand what it should expect from the server.
> >>>>>>
> >>>>>> >  Another option would be just list all endpoints . . . and let
> clients take appropriate actions
> >>>>>> > This could be done by vending the OpenAPI spec the server
> supports at its own endpoint. I think this avoids the future problem of
> having to classify new endpoints into a specific capability.
> >>>>>>
> >>>>>> You're right that this would be the most complete way to handle
> this, but it's really complicated and may require additional "handshake"
> calls even for small interactions with the catalog service.  I think this
> puts a lot of onus on the client, when what we're describing is a set of
> endpoints that correspond to a capability.
> >>>>>>
> >>>>>> I think we also want to avoid reliance on server specific published
> OpenAPI as they may leak other options/parameters/etc.  This may lead to
> confusion around what the canonical spec is and make clients incompatible
> if they're generated off of a non-standard spec document.
> >>>>>>
> >>>>>> All good points though, but I'm not aware of a standard way to
> handle this.
> >>>>>>
> >>>>>> I think versioning adds another level of complexity, but might be
> necessary since I expect these will evolve to some extent and may even
> require hitting versioned urls.
> >>>>>>
> >>>>>> -Dan
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner <
> etudenhoef...@apache.org> wrote:
> >>>>>>>
> >>>>>>> We had a separate discussion with Dan on the oauth2 flag last week
> and came to the same conclusion that removing the oauth2 capability is
> probably the best for now.
> >>>>>>> This is mainly because we can't really act on the oauth2
> capability right now, because the /tokens endpoint is called before we hit
> the /config endpoint.
> >>>>>>>
> >>>>>>> > Another option would be just list all endpoints (and maybe even
> further which operations are supported) the server actually supports and
> let clients take appropriate actions (i.e. grouping could happen on the
> client side).  This could be done by vending the OpenAPI spec the server
> supports at its own endpoint. I think this avoids the future problem of
> having to classify new endpoints into a specific capability.
> >>>>>>>
> >>>>>>> @Micah this sounds to me as if the client would then have to parse
> a bunch of endpoints to figure out whether it's safe to e.g. call loading a
> view or dropping a table on the given REST server. Rather than having a
> dedicated endpoint we're just using the /config endpoint to provide
> information about what a server supports.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Eduard
> >>>>>>>
> >>>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue
> <b...@databricks.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>> Let's remove the oauth2 tag for now until we figure out how to
> move forward there. That makes sense to me.
> >>>>>>>>
> >>>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
> <dmitri.bourlatch...@dremio.com.invalid> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Eduard,
> >>>>>>>>>
> >>>>>>>>> The capabilities PR looks good to me overall. I have a concern
> with the "oauth2" tag name though.
> >>>>>>>>>
> >>>>>>>>> I also commented [1] in GH but the comment appears to be closed
> by default :)
> >>>>>>>>>
> >>>>>>>>> I believe the term "oauth2" is confusing in this context with
> respect to RFC 6749 [2] as discussed in depth on another thread [3]
> >>>>>>>>>
> >>>>>>>>> The functionality behind the /tokens endpoint is quite specific
> to the Iceberg REST spec and as the other discussion highlights, there are
> concerns with respect to OAuth2 interoperability with other OAuth2 servers.
> >>>>>>>>>
> >>>>>>>>> What do you think about using a different tag name for it, for
> example "local-tokens" or "auth-tokens"?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Dmitri.
> >>>>>>>>>
> >>>>>>>>> [1]
> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
> >>>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
> >>>>>>>>> [3]
> https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
> >>>>>>>>>
> >>>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
> etudenhoef...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey everyone,
> >>>>>>>>>>
> >>>>>>>>>> I'd like to bring up the discussion around describing REST
> server capabilities via the /config endpoint.
> >>>>>>>>>> There is PR #9940 that describes the OpenAPI spec changes.
> >>>>>>>>>>
> >>>>>>>>>> Mainly we'd like to have a capabilities field in the
> ConfigResponse that allows servers to indicate to clients which
> capabilities are being supported.
> >>>>>>>>>>
> >>>>>>>>>> So far we have the following capabilities:
> >>>>>>>>>>
> >>>>>>>>>> tables
> >>>>>>>>>> views
> >>>>>>>>>> remote-signing
> >>>>>>>>>> vended-credentials
> >>>>>>>>>> multi-table-commit
> >>>>>>>>>> register-table
> >>>>>>>>>> table-metrics
> >>>>>>>>>> oauth2
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The general idea behind a capability is that if e.g. a server
> supports views, then that server must implement all endpoints grouped under
> that capability.
> >>>>>>>>>> It's worth noting that the /config endpoint is currently
> implicit (meaning that every REST server would have to implement it).
> >>>>>>>>>>
> >>>>>>>>>> One discussion point that came up during review is how we want
> to handle capabilities and backwards compatibility and what the default
> capability would be, since older servers don't know anything about
> capabilities (in such a case we could assume that the default capabilities
> would be oauth2 / tables).
> >>>>>>>>>>
> >>>>>>>>>> Are there any other capabilities that we'd like to include in
> the list?
> >>>>>>>>>>
> >>>>>>>>>> Eduard
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Ryan Blue
> >>>>>>>> Databricks
> >>>
> >>> --
> >>> Robert Stupp
> >>> @snazy
>
