Re: [DISCUSS] Describing REST Server capabilities

Daniel Weeks Wed, 26 Jun 2024 09:02:49 -0700

I think Robert's approach is a reasonable compromise here.

If we wanted a "per operation/endpoint" versioning, I think I'd prefer
Micah's OpenAPI spec based approach because it's more standardized, but I
feel adds a lot of client complexity.


-Dan



On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp <sn...@snazy.de> wrote:

> (I think, compatibility deserves a separate thread - it's a "huge" topic)
>
> Based on experience, we decided on the following with Nessie:
>
>    - Unknown fields/attributes in a structure _DO_ cause
>    (de)serialization failures.
>    - "Stable API versions" - endpoint additions and/or added query
>    parameters and/or enhanced structures do _NOT_ require a new API version
>    (as in the endpoint's route/path).
>    - "Flexible spec versions" - new and updated "capabilities" however
>    might cause a bump in the "spec version" that the server announces in its
>    `getConfig` result.
>
> Adding new routes/paths may require new endpoint implementations on the
> server side, which can easily lead to a lot of (unnecessarily boilerplate)
> code. Using different routes/paths is justified if the API is changed
> "fundamentally". We call the "path component" (api/v1/..., api/v2/...) API
> version - the server indicates the minimum and maximum supported API
> version, in case a client wants to "upgrade". I recommend to _not_ bump the
> API version in the route/path if it's not really necessary.
>
> Regarding the requirement to fail on unknown attributes: Unknown
> attributes may contain important information. A client may send a newer
> version of a request object with an important new field, but the (older)
> server discards the new attribute. Think of an attribute that for example
> defines a "commit condition" that the client expects to be respected. "New"
> attributes must be omittable (e.g. don't serialize if null/default) -
> clients indicate the "usage" of an added attribute using some request
> attribute (for example: "boolean returnExtendedInformation").
>
> The list of capabilities can be indicated with included "spec versions",
> to tell clients which features/functionalities a server
> supports."Production" spec versions could start with 1, and "reserve" 0 for
> experimental/unsupported/poc kind of implementation. It could look like
> this:
>   capabilities: [
>     "table-spec/2,3",   // but not table-spec v1 here
>     "view-spec/1",
>     "table-api/1",
>     "view-api/1",
>     "udf-api/1",
>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
>     ...
>   ]
> Incrementing a spec version in the list of capabilities doesn't break any
> client. We could also define a structure to describe each capability:
>   components:
>     schemas:
>       Capability:
>         name:
>           type: string
>           description: Name of the capability
>         versions:
>           type: array:
>           description: List of supported spec versions of this capability.
> 0 means experimental (non-production) without any guarantees about the
> stability of schema for request and response parameters.
>           items:
>             type: integer
>             format: int32
>
> In Nessie, we ensure backwards and forwards compatibility using a
> specialized test suite that runs the "in tree" client against older server
> versions and older client versions against the "in tree" server version. It
> works fine for us for a few years now - and it did help preventing
> compatibility issues.
>
>
> On 26.06.24 07:44, Péter Váry wrote:
>
> Hi everyone,
>
> A few considerations:
> - I think we should explicitly state which client/service interoperability
> we are aiming for. I expect that we want to support both old client -> new
> server, and new client -> old server communications.
> - I agree with Jack, that we should think about versions in advance - HMS
> tried to be backwards compatible for everything, and that made it hard to
> move forward / deprecate things.
> - Still we should try to keep the backwards incompatible changes minimal.
> (All clients should be able to ignore unknown incoming fields / New
> optional input parameter should drive new features / Try to avoid enums in
> responses where we expect changes (?))
> - OTOH, it could be important for clients to know which of the backwards
> compatible changes are implemented for the given server - so I would
> decouple the URI from the versioning. Maybe major version change should
> (could) change the URI, but backwards compatible changes should be served
> on the same URI, but could be identified by different minor versions.
>
> This is exciting stuff!
> Thanks for pushing this forward!
>
> Peter
>
>
> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I feel I do not see a good answer to why not just simply version each
>> API? When using tag, it means I have to offer capabilities per-tagged
>> group. However, I could for example just offer loadTable and nothing else
>> in a catalog, and that should still be Iceberg REST compliant. And I think
>> we need a versioning story anyway, there is no way around it.
>>
>> Here is the workflow in my mind with versioning:
>>
>> 1. Going forward, every time the REST catalog spec introduces any new API
>> endpoints or backwards incompatible changes to the existing APIs, the
>> version of the specific API is incremented. So suppose the PlanTable API is
>> added, this API will be at version v1. Suppose UpdateTable is updated with
>> a new update type, that API will be at version v2, but PlanTable will
>> remain at v1.
>>
>> 2. a catalog must implement getConfig. This API is the only one that is
>> required.
>>
>> 3. in getConfig, in the defaults map (it could be in some new metadata
>> structure, but since we want strong backwards compatibility guarantee,
>> reusing string maps seems to be the best way), server returns key-value
>> pairs of:
>> - key: operation:<operationName>
>> - value: version number
>>
>> 4. the client assumes that the map is ordered, and resolves API versions
>> sequentially. For example, suppose I have the following map:
>>
>> { "operation:planTable": "1", "operation:loadTable": "2" }
>>
>> Note that by "supporting", it means to return a response in a predictable
>> way that is compliant with the spec. It can also return 406
>> UnsupportedOperation as a way to support it.
>>
>> There is also a special version *, that means any version can work.
>>
>> 5. Backwards compatibility: suppose the client is at a higher version
>> than the server, then the client should always be able to understand the
>> server's full list of capabilities.
>>
>> 6. Forward compatibility: suppose the client is at a lower version than
>> the server, then the client should parse whatever operation it understands,
>> and use the highest version it could support to execute the operation.
>> Suppose the client only supports loadTable v1, then it will continue to hit
>> the GET v1/namespaces/{ns}/tables/{table} route, instead of GET
>> v2/namespaces/{ns}/tables/{table}. The v1 route could continue to support
>> the client, or it could throw 406 to indicate that this route is deprecated
>> and the client needs to upgrade.
>>
>> For initial backwards compatibility, I think not returning anything
>> should mean that all API that the client understands are having version *.
>>
>> What do people think of it, compared to the tag approach?
>>
>> Best,
>> Jack Ye
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> I don't have strong opinions either way here, just thought it was worth
>>> raising some concerns over possible evolution here.  Some responses inline,
>>> but if capabilities seem to meet the requirement at hand, then it does
>>> potentially seem the simplest mechanism.
>>>
>>>
>>> I think we also want to avoid relyance on server specific published
>>>> OpenAPI as they may leak other options/parameters/etc.  This may lead to
>>>> confusion around what the canonical spec is and make clients incompatible
>>>> if they're generated off of a non-standard spec document.
>>>
>>>
>>> Yeah, I wasn't proposing necessarily using built in functionality but a
>>> pre-scrubbed document.  Since there is no reference service implementation
>>> for REST it seems like each implementor would need to describe the best way
>>> of scrubbing there description.
>>>
>>>
>>>
>>>> @Micah this sounds to me as if the client would then have to parse a
>>>> bunch of endpoints to figure out whether it's safe to e.g. call loading a
>>>> view or dropping a table on the given REST server. Rather than having a
>>>> dedicated endpoint we're just using the */config* endpoint to provide
>>>> information about what a server supports.
>>>
>>>
>>> I was not suggesting multiple endpoints here, simply different contents
>>> for */config *I agree in the short term this does add complexity on the
>>> clients. But given that the canonical REST API clients are being developed
>>> into the standard library, I'm not sure how much toil this would cause in
>>> general. This also does not necessarily need to called up-front but could
>>> be called to verify existence vs a permission issue after an error was
>>> received.
>>>
>>> What round-trips did you have in mind here?
>>>
>>>
>>> All good points though, but I'm not aware of a standard way to handle
>>>> this.
>>>
>>>
>>> IIUC, this sounds like a standard service description problem to me, the
>>> solution with capabilities appears to be one level abstraction on top of
>>> this.  Service discovery seems like it has been reimplemented a few
>>> different times depending on the technology [1][2][3]
>>>
>>>
>>> I think versioning adds another level of complexity, but might be
>>>> necessary since I expect these will evolve to some extent and may even
>>>> require hitting versioned urls.
>>>
>>>
>>> If there is no concrete proposal on versioning, I agree it probably pays
>>> to side step this.  The endpoint transitioning from list of strings to list
>>> of objects, would be an obvious sign to clients that they are out of date.
>>> I think serving a service description(s), despite its complexity, is likely
>>> the most principled way of versioning items appropriately, but this
>>> definitely requires more in depth thought/design.
>>>
>>>
>>> Thanks,
>>> Micah
>>>
>>> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language
>>> [2] https://en.wikipedia.org/wiki/Web_Application_Description_Language
>>> [3] https://developers.google.com/discovery/v1/reference/apis
>>>
>>>
>>>
>>>
>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>
>>>> Hey Micah,
>>>>
>>>> I think what we're trying to achieve is strike a balance between client
>>>> complexity and ability to support multiple server-side capabilities.  One
>>>> challenge we've run into is if a client performs an operation (e.g.
>>>> listViews), but receives a 403 code, it's not clear whether the client
>>>> doesn't have access or the server doesn't support an endpoint but isn't
>>>> sending a 404 for security reasons.  This is a simple way for the client to
>>>> understand what it should expect from the server.
>>>>
>>>> >  Another option would be just list all endpoints . . . and let
>>>> clients take appropriate actions
>>>> > This could be done by vending the OpenAPI spec the server supports at
>>>> its own endpoint. I think this avoids the future problem of having to
>>>> classify new endpoints into a specific capability.
>>>>
>>>> You're right that this would be the most complete way to handle this,
>>>> but it's really complicated and may require additional "handshake" calls
>>>> even for small interactions with the catalog service.  I think this puts a
>>>> lot of onus on the client, when what we're describing is a set of endpoints
>>>> that correspond to a capability.
>>>>
>>>> I think we also want to avoid relyance on server specific published
>>>> OpenAPI as they may leak other options/parameters/etc.  This may lead to
>>>> confusion around what the canonical spec is and make clients incompatible
>>>> if they're generated off of a non-standard spec document.
>>>>
>>>> All good points though, but I'm not aware of a standard way to handle
>>>> this.
>>>>
>>>> I think versioning adds another level of complexity, but might be
>>>> necessary since I expect these will evolve to some extent and may even
>>>> require hitting versioned urls.
>>>>
>>>> -Dan
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner <
>>>> etudenhoef...@apache.org> wrote:
>>>>
>>>>> We had a separate discussion with Dan on the *oauth2* flag last week
>>>>> and came to the same conclusion that removing the *oauth2* capability
>>>>> is probably the best for now.
>>>>> This is mainly because we can't really act on the *oauth2* capability
>>>>> right now, because the */tokens* endpoint is called before we hit the
>>>>> */config* endpoint.
>>>>>
>>>>> > Another option would be just list all endpoints (and maybe even
>>>>> further which operations are supported) the server actually supports and
>>>>> let clients take appropriate actions (i.e. grouping could happen on the
>>>>> client side).  This could be done by vending the OpenAPI spec the server
>>>>> supports at its own endpoint. I think this avoids the future problem of
>>>>> having to classify new endpoints into a specific capability.
>>>>>
>>>>> @Micah this sounds to me as if the client would then have to parse a
>>>>> bunch of endpoints to figure out whether it's safe to e.g. call loading a
>>>>> view or dropping a table on the given REST server. Rather than having a
>>>>> dedicated endpoint we're just using the */config* endpoint to provide
>>>>> information about what a server supports.
>>>>>
>>>>> Thanks
>>>>> Eduard
>>>>>
>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue
>>>>> <b...@databricks.com.invalid> <b...@databricks.com.invalid> wrote:
>>>>>
>>>>>> Let's remove the oauth2 tag for now until we figure out how to move
>>>>>> forward there. That makes sense to me.
>>>>>>
>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> Hi Eduard,
>>>>>>>
>>>>>>> The capabilities PR looks good to me overall. I have a concern with
>>>>>>> the "oauth2" tag name though.
>>>>>>>
>>>>>>> I also commented [1] in GH but the comment appears to be closed by
>>>>>>> default :)
>>>>>>>
>>>>>>> I believe the term "oauth2" is confusing in this context with
>>>>>>> respect to RFC 6749 [2] as discussed in depth on another thread [3]
>>>>>>>
>>>>>>> The functionality behind the /tokens endpoint is quite specific to
>>>>>>> the Iceberg REST spec and as the other discussion highlights, there are
>>>>>>> concerns with respect to OAuth2 interoperability with other OAuth2 
>>>>>>> servers.
>>>>>>>
>>>>>>> What do you think about using a different tag name for it, for
>>>>>>> example "local-tokens" or "auth-tokens"?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
>>>>>>> [3] https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>>>>>>>
>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I'd like to bring up the discussion around describing REST server
>>>>>>>> capabilities via the */config* endpoint.
>>>>>>>> There is PR #9940 <https://github.com/apache/iceberg/pull/9940> that
>>>>>>>> describes the OpenAPI spec changes.
>>>>>>>>
>>>>>>>> Mainly we'd like to have a *capabilities* field in the
>>>>>>>> *ConfigResponse* that allows servers to indicate to clients which
>>>>>>>> capabilities are being supported.
>>>>>>>>
>>>>>>>> So far we have the following capabilities:
>>>>>>>>
>>>>>>>>    - tables
>>>>>>>>    - views
>>>>>>>>    - remote-signing
>>>>>>>>    - vended-credentials
>>>>>>>>    - multi-table-commit
>>>>>>>>    - register-table
>>>>>>>>    - table-metrics
>>>>>>>>    - oauth2
>>>>>>>>
>>>>>>>>
>>>>>>>> The general idea behind a capability is that if e.g. a server
>>>>>>>> supports *views*, then that server must implement all endpoints
>>>>>>>> grouped under that capability.
>>>>>>>> It's worth noting that the */config* endpoint is currently being
>>>>>>>> implicit (meaning that every REST server would have to implement it).
>>>>>>>>
>>>>>>>> One discussion point that came up during review is how we want to
>>>>>>>> handle capabilities and backwards compatibility and what the default
>>>>>>>> capability would be, since older servers don't know anything about
>>>>>>>> *capabilities* (in such a case we could assume that the default
>>>>>>>> capabilities would be *oauth2* / *tables*).
>>>>>>>>
>>>>>>>> Are there any other capabilities that we'd like to include in the
>>>>>>>> list?
>>>>>>>>
>>>>>>>> Eduard
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Databricks
>>>>>>
>>>>> --
> Robert Stupp
> @snazy
>
>

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to