I feel Alex is already tapping into the more complex
territory I do not want to go into, because as he says, a
"capability" is logical, and it can be a set of
overlapping endpoints, small features in some endpoints,
etc. We already saw that in the original PR we tried to
say "pagination" is a capability, but it is really just a
very small feature of an endpoint, and it might also
evolve on its own to extend to other endpoints in the
future, and maybe one endpoint supports it and one does
not...
My fear is that we are getting into the business of
defining things that are totally unnecessary. It takes
our energy to define that and debate the boundary of each
capability, but in the end what does it buy us? As Eduard
says, you still need to have documentation to explain
what "partial capabilities" are supported for a catalog,
and people are supposed to read the documentation to not
do the unsupported things.
In the end, the client-server needs to understand exactly
(1) what endpoints can be invoked, and (2) using what
request-response schema, that is the key to me. If it
means returning a response with 20-30ish hard-coded
entries, and the client is configured based on that, that
seems totally reasonable to me.
-Jack
On Thu, Jun 27, 2024 at 7:58 AM Alex Dutra
<alex.du...@dremio.com.invalid>
<mailto:alex.du...@dremio.com.invalid> wrote:
Hi all,
So far we've been thinking of capabilities as
equivalent to a set of endpoints.
That's a rather technical definition. It also brings
one important limitation: one endpoint can only be
"governed" by one capability.
Granted, most capabilities do require implementing
specific endpoints. But I wonder if, for the sake of
being future-proof, we shouldn't broaden the meaning
of that term to embrace /logical/ or /behavioral/
concepts as well.
One example that comes to mind: a REST catalog
implementor may choose to implement the
transactions-commit endpoint to fully comply with the
"tables" capability; but for performance reasons, or
simply because it's too complex, they could opt for
rejecting multi-table commits (iow, if a
CommitTransactionRequest contains one
single CommitTableRequest, that's fine, otherwise,
the endpoint would return an error). It would be nice
to express that as a capability: this way the client
knows that it is safe to call the transactions-commit
endpoint, but with one CommitTableRequest at a time.
Such a capability would not be defined by a specific
endpoint, but rather, would influence the behavior
exhibited by certain endpoints.
Thanks,
Alex
On Thu, Jun 27, 2024 at 11:34 AM Jean-Baptiste Onofré
<j...@nanthrax.net> wrote:
Hi Jack
I like Robert's proposal. Back to the topics, I
think grouping with
tags is more "flexible" (it was what we included
in the REST spec
proposal as well).
Regards
JB
On Wed, Jun 26, 2024 at 6:26 PM Jack Ye
<yezhao...@gmail.com> wrote:
>
> It seems like there are 2 sub-topics here:
> 1. should we group operations with tags, or
should we do this per-operation/endpoint?
> 2. how should we do the capability/versioning
for each unit (either per tag or per operation)
>
> Shall we first conclude on 1?
>
> For 1, my take is that we will need to do it
per operation, for 2 reasons:
>
> (1) There are many REST services that would
only implement a very small set of APIs, such as
just loadTable and loadView. Some will choose to
not implement very specific endpoints, such as
renameTable. Tags seems convenient but it is
mandating people to implement a specific group of
APIs together, which is a lot of burdens for
especially small organizations, if they just want
to support very specific goals like reading
through IRC.
>
> (2) Suppose a new tag is added in the future,
the server returns that tag, but an older client
does not understand it, it might cause mistakes
in the client's understanding of what is
supported and what is not, when a tag contains
both features in existing APIs and also new APIs.
If we define that tags do not overlap with each
other, this is probably not a concern. However,
(1) still is a problem from a usability perspective.
>
> Best,
> Jack Ye
>
>
>
>
> On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks
<dwe...@apache.org> wrote:
>>
>> I think Robert's approach is a reasonable
compromise here.
>>
>> If we wanted a "per operation/endpoint"
versioning, I think I'd prefer Micah's OpenAPI
spec based approach because it's more
standardized, but I feel adds a lot of client
complexity.
>>
>> -Dan
>>
>>
>>
>> On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp
<sn...@snazy.de> wrote:
>>>
>>> (I think, compatibility deserves a separate
thread - it's a "huge" topic)
>>>
>>> Based on experience, we decided on the
following with Nessie:
>>>
>>> Unknown fields/attributes in a structure _DO_
cause (de)serialization failures.
>>> "Stable API versions" - endpoint additions
and/or added query parameters and/or enhanced
structures do _NOT_ require a new API version (as
in the endpoint's route/path).
>>> "Flexible spec versions" - new and updated
"capabilities" however might cause a bump in the
"spec version" that the server announces in its
`getConfig` result.
>>>
>>> Adding new routes/paths may require new
endpoint implementations on the server side,
which can easily lead to a lot of (unnecessarily
boilerplate) code. Using different routes/paths
is justified if the API is changed
"fundamentally". We call the "path component"
(api/v1/..., api/v2/...) API version - the server
indicates the minimum and maximum supported API
version, in case a client wants to "upgrade". I
recommend to _not_ bump the API version in the
route/path if it's not really necessary.
>>>
>>> Regarding the requirement to fail on unknown
attributes: Unknown attributes may contain
important information. A client may send a newer
version of a request object with an important new
field, but the (older) server discards the new
attribute. Think of an attribute that for example
defines a "commit condition" that the client
expects to be respected. "New" attributes must be
omittable (e.g. don't serialize if null/default)
- clients indicate the "usage" of an added
attribute using some request attribute (for
example: "boolean returnExtendedInformation").
>>>
>>> The list of capabilities can be indicated
with included "spec versions", to tell clients
which features/functionalities a server
supports."Production" spec versions could start
with 1, and "reserve" 0 for
experimental/unsupported/poc kind of
implementation. It could look like this:
>>> capabilities: [
>>> "table-spec/2,3", // but not table-spec v1
here
>>> "view-spec/1",
>>> "table-api/1",
>>> "view-api/1",
>>> "udf-api/1",
>>> "super-feature/2,4,6", // but not spec
versions 0,1,3,5,7+
>>> ...
>>> ]
>>> Incrementing a spec version in the list of
capabilities doesn't break any client. We could
also define a structure to describe each capability:
>>> components:
>>> schemas:
>>> Capability:
>>> name:
>>> type: string
>>> description: Name of the capability
>>> versions:
>>> type: array:
>>> description: List of supported spec versions
of this capability. 0 means experimental
(non-production) without any guarantees about the
stability of schema for request and response
parameters.
>>> items:
>>> type: integer
>>> format: int32
>>>
>>> In Nessie, we ensure backwards and forwards
compatibility using a specialized test suite that
runs the "in tree" client against older server
versions and older client versions against the
"in tree" server version. It works fine for us
for a few years now - and it did help preventing
compatibility issues.
>>>
>>>
>>> On 26.06.24 07:44, Péter Váry wrote:
>>>
>>> Hi everyone,
>>>
>>> A few considerations:
>>> - I think we should explicitly state which
client/service interoperability we are aiming
for. I expect that we want to support both old
client -> new server, and new client -> old
server communications.
>>> - I agree with Jack, that we should think
about versions in advance - HMS tried to be
backwards compatible for everything, and that
made it hard to move forward / deprecate things.
>>> - Still we should try to keep the backwards
incompatible changes minimal. (All clients should
be able to ignore unknown incoming fields / New
optional input parameter should drive new
features / Try to avoid enums in responses where
we expect changes (?))
>>> - OTOH, it could be important for clients to
know which of the backwards compatible changes
are implemented for the given server - so I would
decouple the URI from the versioning. Maybe major
version change should (could) change the URI, but
backwards compatible changes should be served on
the same URI, but could be identified by
different minor versions.
>>>
>>> This is exciting stuff!
>>> Thanks for pushing this forward!
>>>
>>> Peter
>>>
>>>
>>> On Wed, Jun 26, 2024, 00:15 Jack Ye
<yezhao...@gmail.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I feel I do not see a good answer to why not
just simply version each API? When using tag, it
means I have to offer capabilities per-tagged
group. However, I could for example just offer
loadTable and nothing else in a catalog, and that
should still be Iceberg REST compliant. And I
think we need a versioning story anyway, there is
no way around it.
>>>>
>>>> Here is the workflow in my mind with versioning:
>>>>
>>>> 1. Going forward, every time the REST
catalog spec introduces any new API endpoints or
backwards incompatible changes to the existing
APIs, the version of the specific API is
incremented. So suppose the PlanTable API is
added, this API will be at version v1. Suppose
UpdateTable is updated with a new update type,
that API will be at version v2, but PlanTable
will remain at v1.
>>>>
>>>> 2. a catalog must implement getConfig. This
API is the only one that is required.
>>>>
>>>> 3. in getConfig, in the defaults map (it
could be in some new metadata structure, but
since we want strong backwards compatibility
guarantee, reusing string maps seems to be the
best way), server returns key-value pairs of:
>>>> - key: operation:<operationName>
>>>> - value: version number
>>>>
>>>> 4. the client assumes that the map is
ordered, and resolves API versions sequentially.
For example, suppose I have the following map:
>>>>
>>>> { "operation:planTable": "1",
"operation:loadTable": "2" }
>>>>
>>>> Note that by "supporting", it means to
return a response in a predictable way that is
compliant with the spec. It can also return 406
UnsupportedOperation as a way to support it.
>>>>
>>>> There is also a special version *, that
means any version can work.
>>>>
>>>> 5. Backwards compatibility: suppose the
client is at a higher version than the server,
then the client should always be able to
understand the server's full list of capabilities.
>>>>
>>>> 6. Forward compatibility: suppose the client
is at a lower version than the server, then the
client should parse whatever operation it
understands, and use the highest version it could
support to execute the operation. Suppose the
client only supports loadTable v1, then it will
continue to hit the GET
v1/namespaces/{ns}/tables/{table} route, instead
of GET v2/namespaces/{ns}/tables/{table}. The v1
route could continue to support the client, or it
could throw 406 to indicate that this route is
deprecated and the client needs to upgrade.
>>>>
>>>> For initial backwards compatibility, I think
not returning anything should mean that all API
that the client understands are having version *.
>>>>
>>>> What do people think of it, compared to the
tag approach?
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 24, 2024 at 1:42 PM Micah
Kornfield <emkornfi...@gmail.com> wrote:
>>>>>
>>>>> I don't have strong opinions either way
here, just thought it was worth raising some
concerns over possible evolution here. Some
responses inline, but if capabilities seem to
meet the requirement at hand, then it does
potentially seem the simplest mechanism.
>>>>>
>>>>>
>>>>>> I think we also want to avoid relyance on
server specific published OpenAPI as they may
leak other options/parameters/etc. This may lead
to confusion around what the canonical spec is
and make clients incompatible if they're
generated off of a non-standard spec document.
>>>>>
>>>>>
>>>>> Yeah, I wasn't proposing necessarily using
built in functionality but a pre-scrubbed
document. Since there is no reference service
implementation for REST it seems like each
implementor would need to describe the best way
of scrubbing there description.
>>>>>
>>>>>
>>>>>>
>>>>>> @Micah this sounds to me as if the client
would then have to parse a bunch of endpoints to
figure out whether it's safe to e.g. call loading
a view or dropping a table on the given REST
server. Rather than having a dedicated endpoint
we're just using the /config endpoint to provide
information about what a server supports.
>>>>>
>>>>>
>>>>> I was not suggesting multiple endpoints
here, simply different contents for /config I
agree in the short term this does add complexity
on the clients. But given that the canonical REST
API clients are being developed into the standard
library, I'm not sure how much toil this would
cause in general. This also does not necessarily
need to called up-front but could be called to
verify existence vs a permission issue after an
error was received.
>>>>>
>>>>> What round-trips did you have in mind here?
>>>>>
>>>>>
>>>>>> All good points though, but I'm not aware
of a standard way to handle this.
>>>>>
>>>>>
>>>>> IIUC, this sounds like a standard service
description problem to me, the solution with
capabilities appears to be one level abstraction
on top of this. Service discovery seems like it
has been reimplemented a few different times
depending on the technology [1][2][3]
>>>>>
>>>>>
>>>>>> I think versioning adds another level of
complexity, but might be necessary since I expect
these will evolve to some extent and may even
require hitting versioned urls.
>>>>>
>>>>>
>>>>> If there is no concrete proposal on
versioning, I agree it probably pays to side step
this. The endpoint transitioning from list of
strings to list of objects, would be an obvious
sign to clients that they are out of date. I
think serving a service description(s), despite
its complexity, is likely the most principled way
of versioning items appropriately, but this
definitely requires more in depth thought/design.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> [1]
https://en.wikipedia.org/wiki/Web_Services_Description_Language
>>>>> [2]
https://en.wikipedia.org/wiki/Web_Application_Description_Language
>>>>> [3]
https://developers.google.com/discovery/v1/reference/apis
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel
Weeks <dwe...@apache.org> wrote:
>>>>>>
>>>>>> Hey Micah,
>>>>>>
>>>>>> I think what we're trying to achieve is
strike a balance between client complexity and
ability to support multiple server-side
capabilities. One challenge we've run into is if
a client performs an operation (e.g. listViews),
but receives a 403 code, it's not clear whether
the client doesn't have access or the server
doesn't support an endpoint but isn't sending a
404 for security reasons. This is a simple way
for the client to understand what it should
expect from the server.
>>>>>>
>>>>>> > Another option would be just list all
endpoints . . . and let clients take appropriate
actions
>>>>>> > This could be done by vending the
OpenAPI spec the server supports at its own
endpoint. I think this avoids the future problem
of having to classify new endpoints into a
specific capability.
>>>>>>
>>>>>> You're right that this would be the most
complete way to handle this, but it's really
complicated and may require additional
"handshake" calls even for small interactions
with the catalog service. I think this puts a
lot of onus on the client, when what we're
describing is a set of endpoints that correspond
to a capability.
>>>>>>
>>>>>> I think we also want to avoid relyance on
server specific published OpenAPI as they may
leak other options/parameters/etc. This may lead
to confusion around what the canonical spec is
and make clients incompatible if they're
generated off of a non-standard spec document.
>>>>>>
>>>>>> All good points though, but I'm not aware
of a standard way to handle this.
>>>>>>
>>>>>> I think versioning adds another level of
complexity, but might be necessary since I expect
these will evolve to some extent and may even
require hitting versioned urls.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard
Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>> We had a separate discussion with Dan on
the oauth2 flag last week and came to the same
conclusion that removing the oauth2 capability is
probably the best for now.
>>>>>>> This is mainly because we can't really
act on the oauth2 capability right now, because
the /tokens endpoint is called before we hit the
/config endpoint.
>>>>>>>
>>>>>>> > Another option would be just list all
endpoints (and maybe even further which
operations are supported) the server actually
supports and let clients take appropriate actions
(i.e. grouping could happen on the client side).
This could be done by vending the OpenAPI spec
the server supports at its own endpoint. I think
this avoids the future problem of having to
classify new endpoints into a specific capability.
>>>>>>>
>>>>>>> @Micah this sounds to me as if the client
would then have to parse a bunch of endpoints to
figure out whether it's safe to e.g. call loading
a view or dropping a table on the given REST
server. Rather than having a dedicated endpoint
we're just using the /config endpoint to provide
information about what a server supports.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Eduard
>>>>>>>
>>>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue
<b...@databricks.com.invalid>
<mailto:b...@databricks.com.invalid> wrote:
>>>>>>>>
>>>>>>>> Let's remove the oauth2 tag for now
until we figure out how to move forward there.
That makes sense to me.
>>>>>>>>
>>>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid>
wrote:
>>>>>>>>>
>>>>>>>>> Hi Eduard,
>>>>>>>>>
>>>>>>>>> The capabilities PR looks good to me
overall. I have a concern with the "oauth2" tag
name though.
>>>>>>>>>
>>>>>>>>> I also commented [1] in GH but the
comment appears to be closed by default :)
>>>>>>>>>
>>>>>>>>> I believe the term "oauth2" is
confusing in this context with respect to RFC
6749 [2] as discussed in depth on another thread [3]
>>>>>>>>>
>>>>>>>>> The functionality behind the /tokens
endpoint is quite specific to the Iceberg REST
spec and as the other discussion highlights,
there are concerns with respect to OAuth2
interoperability with other OAuth2 servers.
>>>>>>>>>
>>>>>>>>> What do you think about using a
different tag name for it, for example
"local-tokens" or "auth-tokens"?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dmitri.
>>>>>>>>>
>>>>>>>>> [1]
https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
>>>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
>>>>>>>>> [3]
https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
>>>>>>>>>
>>>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard
Tudenhoefner <etudenhoef...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> I'd like to bring up the discussion
around describing REST server capabilities via
the /config endpoint.
>>>>>>>>>> There is PR #9940 that describes the
OpenAPI spec changes.
>>>>>>>>>>
>>>>>>>>>> Mainly we'd like to have a
capabilities field in the ConfigResponse that
allows servers to indicate to clients which
capabilities are being supported.
>>>>>>>>>>
>>>>>>>>>> So far we have the following capabilities:
>>>>>>>>>>
>>>>>>>>>> tables
>>>>>>>>>> views
>>>>>>>>>> remote-signing
>>>>>>>>>> vended-credentials
>>>>>>>>>> multi-table-commit
>>>>>>>>>> register-table
>>>>>>>>>> table-metrics
>>>>>>>>>> oauth2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The general idea behind a capability
is that if e.g. a server supports views, then
that server must implement all endpoints grouped
under that capability.
>>>>>>>>>> It's worth noting that the /config
endpoint is currently being implicit (meaning
that every REST server would have to implement it).
>>>>>>>>>>
>>>>>>>>>> One discussion point that came up
during review is how we want to handle
capabilities and backwards compatibility and what
the default capability would be, since older
servers don't know anything about capabilities
(in such a case we could assume that the default
capabilities would be oauth2 / tables).
>>>>>>>>>>
>>>>>>>>>> Are there any other capabilities that
we'd like to include in the list?
>>>>>>>>>>
>>>>>>>>>> Eduard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Databricks
>>>
>>> --
>>> Robert Stupp
>>> @snazy