(I think compatibility deserves a separate thread - it's a "huge" topic)
Based on experience, we decided on the following with Nessie:
* Unknown fields/attributes in a structure _DO_ cause
(de)serialization failures.
* "Stable API versions" - endpoint additions and/or added query
parameters and/or enhanced structures do _NOT_ require a new API
version (as in the endpoint's route/path).
* "Flexible spec versions" - new and updated "capabilities" however
might cause a bump in the "spec version" that the server announces
in its `getConfig` result.
Adding new routes/paths may require new endpoint implementations on the
server side, which can easily lead to a lot of (unnecessary boilerplate)
code. Using different routes/paths is justified if the API is changed
"fundamentally". We call the "path component" (api/v1/..., api/v2/...)
the API version - the server indicates the minimum and maximum supported
API version, in case a client wants to "upgrade". I recommend _not_
bumping the API version in the route/path if it's not really necessary.
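For illustration, version negotiation on the client side could look
roughly like this (just a sketch; the names and numbers are made up and
not actual Nessie config attributes):

  // Sketch only: pick the highest API version both sides implement and
  // derive the route prefix (api/v1, api/v2, ...) from it.
  class ApiVersionNegotiation {
    static String basePath(int serverMin, int serverMax, int clientMin, int clientMax) {
      int negotiated = Math.min(serverMax, clientMax);
      if (negotiated < Math.max(serverMin, clientMin)) {
        throw new IllegalStateException("No common API version");
      }
      return "api/v" + negotiated;
    }

    public static void main(String[] args) {
      // Server supports v1..v2, client implements v1..v2 -> "api/v2".
      System.out.println(basePath(1, 2, 1, 2));
    }
  }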
Regarding the requirement to fail on unknown attributes: Unknown
attributes may contain important information. A client may send a newer
version of a request object with an important new field, but the (older)
server discards the new attribute. Think of an attribute that for
example defines a "commit condition" that the client expects to be
respected. "New" attributes must be omittable (e.g. don't serialize if
null/default) - clients indicate the "usage" of an added attribute using
some request attribute (for example: "boolean returnExtendedInformation").
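A minimal sketch of those two rules, assuming a Jackson-based
(de)serializer (the class and field names are made up for illustration):

  import com.fasterxml.jackson.annotation.JsonInclude;
  import com.fasterxml.jackson.databind.DeserializationFeature;
  import com.fasterxml.jackson.databind.ObjectMapper;

  class CommitRequest {
    // Existing attribute.
    public String message;

    // "New" attribute: not serialized while it still has its default value,
    // so older servers never see it unless the client really uses it.
    @JsonInclude(JsonInclude.Include.NON_DEFAULT)
    public boolean returnExtendedInformation;
  }

  class StrictMapper {
    static final ObjectMapper MAPPER =
        new ObjectMapper()
            // Reject payloads that carry attributes this (older) side does not
            // know, instead of silently dropping potentially important information.
            .enable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
  }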
The list of capabilities can include "spec versions" to tell clients
which features/functionalities a server supports. "Production" spec
versions could start at 1, reserving 0 for experimental/unsupported/PoC
kinds of implementations. It could look like this:
capabilities: [
  "table-spec/2,3",      // but not table-spec v1 here
  "view-spec/1",
  "table-api/1",
  "view-api/1",
  "udf-api/1",
  "super-feature/2,4,6", // but not spec versions 0,1,3,5,7+
  ...
]
Incrementing a spec version in the list of capabilities doesn't break
any client. We could also define a structure to describe each capability:
components:
  schemas:
    Capability:
      type: object
      properties:
        name:
          type: string
          description: Name of the capability
        versions:
          type: array
          description:
            List of supported spec versions of this capability. 0 means
            experimental (non-production) without any guarantees about the
            stability of the schema for request and response parameters.
          items:
            type: integer
            format: int32
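For illustration, a client could derive a common spec version from the
advertised capabilities roughly like this (a sketch with made-up names;
it parses the compact "name/versions" strings from above):

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;
  import java.util.stream.Collectors;

  class SpecVersions {
    // Parse entries like "table-spec/2,3" and return the advertised spec versions.
    static Set<Integer> versionsFor(List<String> capabilities, String name) {
      return capabilities.stream()
          .filter(c -> c.startsWith(name + "/"))
          .flatMap(c -> Arrays.stream(c.substring(name.length() + 1).split(",")))
          .map(String::trim)
          .map(Integer::parseInt)
          .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
      List<String> advertised = List.of("table-spec/2,3", "view-spec/1", "table-api/1");
      Set<Integer> common = new HashSet<>(versionsFor(advertised, "table-spec"));
      common.retainAll(Set.of(1, 2)); // spec versions this client implements
      // Highest spec version understood by both sides -> 2.
      System.out.println(common.stream().max(Integer::compare).orElseThrow());
    }
  }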
In Nessie, we ensure backwards and forwards compatibility using a
specialized test suite that runs the "in tree" client against older
server versions and older client versions against the "in tree" server
version. It has worked well for us for a few years now - and it has
helped prevent compatibility issues.
On 26.06.24 07:44, Péter Váry wrote:
Hi everyone,
A few considerations:
- I think we should explicitly state which client/service
interoperability we are aiming for. I expect that we want to support
both old client -> new server, and new client -> old server
communications.
- I agree with Jack that we should think about versions in advance -
HMS tried to be backwards compatible for everything, and that made it
hard to move forward / deprecate things.
- Still, we should try to keep backwards incompatible changes minimal.
(All clients should be able to ignore unknown incoming fields / new
optional input parameters should drive new features / try to avoid
enums in responses where we expect changes (?))
- OTOH, it could be important for clients to know which of the
backwards compatible changes are implemented for the given server - so
I would decouple the URI from the versioning. Maybe a major version
change should (could) change the URI, but backwards compatible changes
should be served on the same URI and be identified by different minor
versions.
This is exciting stuff!
Thanks for pushing this forward!
Peter
On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote:
Hi everyone,
I feel I have not seen a good answer to why we don't just version
each API. When using tags, I have to offer capabilities per tagged
group. However, I could for example just offer loadTable and nothing
else in a catalog, and that should still be Iceberg REST compliant.
And I think we need a versioning story anyway; there is no way around it.
Here is the workflow in my mind with versioning:
1. Going forward, every time the REST catalog spec introduces any
new API endpoints or backwards incompatible changes to the
existing APIs, the version of the specific API is incremented. So
suppose the PlanTable API is added, this API will be at version
v1. Suppose UpdateTable is updated with a new update type, that
API will be at version v2, but PlanTable will remain at v1.
2. a catalog must implement getConfig. This API is the only one
that is required.
3. in getConfig, in the defaults map (it could be in some new
metadata structure, but since we want a strong backwards
compatibility guarantee, reusing string maps seems to be the best
way), the server returns key-value pairs of:
- key: operation:<operationName>
- value: version number
4. the client assumes that the map is ordered, and resolves API
versions sequentially (a client-side sketch follows below). For
example, suppose I have the following map:
{ "operation:planTable": "1", "operation:loadTable": "2" }
Note that by "supporting" an operation, I mean returning a response
in a predictable way that is compliant with the spec. The server can
also return 406 UnsupportedOperation as a way to support it.
There is also a special version *, which means any version can work.
5. Backwards compatibility: suppose the client is at a higher
version than the server, then the client should always be able to
understand the server's full list of capabilities.
6. Forward compatibility: suppose the client is at a lower version
than the server, then the client should parse whatever operation
it understands, and use the highest version it could support to
execute the operation. Suppose the client only supports loadTable
v1, then it will continue to hit the GET
v1/namespaces/{ns}/tables/{table} route, instead of GET
v2/namespaces/{ns}/tables/{table}. The v1 route could continue to
support the client, or it could throw 406 to indicate that this
route is deprecated and the client needs to upgrade.
For initial backwards compatibility, I think not returning
anything should mean that all APIs that the client understands are
at version *.
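To make the version resolution above concrete, here is a rough
client-side sketch; the class and method names are made up, not
proposed spec types:

  import java.util.LinkedHashMap;
  import java.util.Map;

  class OperationVersions {
    // Read "operation:<name>" entries from the getConfig defaults map and
    // pick the highest version both sides understand.
    static int resolve(Map<String, String> defaults, String operation, int clientMax) {
      String value = defaults.get("operation:" + operation);
      if (value == null) {
        // Initial backwards compatibility: treat missing entries as version "*".
        return clientMax;
      }
      if ("*".equals(value)) {
        return clientMax; // any version works
      }
      int serverVersion = Integer.parseInt(value);
      return Math.min(serverVersion, clientMax);
    }

    public static void main(String[] args) {
      Map<String, String> defaults = new LinkedHashMap<>();
      defaults.put("operation:planTable", "1");
      defaults.put("operation:loadTable", "2");
      System.out.println(resolve(defaults, "loadTable", 1)); // client only supports v1 -> 1
      System.out.println(resolve(defaults, "loadTable", 3)); // client supports up to v3 -> 2
    }
  }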
What do people think of it, compared to the tag approach?
Best,
Jack Ye
On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield
<emkornfi...@gmail.com> wrote:
I don't have strong opinions either way, just thought it was
worth raising some concerns over possible evolution here.
Some responses inline, but if capabilities meet the requirement
at hand, then they do potentially seem like the simplest mechanism.
I think we also want to avoid reliance on server-specific
published OpenAPI as they may leak other
options/parameters/etc. This may lead to confusion around
what the canonical spec is and make clients incompatible
if they're generated off of a non-standard spec document.
Yeah, I wasn't proposing necessarily using built-in
functionality but a pre-scrubbed document. Since there is no
reference service implementation for REST, it seems like each
implementor would need to describe the best way of scrubbing
their description.
@Micah this sounds to me as if the client would then have
to parse a bunch of endpoints to figure out whether it's
safe to e.g. load a view or drop a table on the given
REST server. Rather than having a dedicated
endpoint we're just using the */config* endpoint to
provide information about what a server supports.
I was not suggesting multiple endpoints here, simply different
contents for */config*. I agree that in the short term this does
add complexity on the clients. But given that the canonical
REST API clients are being developed into the standard
library, I'm not sure how much toil this would cause in
general. This also does not necessarily need to be called
up-front, but could be called to verify existence vs. a
permission issue after an error was received.
What round-trips did you have in mind here?
All good points though, but I'm not aware of a standard
way to handle this.
IIUC, this sounds like a standard service description problem
to me; the solution with capabilities appears to be one level
of abstraction on top of it. Service discovery seems to have
been reimplemented a few different times depending on the
technology [1][2][3].
I think versioning adds another level of complexity, but
might be necessary since I expect these will evolve to
some extent and may even require hitting versioned urls.
If there is no concrete proposal on versioning, I agree it
probably pays to side-step this. The endpoint transitioning
from a list of strings to a list of objects would be an
obvious sign to clients that they are out of date. I think
serving a service description(s), despite its complexity, is
likely the most principled way of versioning items
appropriately, but this definitely requires more in-depth
thought/design.
Thanks,
Micah
[1]
https://en.wikipedia.org/wiki/Web_Services_Description_Language
[2]
https://en.wikipedia.org/wiki/Web_Application_Description_Language
[3] https://developers.google.com/discovery/v1/reference/apis
On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks
<dwe...@apache.org> wrote:
Hey Micah,
I think what we're trying to achieve is to strike a balance
between client complexity and the ability to support multiple
server-side capabilities. One challenge we've run into is
that if a client performs an operation (e.g. listViews) but
receives a 403 code, it's not clear whether the client
doesn't have access or the server doesn't support the
endpoint but isn't sending a 404 for security reasons.
This is a simple way for the client to understand what it
should expect from the server.
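As an illustration of how a client could use that information
(a sketch with made-up names, not part of the spec):

  import java.util.Set;

  class CapabilityAwareErrors {
    // Hypothetical helper: decide how to interpret a 403 for an endpoint that
    // belongs to the given capability, based on what /config advertised.
    static RuntimeException interpret403(Set<String> advertisedCapabilities, String capability) {
      if (!advertisedCapabilities.contains(capability)) {
        // The server never claimed this capability; treat the 403 as "not implemented".
        return new UnsupportedOperationException(
            "Server does not advertise the '" + capability + "' capability");
      }
      // The capability is advertised, so this looks like a real authorization failure.
      return new SecurityException("Access denied for a supported endpoint");
    }
  }

  // Usage (illustrative): if listViews() returns 403:
  //   throw CapabilityAwareErrors.interpret403(config.capabilities(), "views");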
> Another option would be just list all endpoints . . .
and let clients take appropriate actions
> This could be done by vending the OpenAPI spec the
server supports at its own endpoint. I think this avoids
the future problem of having to classify new endpoints
into a specific capability.
You're right that this would be the most complete way to
handle this, but it's really complicated and may require
additional "handshake" calls even for small interactions
with the catalog service. I think this puts a lot of onus
on the client, when what we're describing is a set of
endpoints that correspond to a capability.
I think we also want to avoid reliance on server-specific
published OpenAPI as they may leak other
options/parameters/etc. This may lead to confusion around
what the canonical spec is and make clients incompatible
if they're generated off of a non-standard spec document.
All good points though, but I'm not aware of a standard
way to handle this.
I think versioning adds another level of complexity, but
might be necessary since I expect these will evolve to
some extent and may even require hitting versioned urls.
-Dan
On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
We had a separate discussion with Dan about the *oauth2*
flag last week and came to the same conclusion: removing
the *oauth2* capability is probably best for now.
This is mainly because we can't really act on the
*oauth2* capability right now, because the */tokens*
endpoint is called before we hit the */config* endpoint.
> Another option would be just list all endpoints (and
maybe even further which operations are supported) the
server actually supports and let clients take
appropriate actions (i.e. grouping could happen on the
client side). This could be done by vending the
OpenAPI spec the server supports at its own endpoint.
I think this avoids the future problem of having to
classify new endpoints into a specific capability.
@Micah this sounds to me as if the client would then
have to parse a bunch of endpoints to figure out
whether it's safe to e.g. load a view or
drop a table on the given REST server. Rather than
having a dedicated endpoint we're just using the
*/config* endpoint to provide information about what a
server supports.
Thanks
Eduard
On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue
<b...@databricks.com.invalid> wrote:
Let's remove the oauth2 tag for now until we
figure out how to move forward there. That makes
sense to me.
On Fri, Jun 21, 2024 at 9:30 AM Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:
Hi Eduard,
The capabilities PR looks good to me overall.
I have a concern with the "oauth2" tag name
though.
I also commented [1] in GH but the comment
appears to be closed by default :)
I believe the term "oauth2" is confusing in
this context with respect to RFC 6749 [2], as
discussed in depth in another thread [3].
The functionality behind the /tokens endpoint
is quite specific to the Iceberg REST spec and,
as the other discussion highlights, there are
concerns with respect to interoperability with
other OAuth2 servers.
What do you think about using a different tag
name for it, for example "local-tokens" or
"auth-tokens"?
Thanks,
Dmitri.
[1]
https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
[2] https://www.rfc-editor.org/rfc/rfc6749
[3]
https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
On Thu, Jun 20, 2024 at 7:28 AM Eduard
Tudenhoefner <etudenhoef...@apache.org> wrote:
Hey everyone,
I'd like to bring up the discussion around
describing REST server capabilities via
the */config* endpoint.
There is PR #9940
<https://github.com/apache/iceberg/pull/9940> that
describes the OpenAPI spec changes.
Mainly, we'd like to have a
*capabilities* field in the
*ConfigResponse* that allows servers to
indicate to clients which capabilities are
supported.
So far we have the following capabilities:
* tables
* views
* remote-signing
* vended-credentials
* multi-table-commit
* register-table
* table-metrics
* oauth2
The general idea behind a capability is
that if e.g. a server supports *views*,
then that server must implement all
endpoints grouped under that capability.
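For example, the grouping could conceptually look
like this on the client side (endpoint paths
abbreviated; the structure is just an illustration,
not proposed spec content):

  import java.util.Map;
  import java.util.Set;

  class CapabilityGroups {
    // If a server advertises a capability, a client may assume every
    // endpoint grouped under it is implemented.
    static final Map<String, Set<String>> ENDPOINTS_BY_CAPABILITY = Map.of(
        "tables", Set.of(
            "GET /v1/{prefix}/namespaces/{namespace}/tables",
            "GET /v1/{prefix}/namespaces/{namespace}/tables/{table}"),
        "views", Set.of(
            "GET /v1/{prefix}/namespaces/{namespace}/views",
            "GET /v1/{prefix}/namespaces/{namespace}/views/{view}"));

    static boolean mayCall(Set<String> advertisedCapabilities, String capability) {
      return advertisedCapabilities.contains(capability);
    }
  }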
It's worth noting that the
*/config* endpoint is currently implicit
(meaning that every REST server has to
implement it).
One discussion point that came up during
review is how we want to handle
capabilities and backwards compatibility
and what the default capability would be,
since older servers don't know anything
about *capabilities* (in such a case we
could assume that the default capabilities
would be *oauth2* / *tables*).
Are there any other capabilities that we'd
like to include in the list?
Eduard
--
Ryan Blue
Databricks
--
Robert Stupp
@snazy