> evaluate a REST service and if it's good for their use case

Feels like you are talking from the perspective of people choosing a vendor
product. I believe most vendors will offer near-full capability. But I am
coming from an angle of small organizations that are building REST servers
just for opening connectivity of internal systems to open engines, and I
know there are many doing that right now. And all they need is technically
just a handful of APIs, many of them will be read-only, very limited in
auxiliary features.

And I believe this is also the current state of Databricks Unity, which
only implements the following:
- getConfig
- listNamespaces
- loadNamespaceMetadata
- listTables
- loadTable
- tableExists
- emitMetrics

Code ref:
https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/service/IcebergRestCatalogService.java

How do I express the capability of that?

Maybe with your team joining Databricks, you will add full capabilities,
but I think it is totally okay in its current state, it serves the main use
cases very well.

And feels like we are optimizing people writing 5 capabilities vs 20
capabilities, and we need to spend a lot of effort in debating how to group
capabilities going forward, which I'd rather spend the energy doing other
things...

-Jack






On Wed, Jun 26, 2024 at 1:41 PM Amogh Jahagirdar <amo...@apache.org> wrote:

> I'm in favor of grouping by tags. The way I look at this, there are 2
> primary considerations:
>
> 1.) The client/server protocol complexity tradeoffs. On the first
> consideration, unless I'm missing something the client side becomes
> significantly more complex; if this has been sketched out earlier in this
> thread just point me to it. Grouping by tag seems more easy to manage from
> a client side but happy to be proven wrong here if that's not the case,
> this is just going off whats in my head at the moment.
>
> 2.) What is more useful for end users of Iceberg to evaluate the
> capabilities of a REST Server?
> In the end as REST is becoming more adopted, I think it's more healthy for
> the ecosystem
> if the community can define groupings which end users can more easily
> understand when trying to understand what a REST implementation can
> actually do. To me defining "X operation/endpoint Version Y" for all
> operations makes it needlessly more difficult for users to evaluate a REST
> service and if it's good for their use case. Standardizing the grouping by
> a simple tag name has a clear benefit to end users imo.
>
> > (1) There are many REST services that would only implement a very small
> set
> > of APIs, such as just loadTable and loadView. Some will choose to not
> > implement very specific endpoints, such as renameTable. Tags seems
> > convenient but it is mandating people to implement a specific group of
> APIs
> > together, which is a lot of burdens for especially small organizations,
> if
> > they just want to support very specific goals like reading through IRC.
>
> I can understand the concern here but going back to my earlier point, the
> capability tagging really should be beneficial for end users which I think
> optimizing for that is more important.  It's true that for a REST server to
> be considered "capability X compliant" it needs to implement all the
> endpoints for X which does have a burden on a server implementation, but I
> think that's net better for the ecosystem since a broader set of users have
> a clear idea of what's supported and can make good decisions for themselves
> since everyone is speaking the same standard language.
>
> Furthermore, we could also look at if it makes sense to make the tags more
> granular if the scenario described is actually common.
>
> On 2024/06/26 16:26:29 Jack Ye wrote:
> > It seems like there are 2 sub-topics here:
> > 1. should we group operations with tags, or should we do this
> > per-operation/endpoint?
> > 2. how should we do the capability/versioning for each unit (either per
> tag
> > or per operation)
> >
> > Shall we first conclude on 1?
> >
> > For 1, my take is that we will need to do it per operation, for 2
> reasons:
> >
> > (1) There are many REST services that would only implement a very small
> set
> > of APIs, such as just loadTable and loadView. Some will choose to not
> > implement very specific endpoints, such as renameTable. Tags seems
> > convenient but it is mandating people to implement a specific group of
> APIs
> > together, which is a lot of burdens for especially small organizations,
> if
> > they just want to support very specific goals like reading through IRC.
> >
> > (2) Suppose a new tag is added in the future, the server returns that
> tag,
> > but an older client does not understand it, it might cause mistakes in
> the
> > client's understanding of what is supported and what is not, when a tag
> > contains both features in existing APIs and also new APIs. If we define
> > that tags do not overlap with each other, this is probably not a concern.
> > However, (1) still is a problem from a usability perspective.
> >
> > Best,
> > Jack Ye
> >
> >
> >
> >
> > On Wed, Jun 26, 2024 at 9:02 AM Daniel Weeks <dwe...@apache.org> wrote:
> >
> > > I think Robert's approach is a reasonable compromise here.
> > >
> > > If we wanted a "per operation/endpoint" versioning, I think I'd prefer
> > > Micah's OpenAPI spec based approach because it's more standardized,
> but I
> > > feel adds a lot of client complexity.
> > >
> > > -Dan
> > >
> > >
> > >
> > > On Wed, Jun 26, 2024 at 6:59 AM Robert Stupp <sn...@snazy.de> wrote:
> > >
> > >> (I think, compatibility deserves a separate thread - it's a "huge"
> topic)
> > >>
> > >> Based on experience, we decided on the following with Nessie:
> > >>
> > >>    - Unknown fields/attributes in a structure _DO_ cause
> > >>    (de)serialization failures.
> > >>    - "Stable API versions" - endpoint additions and/or added query
> > >>    parameters and/or enhanced structures do _NOT_ require a new API
> version
> > >>    (as in the endpoint's route/path).
> > >>    - "Flexible spec versions" - new and updated "capabilities" however
> > >>    might cause a bump in the "spec version" that the server announces
> in its
> > >>    `getConfig` result.
> > >>
> > >> Adding new routes/paths may require new endpoint implementations on
> the
> > >> server side, which can easily lead to a lot of (unnecessarily
> boilerplate)
> > >> code. Using different routes/paths is justified if the API is changed
> > >> "fundamentally". We call the "path component" (api/v1/...,
> api/v2/...) API
> > >> version - the server indicates the minimum and maximum supported API
> > >> version, in case a client wants to "upgrade". I recommend to _not_
> bump the
> > >> API version in the route/path if it's not really necessary.
> > >>
> > >> Regarding the requirement to fail on unknown attributes: Unknown
> > >> attributes may contain important information. A client may send a
> newer
> > >> version of a request object with an important new field, but the
> (older)
> > >> server discards the new attribute. Think of an attribute that for
> example
> > >> defines a "commit condition" that the client expects to be respected.
> "New"
> > >> attributes must be omittable (e.g. don't serialize if null/default) -
> > >> clients indicate the "usage" of an added attribute using some request
> > >> attribute (for example: "boolean returnExtendedInformation").
> > >>
> > >> The list of capabilities can be indicated with included "spec
> versions",
> > >> to tell clients which features/functionalities a server
> > >> supports."Production" spec versions could start with 1, and "reserve"
> 0 for
> > >> experimental/unsupported/poc kind of implementation. It could look
> like
> > >> this:
> > >>   capabilities: [
> > >>     "table-spec/2,3",   // but not table-spec v1 here
> > >>     "view-spec/1",
> > >>     "table-api/1",
> > >>     "view-api/1",
> > >>     "udf-api/1",
> > >>     "super-feature/2,4,6",   // but not spec versions 0,1,3,5,7+
> > >>     ...
> > >>   ]
> > >> Incrementing a spec version in the list of capabilities doesn't break
> any
> > >> client. We could also define a structure to describe each capability:
> > >>   components:
> > >>     schemas:
> > >>       Capability:
> > >>         name:
> > >>           type: string
> > >>           description: Name of the capability
> > >>         versions:
> > >>           type: array:
> > >>           description: List of supported spec versions of this
> > >> capability. 0 means experimental (non-production) without any
> guarantees
> > >> about the stability of schema for request and response parameters.
> > >>           items:
> > >>             type: integer
> > >>             format: int32
> > >>
> > >> In Nessie, we ensure backwards and forwards compatibility using a
> > >> specialized test suite that runs the "in tree" client against older
> server
> > >> versions and older client versions against the "in tree" server
> version. It
> > >> works fine for us for a few years now - and it did help preventing
> > >> compatibility issues.
> > >>
> > >>
> > >> On 26.06.24 07:44, Péter Váry wrote:
> > >>
> > >> Hi everyone,
> > >>
> > >> A few considerations:
> > >> - I think we should explicitly state which client/service
> > >> interoperability we are aiming for. I expect that we want to support
> both
> > >> old client -> new server, and new client -> old server communications.
> > >> - I agree with Jack, that we should think about versions in advance -
> HMS
> > >> tried to be backwards compatible for everything, and that made it
> hard to
> > >> move forward / deprecate things.
> > >> - Still we should try to keep the backwards incompatible changes
> minimal.
> > >> (All clients should be able to ignore unknown incoming fields / New
> > >> optional input parameter should drive new features / Try to avoid
> enums in
> > >> responses where we expect changes (?))
> > >> - OTOH, it could be important for clients to know which of the
> backwards
> > >> compatible changes are implemented for the given server - so I would
> > >> decouple the URI from the versioning. Maybe major version change
> should
> > >> (could) change the URI, but backwards compatible changes should be
> served
> > >> on the same URI, but could be identified by different minor versions.
> > >>
> > >> This is exciting stuff!
> > >> Thanks for pushing this forward!
> > >>
> > >> Peter
> > >>
> > >>
> > >> On Wed, Jun 26, 2024, 00:15 Jack Ye <yezhao...@gmail.com> wrote:
> > >>
> > >>> Hi everyone,
> > >>>
> > >>> I feel I do not see a good answer to why not just simply version each
> > >>> API? When using tag, it means I have to offer capabilities per-tagged
> > >>> group. However, I could for example just offer loadTable and nothing
> else
> > >>> in a catalog, and that should still be Iceberg REST compliant. And I
> think
> > >>> we need a versioning story anyway, there is no way around it.
> > >>>
> > >>> Here is the workflow in my mind with versioning:
> > >>>
> > >>> 1. Going forward, every time the REST catalog spec introduces any new
> > >>> API endpoints or backwards incompatible changes to the existing
> APIs, the
> > >>> version of the specific API is incremented. So suppose the PlanTable
> API is
> > >>> added, this API will be at version v1. Suppose UpdateTable is
> updated with
> > >>> a new update type, that API will be at version v2, but PlanTable will
> > >>> remain at v1.
> > >>>
> > >>> 2. a catalog must implement getConfig. This API is the only one that
> is
> > >>> required.
> > >>>
> > >>> 3. in getConfig, in the defaults map (it could be in some new
> metadata
> > >>> structure, but since we want strong backwards compatibility
> guarantee,
> > >>> reusing string maps seems to be the best way), server returns
> key-value
> > >>> pairs of:
> > >>> - key: operation:<operationName>
> > >>> - value: version number
> > >>>
> > >>> 4. the client assumes that the map is ordered, and resolves API
> versions
> > >>> sequentially. For example, suppose I have the following map:
> > >>>
> > >>> { "operation:planTable": "1", "operation:loadTable": "2" }
> > >>>
> > >>> Note that by "supporting", it means to return a response in a
> > >>> predictable way that is compliant with the spec. It can also return
> 406
> > >>> UnsupportedOperation as a way to support it.
> > >>>
> > >>> There is also a special version *, that means any version can work.
> > >>>
> > >>> 5. Backwards compatibility: suppose the client is at a higher version
> > >>> than the server, then the client should always be able to understand
> the
> > >>> server's full list of capabilities.
> > >>>
> > >>> 6. Forward compatibility: suppose the client is at a lower version
> than
> > >>> the server, then the client should parse whatever operation it
> understands,
> > >>> and use the highest version it could support to execute the
> operation.
> > >>> Suppose the client only supports loadTable v1, then it will continue
> to hit
> > >>> the GET v1/namespaces/{ns}/tables/{table} route, instead of GET
> > >>> v2/namespaces/{ns}/tables/{table}. The v1 route could continue to
> support
> > >>> the client, or it could throw 406 to indicate that this route is
> deprecated
> > >>> and the client needs to upgrade.
> > >>>
> > >>> For initial backwards compatibility, I think not returning anything
> > >>> should mean that all API that the client understands are having
> version *.
> > >>>
> > >>> What do people think of it, compared to the tag approach?
> > >>>
> > >>> Best,
> > >>> Jack Ye
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Jun 24, 2024 at 1:42 PM Micah Kornfield <
> emkornfi...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> I don't have strong opinions either way here, just thought it was
> worth
> > >>>> raising some concerns over possible evolution here.  Some responses
> inline,
> > >>>> but if capabilities seem to meet the requirement at hand, then it
> does
> > >>>> potentially seem the simplest mechanism.
> > >>>>
> > >>>>
> > >>>> I think we also want to avoid relyance on server specific published
> > >>>>> OpenAPI as they may leak other options/parameters/etc.  This may
> lead to
> > >>>>> confusion around what the canonical spec is and make clients
> incompatible
> > >>>>> if they're generated off of a non-standard spec document.
> > >>>>
> > >>>>
> > >>>> Yeah, I wasn't proposing necessarily using built in functionality
> but a
> > >>>> pre-scrubbed document.  Since there is no reference service
> implementation
> > >>>> for REST it seems like each implementor would need to describe the
> best way
> > >>>> of scrubbing there description.
> > >>>>
> > >>>>
> > >>>>
> > >>>>> @Micah this sounds to me as if the client would then have to parse
> a
> > >>>>> bunch of endpoints to figure out whether it's safe to e.g. call
> loading a
> > >>>>> view or dropping a table on the given REST server. Rather than
> having a
> > >>>>> dedicated endpoint we're just using the */config* endpoint to
> provide
> > >>>>> information about what a server supports.
> > >>>>
> > >>>>
> > >>>> I was not suggesting multiple endpoints here, simply different
> > >>>> contents  for */config *I agree in the short term this does add
> > >>>> complexity on the clients. But given that the canonical REST API
> clients
> > >>>> are being developed into the standard library, I'm not sure how
> much toil
> > >>>> this would cause in general. This also does not necessarily need to
> called
> > >>>> up-front but could be called to verify existence vs a permission
> issue
> > >>>> after an error was received.
> > >>>>
> > >>>> What round-trips did you have in mind here?
> > >>>>
> > >>>>
> > >>>> All good points though, but I'm not aware of a standard way to
> handle
> > >>>>> this.
> > >>>>
> > >>>>
> > >>>> IIUC, this sounds like a standard service description problem to me,
> > >>>> the solution with capabilities appears to be one level abstraction
> on top
> > >>>> of this.  Service discovery seems like it has been reimplemented a
> few
> > >>>> different times depending on the technology [1][2][3]
> > >>>>
> > >>>>
> > >>>> I think versioning adds another level of complexity, but might be
> > >>>>> necessary since I expect these will evolve to some extent and may
> even
> > >>>>> require hitting versioned urls.
> > >>>>
> > >>>>
> > >>>> If there is no concrete proposal on versioning, I agree it probably
> > >>>> pays to side step this.  The endpoint transitioning from list of
> strings to
> > >>>> list of objects, would be an obvious sign to clients that they are
> out of
> > >>>> date.  I think serving a service description(s), despite its
> complexity, is
> > >>>> likely the most principled way of versioning items appropriately,
> but this
> > >>>> definitely requires more in depth thought/design.
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>> Micah
> > >>>>
> > >>>> [1] https://en.wikipedia.org/wiki/Web_Services_Description_Language
> > >>>> [2]
> https://en.wikipedia.org/wiki/Web_Application_Description_Language
> > >>>> [3] https://developers.google.com/discovery/v1/reference/apis
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Mon, Jun 24, 2024 at 12:42 PM Daniel Weeks <dwe...@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> Hey Micah,
> > >>>>>
> > >>>>> I think what we're trying to achieve is strike a balance between
> > >>>>> client complexity and ability to support multiple server-side
> > >>>>> capabilities.  One challenge we've run into is if a client
> performs an
> > >>>>> operation (e.g. listViews), but receives a 403 code, it's not
> clear whether
> > >>>>> the client doesn't have access or the server doesn't support an
> endpoint
> > >>>>> but isn't sending a 404 for security reasons.  This is a simple
> way for the
> > >>>>> client to understand what it should expect from the server.
> > >>>>>
> > >>>>> >  Another option would be just list all endpoints . . . and let
> > >>>>> clients take appropriate actions
> > >>>>> > This could be done by vending the OpenAPI spec the server
> supports
> > >>>>> at its own endpoint. I think this avoids the future problem of
> having to
> > >>>>> classify new endpoints into a specific capability.
> > >>>>>
> > >>>>> You're right that this would be the most complete way to handle
> this,
> > >>>>> but it's really complicated and may require additional "handshake"
> calls
> > >>>>> even for small interactions with the catalog service.  I think
> this puts a
> > >>>>> lot of onus on the client, when what we're describing is a set of
> endpoints
> > >>>>> that correspond to a capability.
> > >>>>>
> > >>>>> I think we also want to avoid relyance on server specific published
> > >>>>> OpenAPI as they may leak other options/parameters/etc.  This may
> lead to
> > >>>>> confusion around what the canonical spec is and make clients
> incompatible
> > >>>>> if they're generated off of a non-standard spec document.
> > >>>>>
> > >>>>> All good points though, but I'm not aware of a standard way to
> handle
> > >>>>> this.
> > >>>>>
> > >>>>> I think versioning adds another level of complexity, but might be
> > >>>>> necessary since I expect these will evolve to some extent and may
> even
> > >>>>> require hitting versioned urls.
> > >>>>>
> > >>>>> -Dan
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Jun 24, 2024 at 12:03 AM Eduard Tudenhöfner <
> > >>>>> etudenhoef...@apache.org> wrote:
> > >>>>>
> > >>>>>> We had a separate discussion with Dan on the *oauth2* flag last
> week
> > >>>>>> and came to the same conclusion that removing the *oauth2*
> > >>>>>> capability is probably the best for now.
> > >>>>>> This is mainly because we can't really act on the *oauth2*
> > >>>>>> capability right now, because the */tokens* endpoint is called
> > >>>>>> before we hit the */config* endpoint.
> > >>>>>>
> > >>>>>> > Another option would be just list all endpoints (and maybe even
> > >>>>>> further which operations are supported) the server actually
> supports and
> > >>>>>> let clients take appropriate actions (i.e. grouping could happen
> on the
> > >>>>>> client side).  This could be done by vending the OpenAPI spec the
> server
> > >>>>>> supports at its own endpoint. I think this avoids the future
> problem of
> > >>>>>> having to classify new endpoints into a specific capability.
> > >>>>>>
> > >>>>>> @Micah this sounds to me as if the client would then have to
> parse a
> > >>>>>> bunch of endpoints to figure out whether it's safe to e.g. call
> loading a
> > >>>>>> view or dropping a table on the given REST server. Rather than
> having a
> > >>>>>> dedicated endpoint we're just using the */config* endpoint to
> > >>>>>> provide information about what a server supports.
> > >>>>>>
> > >>>>>> Thanks
> > >>>>>> Eduard
> > >>>>>>
> > >>>>>> On Fri, Jun 21, 2024 at 8:27 PM Ryan Blue
> > >>>>>> <b...@databricks.com.invalid> <b...@databricks.com.invalid>
> wrote:
> > >>>>>>
> > >>>>>>> Let's remove the oauth2 tag for now until we figure out how to
> move
> > >>>>>>> forward there. That makes sense to me.
> > >>>>>>>
> > >>>>>>> On Fri, Jun 21, 2024 at 9:30 AM Dmitri Bourlatchkov
> > >>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
> > >>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Eduard,
> > >>>>>>>>
> > >>>>>>>> The capabilities PR looks good to me overall. I have a concern
> with
> > >>>>>>>> the "oauth2" tag name though.
> > >>>>>>>>
> > >>>>>>>> I also commented [1] in GH but the comment appears to be closed
> by
> > >>>>>>>> default :)
> > >>>>>>>>
> > >>>>>>>> I believe the term "oauth2" is confusing in this context with
> > >>>>>>>> respect to RFC 6749 [2] as discussed in depth on another thread
> [3]
> > >>>>>>>>
> > >>>>>>>> The functionality behind the /tokens endpoint is quite specific
> to
> > >>>>>>>> the Iceberg REST spec and as the other discussion highlights,
> there are
> > >>>>>>>> concerns with respect to OAuth2 interoperability with other
> OAuth2 servers.
> > >>>>>>>>
> > >>>>>>>> What do you think about using a different tag name for it, for
> > >>>>>>>> example "local-tokens" or "auth-tokens"?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Dmitri.
> > >>>>>>>>
> > >>>>>>>> [1]
> > >>>>>>>>
> https://github.com/apache/iceberg/pull/9940/files/15c769a52b85ac4deff5659978c7ffa7802612b0#r1649173934
> > >>>>>>>> [2] https://www.rfc-editor.org/rfc/rfc6749
> > >>>>>>>> [3]
> > >>>>>>>>
> https://lists.apache.org/thread/twk84xx7v0xy5q5tfd9x5torgr82vv50
> > >>>>>>>>
> > >>>>>>>> On Thu, Jun 20, 2024 at 7:28 AM Eduard Tudenhoefner <
> > >>>>>>>> etudenhoef...@apache.org> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hey everyone,
> > >>>>>>>>>
> > >>>>>>>>> I'd like to bring up the discussion around describing REST
> server
> > >>>>>>>>> capabilities via the */config* endpoint.
> > >>>>>>>>> There is PR #9940 <https://github.com/apache/iceberg/pull/9940>
> that
> > >>>>>>>>> describes the OpenAPI spec changes.
> > >>>>>>>>>
> > >>>>>>>>> Mainly we'd like to have a *capabilities* field in the
> > >>>>>>>>> *ConfigResponse* that allows servers to indicate to clients
> which
> > >>>>>>>>> capabilities are being supported.
> > >>>>>>>>>
> > >>>>>>>>> So far we have the following capabilities:
> > >>>>>>>>>
> > >>>>>>>>>    - tables
> > >>>>>>>>>    - views
> > >>>>>>>>>    - remote-signing
> > >>>>>>>>>    - vended-credentials
> > >>>>>>>>>    - multi-table-commit
> > >>>>>>>>>    - register-table
> > >>>>>>>>>    - table-metrics
> > >>>>>>>>>    - oauth2
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> The general idea behind a capability is that if e.g. a server
> > >>>>>>>>> supports *views*, then that server must implement all endpoints
> > >>>>>>>>> grouped under that capability.
> > >>>>>>>>> It's worth noting that the */config* endpoint is currently
> being
> > >>>>>>>>> implicit (meaning that every REST server would have to
> implement it).
> > >>>>>>>>>
> > >>>>>>>>> One discussion point that came up during review is how we want
> to
> > >>>>>>>>> handle capabilities and backwards compatibility and what the
> default
> > >>>>>>>>> capability would be, since older servers don't know anything
> about
> > >>>>>>>>> *capabilities* (in such a case we could assume that the default
> > >>>>>>>>> capabilities would be *oauth2* / *tables*).
> > >>>>>>>>>
> > >>>>>>>>> Are there any other capabilities that we'd like to include in
> the
> > >>>>>>>>> list?
> > >>>>>>>>>
> > >>>>>>>>> Eduard
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Ryan Blue
> > >>>>>>> Databricks
> > >>>>>>>
> > >>>>>> --
> > >> Robert Stupp
> > >> @snazy
> > >>
> > >>
> >
>

Reply via email to