Thanks, Eduard, for 10877 [1] and addressing my concerns quickly :) This approach is fine from my POV, although I personally prefer the flexibility of what Jack proposed.
Cheers, Dmitri. [1] https://github.com/apache/iceberg/pull/10877 On Thu, Aug 8, 2024 at 6:28 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote: > I've opened #10877 <https://github.com/apache/iceberg/pull/10877> a few > days ago to make the namespace separator configurable and let servers > communicate to clients which separator should be used. Worth mentioning > that this doesn't require any spec chance and it is backwards compatible > with older clients that send *%1F*. > > I'm ok if we want to make the namespace separator also be controllable > from the client and send a *delim* query param, but I agree with Dmitri > that the server should still be able to tell a client which separator it > prefers (as indicated in #10877 > <https://github.com/apache/iceberg/pull/10877>). > > Sending the *delim* query param requires a non-breaking spec change, so > I've opened #10904 <https://github.com/apache/iceberg/pull/10904>. > > After we vote on the spec change I'll add the required impl > <https://github.com/apache/iceberg/pull/10905> (which just goes on top of > #10877 <https://github.com/apache/iceberg/pull/10877>) to send the *delim* > query > param. > > Eduard > > On Thu, Aug 8, 2024 at 12:52 AM Dmitri Bourlatchkov > <dmitri.bourlatch...@dremio.com.invalid> wrote: > >> The idea of client-chosen separator char (?delim=.) sounds pretty >> reasonable to me. >> >> Nonetheless, I do not think this covers all the issues in putting >> namespaces in URI paths for servers running under the new Servlet spec. In >> particular, there are other chars that are considered "suspicious" by the >> servlet spec and may not be allowed. >> >> Since Iceberg defines a REST API spec, I think it is reasonable for REST >> server implementers to expect Iceberg to provide means in the REST spec to >> ensure compatibility with modern servlet frameworks. This is critical so >> that REST clients and servers have the same approach to URI >> construction/interpretation. >> >> That said, if the general consensus is to keep this discussion focused >> only on the namespace element separators, I'd like to add a couple of >> points: >> >> It would be nice to have a default value for this parameter matching >> current separator to ensure compatibility with older clients. >> >> I think we should also allow a client-side configuration setting for >> end-users to control this. I think it would be ok to allow servers to >> advertise a default value for the separator. However servers should accept >> different values according to the `delim` parameter. End-users should be >> able to override server-side config with their own client-side config since >> end users ultimately know more about their namespace names. >> >> Cheers, >> Dmitri. >> >> On Wed, Aug 7, 2024 at 6:07 PM Jack Ye <yezhao...@gmail.com> wrote: >> >>> Sorry a bit late to this thread. >>> >>> I would personally prefer the client side separator solution (query >>> param with `?delim=.`) a bit more than the server side (config >>> override), just given the experience of handling similar situations for >>> Glue data catalog which allows any name for database (namespace) and table, >>> except for white space characters [1]. >>> >>> In the Glue-HiveMetaStore connector [2], there is a feature to use a >>> single string to reference a namespace in a non-default catalog, e.g. >>> "cat1:ns1" can mean catalog cat1 namespace ns1. This is basically a 2-level >>> namespace. After a similar discussion thread and exploring what users are >>> actually using, we realized that no separator could work for everyone, so >>> we introduced a config value [3] that users can set at client side when >>> using the connector. If there is a namespace with name "ns1:ns2" in a >>> catalog "cat1", then the user can choose a different separator like "$' to >>> write "cat1$ns1:ns2", which can allow us to correctly resolve all the name >>> references. >>> >>> We found this the most flexible, since an organization would typically >>> stick to just one or two special characters, and can always find something >>> else as a separator. Even in extreme cases, a user can choose a long string >>> like "__SEPARATOR__" as the separator. >>> >>> -Jack >>> >>> [1] https://docs.aws.amazon.com/glue/latest/webapi/API_GlueTable.html >>> [2] >>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore >>> [3] >>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/branch-3.4.0/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/util/AWSGlueConfig.java#L26 >>> >>> >>> On Tue, Aug 6, 2024 at 1:32 PM Ryan Blue <b...@databricks.com.invalid> >>> wrote: >>> >>>> >>>> 1. Change %1F to %2E: >>>> >>>> As I noted in my earlier email, a catalog may choose to use . as a >>>> one-way conversion so that it doesn’t matter that you can’t split the >>>> namespace. This does work, but with slightly different behavior. >>>> >>>> The original decision on this issue was that the behavior should be a >>>> catalog choice, which is why we went with what we thought was a safe >>>> delimiter. For this discussion, the model I originally proposed could be >>>> supported by setting the delimiter to .. >>>> >>>> New option returned from the config-endpoint telling the which >>>> namespace-element separator to use >>>> >>>> As Eduard noted, old clients affected by this problem are already >>>> broken so it is okay if the fix is only for newer clients. If a client >>>> isn’t broken, then the service still needs to support 0x1F for >>>> compatibility. Given that 0x1F is not a common character in names, I >>>> think that should be safe. >>>> >>>> I do not see any other option than changing the REST spec and define an >>>> escaping mechanism, which requires new endpoints. >>>> >>>> I disagree here. I think making the delimiter configurable is a good >>>> solution that doesn’t require new endpoints. I definitely prefer this to a >>>> more complicated scheme. >>>> >>>> Ryan >>>> >>>> On Tue, Aug 6, 2024 at 2:06 AM Robert Stupp <sn...@snazy.de> wrote: >>>> >>>>> I'd like to summarize the proposals that came up: >>>>> >>>>> 1. Change `%1F` to `%2E`: >>>>> >>>>> Already existing namespaces that have the dot-character (`%2E` == `.`) >>>>> become inaccessible. A namespace ["my", "elem.foo"] is then encoded as >>>>> `my.elem.foo` and decoded as ["my", "elem", "foo"], which is incorrect. >>>>> >>>>> 2. New option returned from the config-endpoint telling the which >>>>> namespace-element separator to use: >>>>> >>>>> (Old) clients won't respect this option - and in turn, the service has >>>>> to expect both `%1F` and the "proposed" separator character. If an old >>>>> client sends a namespace element that contains the separator proposed by >>>>> the service, the namespace representation on the service side is not the >>>>> same as the one on the client side. Ex: A new service advertises `%2E" via >>>>> the config endpoint - the old client doesn't know about it and encodes >>>>> ["my", "elem", "foo"] as `my%1Felem%1Ffoo`, which the service either >>>>> rejects with a HTTP/4xx or interprets as ["my%1Felem%1Ffoo"] - both are >>>>> incorrect. >>>>> >>>>> 3. Clients send a new query parameter `?delim=.` >>>>> >>>>> (Old) services that don't know about this new query parameter will >>>>> interpret the namespace differently, the namespace representation on the >>>>> service side is not the same as the one on the client side. A client >>>>> encodes a namespace ["my", "elem", "foo"] as `my.elem.foo` and adds the >>>>> `?delim=.` query param. Old services interpret this as ["my.elem.foo"], >>>>> which is incorrect. >>>>> >>>>> >>>>> In any case, services _have to_ support `%1F` or compatibility w/ >>>>> older clients will be broken. >>>>> >>>>> I do not see any other option than changing the REST spec and define >>>>> an escaping mechanism, which requires new endpoints. >>>>> >>>>> All options proposed so far are potentially breaking REST spec changes. >>>>> >>>>> >>>>> On 05.08.24 19:06, Daniel Weeks wrote: >>>>> >>>>> I would agree with adding either a server side (config override) or >>>>> client side control (query param with `?delim=.`) as it will be >>>>> compatible with the current v1 endpoint. >>>>> >>>>> In the future we could introduce a v2 endpoint(s), but I would want to >>>>> wait for OpenAPI 4 because they address this by allowing multi-segment >>>>> pathing via URI templates in RFC 6570 >>>>> <https://datatracker.ietf.org/doc/html/rfc6570>, which is the >>>>> original way we wanted to represent namespaces, but it wasn't supported >>>>> (e.g. .../{+namespaces}/tables/{table}). I doubt it's really worth >>>>> the effort though, so I feel like a configurable delimiter makes the most >>>>> sense. >>>>> >>>>> -Dan >>>>> >>>>> -- >>>>> Robert Stupp >>>>> @snazy >>>>> >>>>> >>>> >>>> -- >>>> Ryan Blue >>>> Databricks >>>> >>>