Thanks, Eduard, for 10877 [1] and addressing my concerns quickly :)

This approach is fine from my POV, although I personally prefer the
flexibility of what Jack proposed.

Cheers,
Dmitri.

[1] https://github.com/apache/iceberg/pull/10877

On Thu, Aug 8, 2024 at 6:28 AM Eduard Tudenhöfner <etudenhoef...@apache.org>
wrote:

> I've opened #10877 <https://github.com/apache/iceberg/pull/10877> a few
> days ago to make the namespace separator configurable and let servers
> communicate to clients which separator should be used. Worth mentioning
> that this doesn't require any spec chance and it is backwards compatible
> with older clients that send *%1F*.
>
> I'm ok if we want to make the namespace separator also be controllable
> from the client and send a *delim* query param, but I agree with Dmitri
> that the server should still be able to tell a client which separator it
> prefers (as indicated in #10877
> <https://github.com/apache/iceberg/pull/10877>).
>
> Sending the *delim* query param requires a non-breaking spec change, so
> I've opened #10904 <https://github.com/apache/iceberg/pull/10904>.
>
> After we vote on the spec change I'll add the required impl
> <https://github.com/apache/iceberg/pull/10905> (which just goes on top of
> #10877 <https://github.com/apache/iceberg/pull/10877>) to send the *delim* 
> query
> param.
>
> Eduard
>
> On Thu, Aug 8, 2024 at 12:52 AM Dmitri Bourlatchkov
> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>
>> The idea of client-chosen separator char (?delim=.) sounds pretty
>> reasonable to me.
>>
>> Nonetheless, I do not think this covers all the issues in putting
>> namespaces in URI paths for servers running under the new Servlet spec. In
>> particular, there are other chars that are considered "suspicious" by the
>> servlet spec and may not be allowed.
>>
>> Since Iceberg defines a REST API spec, I think it is reasonable for REST
>> server implementers to expect Iceberg to provide means in the REST spec to
>> ensure compatibility with modern servlet frameworks. This is critical so
>> that REST clients and servers have the same approach to URI
>> construction/interpretation.
>>
>> That said, if the general consensus is to keep this discussion focused
>> only on the namespace element separators, I'd like to add a couple of
>> points:
>>
>> It would be nice to have a default value for this parameter matching
>> current separator to ensure compatibility with older clients.
>>
>> I think we should also allow a client-side configuration setting for
>> end-users to control this. I think it would be ok to allow servers to
>> advertise a default value for the separator. However servers should accept
>> different values according to the `delim` parameter. End-users should be
>> able to override server-side config with their own client-side config since
>> end users ultimately know more about their namespace names.
>>
>> Cheers,
>> Dmitri.
>>
>> On Wed, Aug 7, 2024 at 6:07 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Sorry a bit late to this thread.
>>>
>>> I would personally prefer the client side separator solution (query
>>> param with `?delim=.`) a bit more than the server side (config
>>> override), just given the experience of handling similar situations for
>>> Glue data catalog which allows any name for database (namespace) and table,
>>> except for white space characters [1].
>>>
>>> In the Glue-HiveMetaStore connector [2], there is a feature to use a
>>> single string to reference a namespace in a non-default catalog, e.g.
>>> "cat1:ns1" can mean catalog cat1 namespace ns1. This is basically a 2-level
>>> namespace. After a similar discussion thread and exploring what users are
>>> actually using, we realized that no separator could work for everyone, so
>>> we introduced a config value [3] that users can set at client side when
>>> using the connector. If there is a namespace with name "ns1:ns2" in a
>>> catalog "cat1", then the user can choose a different separator like "$' to
>>> write "cat1$ns1:ns2", which can allow us to correctly resolve all the name
>>> references.
>>>
>>> We found this the most flexible, since an organization would typically
>>> stick to just one or two special characters, and can always find something
>>> else as a separator. Even in extreme cases, a user can choose a long string
>>> like "__SEPARATOR__" as the separator.
>>>
>>> -Jack
>>>
>>> [1] https://docs.aws.amazon.com/glue/latest/webapi/API_GlueTable.html
>>> [2]
>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore
>>> [3]
>>> https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/branch-3.4.0/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/util/AWSGlueConfig.java#L26
>>>
>>>
>>> On Tue, Aug 6, 2024 at 1:32 PM Ryan Blue <b...@databricks.com.invalid>
>>> wrote:
>>>
>>>>
>>>>    1. Change %1F to %2E:
>>>>
>>>> As I noted in my earlier email, a catalog may choose to use . as a
>>>> one-way conversion so that it doesn’t matter that you can’t split the
>>>> namespace. This does work, but with slightly different behavior.
>>>>
>>>> The original decision on this issue was that the behavior should be a
>>>> catalog choice, which is why we went with what we thought was a safe
>>>> delimiter. For this discussion, the model I originally proposed could be
>>>> supported by setting the delimiter to ..
>>>>
>>>> New option returned from the config-endpoint telling the which
>>>> namespace-element separator to use
>>>>
>>>> As Eduard noted, old clients affected by this problem are already
>>>> broken so it is okay if the fix is only for newer clients. If a client
>>>> isn’t broken, then the service still needs to support 0x1F for
>>>> compatibility. Given that 0x1F is not a common character in names, I
>>>> think that should be safe.
>>>>
>>>> I do not see any other option than changing the REST spec and define an
>>>> escaping mechanism, which requires new endpoints.
>>>>
>>>> I disagree here. I think making the delimiter configurable is a good
>>>> solution that doesn’t require new endpoints. I definitely prefer this to a
>>>> more complicated scheme.
>>>>
>>>> Ryan
>>>>
>>>> On Tue, Aug 6, 2024 at 2:06 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>
>>>>> I'd like to summarize the proposals that came up:
>>>>>
>>>>> 1. Change `%1F` to `%2E`:
>>>>>
>>>>> Already existing namespaces that have the dot-character (`%2E` == `.`)
>>>>> become inaccessible. A namespace ["my", "elem.foo"] is then encoded as
>>>>> `my.elem.foo` and decoded as ["my", "elem", "foo"], which is incorrect.
>>>>>
>>>>> 2. New option returned from the config-endpoint telling the which
>>>>> namespace-element separator to use:
>>>>>
>>>>> (Old) clients won't respect this option - and in turn, the service has
>>>>> to expect both `%1F` and the "proposed" separator character. If an old
>>>>> client sends a namespace element that contains the separator proposed by
>>>>> the service, the namespace representation on the service side is not the
>>>>> same as the one on the client side. Ex: A new service advertises `%2E" via
>>>>> the config endpoint - the old client doesn't know about it and encodes
>>>>> ["my", "elem", "foo"] as `my%1Felem%1Ffoo`, which the service either
>>>>> rejects with a HTTP/4xx or interprets as ["my%1Felem%1Ffoo"] - both are
>>>>> incorrect.
>>>>>
>>>>> 3. Clients send a new query parameter `?delim=.`
>>>>>
>>>>> (Old) services that don't know about this new query parameter will
>>>>> interpret the namespace differently, the namespace representation on the
>>>>> service side is not the same as the one on the client side. A client
>>>>> encodes a namespace ["my", "elem", "foo"] as `my.elem.foo` and adds the
>>>>> `?delim=.` query param. Old services interpret this as ["my.elem.foo"],
>>>>> which is incorrect.
>>>>>
>>>>>
>>>>> In any case, services _have to_ support `%1F` or compatibility w/
>>>>> older clients will be broken.
>>>>>
>>>>> I do not see any other option than changing the REST spec and define
>>>>> an escaping mechanism, which requires new endpoints.
>>>>>
>>>>> All options proposed so far are potentially breaking REST spec changes.
>>>>>
>>>>>
>>>>> On 05.08.24 19:06, Daniel Weeks wrote:
>>>>>
>>>>> I would agree with adding either a server side (config override) or
>>>>> client side control (query param with `?delim=.`) as it will be
>>>>> compatible with the current v1 endpoint.
>>>>>
>>>>> In the future we could introduce a v2 endpoint(s), but I would want to
>>>>> wait for OpenAPI 4 because they address this by allowing multi-segment
>>>>> pathing via URI templates in RFC 6570
>>>>> <https://datatracker.ietf.org/doc/html/rfc6570>, which is the
>>>>> original way we wanted to represent namespaces, but it wasn't supported
>>>>> (e.g. .../{+namespaces}/tables/{table}).  I doubt it's really worth
>>>>> the effort though, so I feel like a configurable delimiter makes the most
>>>>> sense.
>>>>>
>>>>> -Dan
>>>>>
>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Databricks
>>>>
>>>

Reply via email to