I do want to remind us that the original issue
<https://github.com/apache/iceberg/issues/10338> was reported by users from
the community before we were internally aware of this issue, meaning that
there are users of the V1 API that are either running into this issue or
will eventually run into this issue when they upgrade their server stack.

For the users of the V1 API we have a simple path forward (make it
*configurable* what's currently *hardcoded*) so I do not see a reason why
we shouldn't proceed and fix this for these users. I do not think that we
should wait to fix this problem until we have V2 APIs.

The configuration part is *entirely optional* for REST server implementers
and there's no behavioral change for existing installations.

Also it seems we have general consensus on the approach from at least 6
people (me included).

In the meantime the Iceberg community can focus on how a V2 API would look
and what potential escaping mechanisms would be necessary to solve the
general problem.

Thanks everyone and I'll go ahead and start a VOTE thread to update the
wording in the REST spec in #10877
<https://github.com/apache/iceberg/pull/10877>.

Eduard


On Wed, Aug 14, 2024 at 6:46 PM Robert Stupp <sn...@snazy.de> wrote:

> I do object the plan to make that separation character configurable,
> because there is no need to do this (see below), so there's no need for any
> code change. Instead having a proper, mostly human readable escaping
> mechanism, which is possible, should be implemented with Iceberg REST v2.
>
> Here's the relevant part of the Servlet Spec v6 - first paragraph of [1]:
> "Servlet containers may implement the standard Servlet URI path
> canonicalization in any manner they see fit as long as the end result is
> identical to the end result of the process described here. Servlet
> containers may provide container specific configuration options to vary the
> standard canonicalization process. Any such variations may have security
> implications and both Servlet container implementors and users are advised
> to be sure that they understand the implications of any such container
> specific canonicalization options.
>
> There's no mention of "must" - the servlet spec carefully uses "may". Just
> 2-3 letters, but a huge difference, which does *not* mean that e.g. '%1F'
> has to be rejected, hence Jetty allows users to disable this behavior.
>
> I think, I mentioned my concerns already, but here's a summary:
>
> * There is no spec that forces this change.
>
> * The proposed change only tackles ASCII unit separator. The proposed
> change does NOT tackle other characters like '%' or '/' or '\'.
>
> * The proposed change adds yet another configuration option - there are
> already quite a lot. Getting that config wrong (and humans make mistakes,
> because they are not always aware or the consequences) leads to all kinds
> of issues.
>
> * Technical: The proposed change has no validation that the "separator" is
> a single character.
>
> * Technical: Allowing clients to configure such separator character opens
> the door for inconsistency due to protocol level discrepancies.
>
> * Using servlet spec 6 with the ambiguous/suspiciuous validation breaks
> all currently released clients.
>
>
> There are catalogs that do allow this via Spark SQL:
>   CREATE NAMESPACE my_catalog.`my/ugly\name%space.!`
>
>
> On 14.08.24 12:40, Eduard Tudenhöfner wrote:
>
>
> Keeping the Iceberg REST spec as is and configuring the (affected)
>> services that they accept the '%1F' separator character seems much easier
>> and less controversial and eliminates the need for these changes entirely.
>
>
> I do not think that this is a viable option for everyone. Such control
> characters are being rejected by the new Servlet spec for security reasons
> so not every REST server would want to open security holes by enabling
> something that the Servlet 6 spec is specifically trying to avoid.
> Here's some additional context on the topic of the *Servlet 6 rules on
> Ambiguous URIs *from the Jetty project
> <https://github.com/jetty/jetty.project/issues/11448#issuecomment-1969206031>
> :
>
> Servlet 6 (Jetty 12 / ee10) clarified a ton of vague language and
>> behaviors around a bunch of core concepts.
>> Back in Servlet 5 (Jetty 11 or Jetty 12 / ee9) these were fundamentally
>> broken and lead to a long array of security issues and broken libraries
>> that worked with Servlet 5.
>
>
>
> [1]
> https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0.html#uri-path-canonicalization
>

Reply via email to