Re: [DISCUSS][Proposal] Case-Insensitive Mode for Polaris Catalogs

Alexandre Dutra Mon, 20 Oct 2025 09:32:56 -0700

Hi Jonas,

Thanks for the proposal. I left a comment in the doc but since Dmitri
also brought up the issue with i18n support, let me expand on my
comment here:


Indeed, case transformation is a complex operation in some languages,
so ideally we would always use the appropriate locale.

But this information won't be available at the moment when the
conversion is done, so the safest choice is to go with Locale.ROOT.

And indeed, as Dmitri pointed out, such a locale is known to create
problems in many languages. When going from upper to lower case, we'd
see issues e.g. in Turkish (dotted i), German (SS -> ß) and Greek (Σ
-> σ/ς).

That said, the conversion wouldn't throw any errors: in Java,
String.toLowerCase(Locale) and String.toUpperCase(Locale) never throw
for a non-null locale. It would just yield an awkward result.
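
To make the "awkward result" concrete, here is a minimal sketch of what
Locale.ROOT does with two of the cases above. The class and helper names
(CaseFoldingDemo, lowerRoot, upperRoot) are made up for illustration and
are not part of Polaris or of the proposal; only the String and Locale
calls are real JDK API.

```java
import java.util.Locale;

// Hypothetical demo class; not Polaris code.
public class CaseFoldingDemo {

    // Locale-independent lowercasing, as discussed in this thread.
    public static String lowerRoot(String identifier) {
        return identifier.toLowerCase(Locale.ROOT);
    }

    // Locale-independent uppercasing.
    public static String upperRoot(String identifier) {
        return identifier.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // ASCII 'I': ROOT maps it to 'i', Turkish maps it to dotless i (U+0131),
        // so the same identifier normalizes differently per locale.
        System.out.println(lowerRoot("TITLE"));
        System.out.println("TITLE".toLowerCase(turkish));

        // Capital dotted I (U+0130): ROOT yields 'i' plus a combining dot
        // (U+0307), i.e. two chars; Turkish yields a plain 'i'.
        System.out.println(lowerRoot("\u0130").length());
        System.out.println("\u0130".toLowerCase(turkish).length());

        // German sharp s: uppercasing expands it to "SS", and lowercasing
        // "SS" gives "ss", so the round trip does not restore the original.
        String upper = upperRoot("stra\u00dfe");
        System.out.println(upper);
        System.out.println(lowerRoot(upper));
    }
}
```

None of these calls throws; the results are just not what a Turkish or
German speaker would expect, and upper/lower round trips are lossy.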

If we are OK with this limitation, then I don't see any major blockers
in the proposal with respect to i18n handling.

Thanks,
Alex

On Mon, Oct 20, 2025 at 5:00 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> Hi Jonas,
>
> Thanks for the proposal! I added some comments in the docs, but I'd like to
> emphasize my biggest concern here as well.
>
> When we talk about upper/lower-casing, we have to know the locale in which
> that operation is to be performed.
>
> By using a specific locale, we declare a particular natural language
> context. Now, the question is: how do we deal with identifiers that can
> contain Unicode characters from different languages?
>
> Tip of the "iceberg" :) : https://github.com/apache/iceberg/issues/9276
>
> Thanks,
> Dmitri.
>
> On Fri, Oct 17, 2025 at 7:26 PM Honah J. <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion around supporting an option to make
> > a catalog case-insensitive.
> >
> > In multi-engine data lake environments, different engines (Spark, Trino,
> > Flink, etc.) apply different casing and normalization rules when reading or
> > writing identifiers. As a result, the same logical table may be interpreted
> > differently across engines. For example, Polaris currently preserves
> > identifier casing, so a table created by Spark with mixed-case names may
> > not be discoverable from Trino, which lowercases identifiers. This
> > inconsistency burdens users and undermines script portability.
> >
> > I drafted a proposal[1] with more details and a solution: introducing an
> > immutable catalog property to store and look up namespaces, tables, and
> > other objects case-insensitively.
> >
> > I’d love to hear your feedback and suggestions!
> >
> > [1]
> > https://docs.google.com/document/d/1-3ywobpRvgdHPhe0J4w7l6t4NX79iqaeFOohCXG_12U/edit?usp=sharing
> >
> > Best regards,
> > Jonas
> >
