It’s a valid point that Polaris needs to support multi-tenancy, including
federation to different external catalogs (such as remote HMS) within a
single realm.

Unfortunately, Kerberos isn’t compatible with this model, as it requires
global configuration per JVM, making it inherently single-tenant. So I’d
suggest we rule out the Kerberos option and explore more flexible
authentication schemes.
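
To make the limitation concrete, here is a minimal sketch (not Polaris
code) of why per-tenant Kerberos setup collides inside one JVM. The only
real API used is the standard JDK system property java.security.krb5.conf;
the class name, method names, and file paths are hypothetical and exist
only for illustration.

```java
public class Krb5GlobalStateSketch {
    // Standard JDK system property that points Kerberos at a krb5.conf file.
    static final String KRB5_CONF_PROPERTY = "java.security.krb5.conf";

    /** Hypothetical per-tenant setup step: each tenant installs its own krb5.conf. */
    public static void configureTenant(String krb5ConfPath) {
        // System properties are shared by every thread in the JVM,
        // so this overwrites whatever the previous tenant installed.
        System.setProperty(KRB5_CONF_PROPERTY, krb5ConfPath);
    }

    /** Returns the krb5.conf path that the whole JVM currently sees. */
    public static String effectiveKrb5Conf() {
        return System.getProperty(KRB5_CONF_PROPERTY);
    }

    public static void main(String[] args) {
        configureTenant("/etc/tenant-a/krb5.conf"); // hypothetical path
        configureTenant("/etc/tenant-b/krb5.conf"); // hypothetical path
        // Only the last writer's value is visible; tenant A's setting is gone.
        System.out.println(effectiveKrb5Conf());
    }
}
```

Both tenants end up sharing whichever path was set last, which is the
single-tenant behavior described above.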

Here’s a quick summary of viable alternatives:

   - Implicit Authentication: This can support multi-tenancy when
   environment variables are leveraged per instance. We could further extend
   this in the future by integrating a more sophisticated secrets manager to
   improve tenant isolation and credential handling.


   - LDAP: A well-established solution that naturally supports multiple
   credentials and user contexts. It aligns well with multi-tenant needs.

Given that, I think we have a clear path forward for enabling multi-tenancy
with HMS federation. Introducing implicit authentication as a starting
point seems reasonable. It can be disabled by default, with Polaris admins
choosing to enable it based on their environment. Even in single-HMS
deployments, this option brings real value without adding complexity,
especially since a lot of organizations only have one HMS instance.

Yufei


On Wed, Jul 9, 2025 at 7:44 AM Robert Stupp <sn...@snazy.de> wrote:

> Following up on my email:
>
> Polaris would really benefit from supporting HMS and other catalog
> types. And the way I see to get there is to have a "HMS only" IRC
> service, which can be legitimately built on Java 11, use Kerberos, etc.
> Polaris can then federate to that HMS catalog.
> AFAIU clients can authenticate to k8s and get OAuth tokens. Those can
> be used to talk to Polaris, which can in turn talk to the HMS service.
>
> What I do object though is making Polaris effectively a single realm +
> single catalog service and add new dependencies to Hadoop + Hive +
> Kerberos to Polaris.
>
> On Wed, Jul 9, 2025 at 12:17 PM Robert Stupp <sn...@snazy.de> wrote:
> >
> > Let's recap what Polaris offers:
> > 1. Multi tenancy via realms
> > 2. Multiple catalogs per realm
> > 3. OAuth/OIDC
> >
> > Adding Kerberos is global per JVM, making #1 impossible and likely
> > also not suitable for #2, plus adding another complicated and complex
> > auth mechanism.
> > If Kerberos is a strong concern, I propose to contribute necessary
> > changes to the "Iceberg auth manager project" [1] to let clients use
> > krb and receive OAuth tokens for it. It is also worth mentioning that
> > testing all that (development and CI including unit and especially
> > integration tests) is a huge effort in itself.
> >
> > Again, federating to another "single tenant / single catalog HMS krb"
> > Iceberg REST service behind Polaris is fine. Krb clients can authorize
> > against Polaris via OAuth, and Polaris can likely authorize itself
> > using OAuth as well.
> >
> > I strongly object to depending even more on Hadoop for the reasons
> > outlined earlier. I also strongly object to adding Kerberos to
> > Polaris.
> >
> > BTW: Hadoop is not necessary for Iceberg to work, it is rather an "opt
> > in" (ex: org.apache.iceberg.hadoop.Configurable#setConf).
> >
> > [1] https://github.com/dremio/iceberg-auth-manager
> >
> > On Tue, Jul 8, 2025 at 6:25 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > HMS integration is a key step toward one of Polaris’s critical
> missions:
> > > helping users move off HMS. It brings clear value by aligning with our
> > > long-term direction.
> > >
> > > I’m not too concerned about hive.xml; most of its configurations can be
> > > dynamically injected at runtime. The real challenge lies in Kerberos
> > > integration. Since krb5.conf and the keytab are globally configured per
> > > JVM, a single JVM instance cannot support true multi-tenancy. As far
> as I
> > > know, there isn’t a clean solution to this limitation.
> > >
> > > If that's indeed the case, Option 2a becomes far less appealing to me.
> > >
> > > Yufei
> > >
> > >
> > > On Mon, Jul 7, 2025 at 11:18 AM Russell Spitzer <
> russell.spit...@gmail.com>
> > > wrote:
> > >
> > > > I think having some integration with HMS is definitely a good idea.
> We've
> > > > already seen
> > > > users build this in the wild on top of Polaris showing that there is
> > > > definitely a demand.
> > > >  I'm still a strong believer that we should be helping users get to
> Polaris
> > > > from whatever systems
> > > > > they are currently using.
> > > >
> > > > On Mon, Jul 7, 2025 at 12:59 PM Eric Maynard <
> eric.w.mayn...@gmail.com>
> > > > wrote:
> > > >
> > > > > 1. We (Polaris) can provide end users a way to migrate off of these
> > > > > catalogs that the Iceberg project no longer wants to invest into.
> > > > > Implementing HMS federation is in service to the goal of removing
> > > > > non-Iceberg catalogs, not in contradiction to it.
> > > > >
> > > > > 2. This does not seem like a user-centered concern, but I'm also
> not
> > > > sure I
> > > > > understand exactly what is being expressed here. Are you saying
> that the
> > > > > current HADOOP federation does not work somehow?
> > > > >
> > > > > 3. Yes, please see the other thread about the IMPLICIT
> authentication
> > > > type
> > > > > for discussion of this topic. Note, however, that HMS federation
> may
> > > > > support authentication types other than IMPLICIT.
> > > > >
> > > > > 4. That depends on what you mean by "depends on" -- it could also
> be said
> > > > > that Iceberg itself depends on Hadoop.
> > > > >
> > > > > > 5. This not only applies to HADOOP federation, which already
> exists,
> > > > > but also does *not* apply to HMS federation when using an
> authentication
> > > > > mechanism other than IMPLICIT -- again, please see the other
> thread for
> > > > > more discussion of this topic.
> > > > >
> > > > > On Fri, Jul 4, 2025 at 3:52 AM Robert Stupp <sn...@snazy.de>
> wrote:
> > > > >
> > > > > > I'd really prefer to not add "anything Hive" to Polaris itself,
> and I'd
> > > > > > really like to see Hadoop being removed entirely from the
> Polaris code
> > > > > > base.
> > > > > >
> > > > > > There are multiple reasons for this:
> > > > > >
> > > > > > 1. The Iceberg project would rather like to remove all catalogs
> except
> > > > > > the REST catalog. (That's at least what I understood from
> discussions
> > > > > > quite a while ago.)
> > > > > >
> > > > > > 2. Hadoop is quite behind supporting recent Java versions. It is
> > > > already
> > > > > > impossible to run "anything Hadoop" with Java 24. Considering
> how long
> > > > > > it took Hadoop to even support Java 11, it will take a long time
> until
> > > > > > Hadoop is ready for Java 24+, especially since Hadoop has to
> refactor a
> > > > > > lot of things. Polaris requires Java 21 and we know it works in
> CI with
> > > > > > Java 22+23 (both are EOL). Hadoop does only support Java 11, not
> 17,
> > > > not
> > > > > > 21.
> > > > > >
> > > > > > > 3. Hadoop (HDFS) has a very different security model, which is
> the
> > > > > > reason why HDFS is not suitable for Polaris production
> configuration,
> > > > > > guarded by explicit configuration options.
> > > > > >
> > > > > > 4. Hive depends on Hadoop, so all concerns about Hadoop also
> apply to
> > > > > Hive.
> > > > > >
> > > > > > 5. Polaris is multi-tenant (realms). A _single_ instance of Hive
> > > > > > contradicts this.
> > > > > >
> > > > > >
> > > > > > My vote would be on *not* adding Hive and also on removing Hadoop
> > > > > entirely.
> > > > > >
> > > > > > If someone comes up with an Iceberg REST catalog for Hive or
> HDFS and
> > > > > > Polaris can connect to it, that's fine for me, because it's
> outside of
> > > > > > Polaris. But I strongly object having Hadoop or even Hive in
> Polaris.
> > > > > >
> > > > > >
> > > > > > On 7/1/25 20:48, Pooja Nilangekar wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I wanted to start a discussion around the support for Hive
> Catalog
> > > > > > > federation in Polaris. In particular, there are two primary
> ways we
> > > > can
> > > > > > add
> > > > > > > support for Hive federation:
> > > > > > >
> > > > > > > *1. Support a single Hive instance per Polaris deployment* The
> Hive
> > > > > > > workflow would be identical to the Hadoop catalog workflow.
> Polaris
> > > > > > > would invoke the Iceberg connection library, that would try to
> find
> > > > the
> > > > > > > hive-site.xml file in (1) the CLASSPATH and (2) the default
> Hadoop
> > > > > > > locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then
> > > > > initialize
> > > > > > > the Hive connection using the configurations it found at these
> > > > > locations.
> > > > > > >
> > > > > > >     -
> > > > > > >
> > > > > > >     *Drawbacks: *The primary drawback of this approach is that
> if
> > > > > Polaris
> > > > > > >     finds multiple hive-site.xml files, it would merge their
> > > > > > configurations,
> > > > > > >     which could lead to potentially inconsistent connection
> state.
> > > > > > >     Furthermore, there is no clear documentation of the order
> in
> > > > which
> > > > > > the
> > > > > > >     configuration would be applied. While this is often
> predictable
> > > > on
> > > > > a
> > > > > > given
> > > > > > >     OS, it is not guaranteed across environments. The other key
> > > > > drawback
> > > > > > is
> > > > > > >     that if a Polaris user wants to federate to multiple Hive
> > > > catalogs,
> > > > > > their
> > > > > > >     only option is to deploy a separate Polaris instance for
> each
> > > > Hive
> > > > > > >     instance.
> > > > > > >
> > > > > > > *2. Support multiple Hive instances per Polaris deployment* The
> > > > > alternate
> > > > > > > (and in my view, ideal) solution is to allow Polaris to
> federate with
> > > > > > > multiple Hive catalogs. To support multiple catalogs, Polaris
> would
> > > > > > > explicitly disallow the connection library from reading
> hive-site.xml
> > > > > > files
> > > > > > > in the default paths. To pass in the configurations, Polaris
> can
> > > > adopt
> > > > > > one
> > > > > > > of two options:
> > > > > > >
> > > > > > >     -
> > > > > > >
> > > > > > >     *Option 2a: Accept a canonical path to the target
> hive-site.xml.*
> > > > > > >     -
> > > > > > >
> > > > > > >        *Advantages:* This guarantees that the connection
> > > > configurations
> > > > > > are
> > > > > > >        derived from a single source. It also allows Polaris to
> rely
> > > > on
> > > > > > the
> > > > > > >        NONE/ENVIRONMENT/PROVIDER/UNMANAGED mechanism, making it
> > > > > > especially
> > > > > > >        useful in case the Hive instance relies on Kerberos or
> custom
> > > > > > >        authentication that Polaris does not natively
> support/manage.
> > > > > > >        -
> > > > > > >
> > > > > > >        *Drawbacks:* The user needs to have access (or some
> mechanism
> > > > to
> > > > > > >        upload files) to the Polaris server's file system.
> > > > > > >        -
> > > > > > >
> > > > > > >     *Option 2b: Accept all the connection-specific parameters
> as a
> > > > part
> > > > > > of
> > > > > > >     the create-catalog request.*
> > > > > > >     -
> > > > > > >
> > > > > > >        *Advantage:* Polaris can directly accept and store the
> > > > > > configurations
> > > > > > >        in a DPO instead of relying on the user having access
> to the
> > > > > > > server's file
> > > > > > >        system (to create/update hive-site.xml).
> > > > > > >        -
> > > > > > >
> > > > > > >        *Drawback:* Polaris would need to manage the secrets.
> This is
> > > > > > easy to
> > > > > > > >        support for certain authentication types (LDAP/Simple).
> > > > > However,
> > > > > > >   it would
> > > > > > >        preclude the support for other authentication
> mechanisms, such
> > > > > > > as Kerberos
> > > > > > >        or Custom.
> > > > > > >
> > > > > > > I prefer option 2a primarily because it provides the
> flexibility of
> > > > > > > supporting multiple federated Hive catalogs while allowing
> Polaris to
> > > > > > > support authentication that it does not natively manage.
> Please let
> > > > me
> > > > > > know
> > > > > > > if you have any thoughts or feedback.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Pooja
> > > > > > >
> > > > > > --
> > > > > > Robert Stupp
> > > > > > @snazy
> > > > > >
> > > > > >
> > > > >
> > > >
>
