It’s a valid point that Polaris needs to support multi-tenancy, and even multiple external catalogs (such as remote HMS instances) within a single realm.

Unfortunately, Kerberos isn’t compatible with this model: it requires global configuration per JVM, making it inherently single-tenant. So I’d suggest we rule out the Kerberos option and explore more flexible authentication schemes. Here’s a quick summary of viable alternatives:

- Implicit Authentication: This cannot support multi-tenancy when credentials are supplied via environment variables, since those are set per instance. We could extend it in the future by integrating a more sophisticated secrets manager to improve tenant isolation and credential handling.
- LDAP: A well-established solution that naturally supports multiple credentials and user contexts. It aligns well with multi-tenant needs.
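To make the "global per JVM" point concrete, here is a minimal sketch of the process-wide state a Kerberos login touches, assuming the HMS client authenticates through Hadoop's UserGroupInformation (as the Hive metastore client does); the krb5.conf path, principal, and keytab below are placeholders, not anything Polaris ships:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosIsJvmGlobal {
  public static void main(String[] args) throws Exception {
    // The KDC/realm configuration is a JVM-wide system property, so two
    // tenants cannot point at different krb5.conf files in the same JVM.
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");

    // Hadoop's security layer is static as well: setConfiguration() and the
    // keytab login mutate process-wide state shared by every catalog client.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(
        "polaris/host@EXAMPLE.COM", "/etc/security/polaris.keytab");
  }
}
```

Because both the system property and the UGI login are static, a second realm or keytab configured later in the same JVM effectively overwrites the first.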
Given that, I think we have a clear path forward for enabling multi-tenancy with HMS federation. Introducing implicit authentication as a starting point seems reasonable. It can be disabled by default, with Polaris admins choosing to enable it based on their environment. Even in single-HMS deployments, this option brings real value without adding complexity, especially since a lot of organizations only have one HMS instance.

Yufei

On Wed, Jul 9, 2025 at 7:44 AM Robert Stupp <sn...@snazy.de> wrote:

> Following up on my email:
>
> Polaris would really benefit from supporting HMS and other catalog types. And the way I see to get there is to have a "HMS only" IRC service, which can be legitimately built on Java 11, use Kerberos, etc. Polaris can then federate to that HMS catalog. AFAIU clients can authenticate to k8s and get OAuth tokens. Those can be used to talk to Polaris, which can in turn talk to the HMS service.
>
> What I do object to, though, is making Polaris effectively a single-realm + single-catalog service and adding new dependencies on Hadoop + Hive + Kerberos to Polaris.
>
> On Wed, Jul 9, 2025 at 12:17 PM Robert Stupp <sn...@snazy.de> wrote:
> >
> > Let's recap what Polaris offers:
> > 1. Multi-tenancy via realms
> > 2. Multiple catalogs per realm
> > 3. OAuth/OIDC
> >
> > Kerberos configuration is global per JVM, making #1 impossible and likely also unsuitable for #2, plus it adds another complicated and complex auth mechanism. If Kerberos is a strong concern, I propose to contribute the necessary changes to the "Iceberg auth manager project" [1] to let clients use krb and receive OAuth tokens for it. It is also worth mentioning that testing all of that (development and CI, including unit and especially integration tests) is a huge effort in itself.
> >
> > Again, federating to another "single tenant / single catalog HMS krb" Iceberg REST service behind Polaris is fine. Krb clients can authenticate against Polaris via OAuth, and Polaris itself can likely authenticate using OAuth as well.
> >
> > I strongly object to depending even more on Hadoop for the reasons outlined earlier. I also strongly object to adding Kerberos to Polaris.
> >
> > BTW: Hadoop is not necessary for Iceberg to work; it is rather an "opt in" (ex: org.apache.iceberg.hadoop.Configurable#setConf).
> >
> > [1] https://github.com/dremio/iceberg-auth-manager
> >
> > On Tue, Jul 8, 2025 at 6:25 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > HMS integration is a key step toward one of Polaris’s critical missions: helping users move off HMS. It brings clear value by aligning with our long-term direction.
> > >
> > > I’m not too concerned about hive-site.xml; most of its configurations can be dynamically injected at runtime. The real challenge lies in Kerberos integration.
> > > Since krb5.conf and the keytab are globally configured per JVM, a single JVM instance cannot support true multi-tenancy. As far as I know, there isn’t a clean solution to this limitation.
> > >
> > > If that's indeed the case, Option 2a becomes far less appealing to me.
> > >
> > > Yufei
> > >
> > > On Mon, Jul 7, 2025 at 11:18 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
> > > >
> > > > I think having some integration with HMS is definitely a good idea. We've already seen users build this in the wild on top of Polaris, showing that there is definitely a demand. I'm still a strong believer that we should be helping users get to Polaris from whatever systems they are currently using.
> > > >
> > > > On Mon, Jul 7, 2025 at 12:59 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > > > >
> > > > > 1. We (Polaris) can provide end users a way to migrate off of these catalogs that the Iceberg project no longer wants to invest in. Implementing HMS federation is in service to the goal of removing non-Iceberg catalogs, not in contradiction to it.
> > > > >
> > > > > 2. This does not seem like a user-centered concern, but I'm also not sure I understand exactly what is being expressed here. Are you saying that the current HADOOP federation does not work somehow?
> > > > >
> > > > > 3. Yes, please see the other thread about the IMPLICIT authentication type for discussion of this topic. Note, however, that HMS federation may support authentication types other than IMPLICIT.
> > > > >
> > > > > 4. That depends on what you mean by "depends on" -- it could also be said that Iceberg itself depends on Hadoop.
> > > > >
> > > > > 5. This not only also applies to HADOOP federation, which already exists, but also does *not* apply to HMS federation when using an authentication mechanism other than IMPLICIT -- again, please see the other thread for more discussion of this topic.
> > > > >
> > > > > On Fri, Jul 4, 2025 at 3:52 AM Robert Stupp <sn...@snazy.de> wrote:
> > > > > >
> > > > > > I'd really prefer to not add "anything Hive" to Polaris itself, and I'd really like to see Hadoop being removed entirely from the Polaris code base.
> > > > > >
> > > > > > There are multiple reasons for this:
> > > > > >
> > > > > > 1. The Iceberg project would rather like to remove all catalogs except the REST catalog. (That's at least what I understood from discussions quite a while ago.)
> > > > > >
> > > > > > 2. Hadoop is quite behind in supporting recent Java versions. It is already impossible to run "anything Hadoop" with Java 24. Considering how long it took Hadoop to even support Java 11, it will take a long time until Hadoop is ready for Java 24+, especially since Hadoop has to refactor a lot of things. Polaris requires Java 21, and we know it works in CI with Java 22+23 (both are EOL). Hadoop only supports Java 11, not 17, not 21.
> > > > > >
> > > > > > 3. Hadoop (HDFS) has a very different security model, which is the reason why HDFS is not suitable for Polaris production configurations and is guarded by explicit configuration options.
> > > > > > 4. Hive depends on Hadoop, so all concerns about Hadoop also apply to Hive.
> > > > > >
> > > > > > 5. Polaris is multi-tenant (realms). A _single_ instance of Hive contradicts this.
> > > > > >
> > > > > > My vote would be on *not* adding Hive and also on removing Hadoop entirely.
> > > > > >
> > > > > > If someone comes up with an Iceberg REST catalog for Hive or HDFS and Polaris can connect to it, that's fine for me, because it's outside of Polaris. But I strongly object to having Hadoop or even Hive in Polaris.
> > > > > >
> > > > > > On 7/1/25 20:48, Pooja Nilangekar wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I wanted to start a discussion around the support for Hive Catalog federation in Polaris. In particular, there are two primary ways we can add support for Hive federation:
> > > > > > >
> > > > > > > *1. Support a single Hive instance per Polaris deployment*
> > > > > > > The Hive workflow would be identical to the Hadoop catalog workflow. Polaris would invoke the Iceberg connection library, which would try to find the hive-site.xml file in (1) the CLASSPATH and (2) the default Hadoop locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then initialize the Hive connection using the configurations it found at these locations.
> > > > > > >
> > > > > > > - *Drawbacks:* The primary drawback of this approach is that if Polaris finds multiple hive-site.xml files, it would merge their configurations, which could lead to a potentially inconsistent connection state. Furthermore, there is no clear documentation of the order in which the configuration would be applied. While this is often predictable on a given OS, it is not guaranteed across environments. The other key drawback is that if a Polaris user wants to federate to multiple Hive catalogs, their only option is to deploy a separate Polaris instance for each Hive instance.
> > > > > > >
> > > > > > > *2. Support multiple Hive instances per Polaris deployment*
> > > > > > > The alternate (and in my view, ideal) solution is to allow Polaris to federate with multiple Hive catalogs. To support multiple catalogs, Polaris would explicitly disallow the connection library from reading hive-site.xml files in the default paths. To pass in the configurations, Polaris can adopt one of two options:
> > > > > > >
> > > > > > > - *Option 2a: Accept a canonical path to the target hive-site.xml.*
> > > > > > >   - *Advantages:* This guarantees that the connection configurations are derived from a single source. It also allows Polaris to rely on the NONE/ENVIRONMENT/PROVIDER/UNMANAGED mechanism, making it especially useful in case the Hive instance relies on Kerberos or custom authentication that Polaris does not natively support/manage.
> > > > > > >   - *Drawbacks:* The user needs to have access (or some mechanism to upload files) to the Polaris server's file system.
> > > > > > >
> > > > > > > - *Option 2b: Accept all the connection-specific parameters as a part of the create-catalog request.*
> > > > > > >   - *Advantage:* Polaris can directly accept and store the configurations in a DPO instead of relying on the user having access to the server's file system (to create/update hive-site.xml).
> > > > > > >   - *Drawback:* Polaris would need to manage the secrets. This is easy to support for certain authentication types (LDAP/Simple); however, it would preclude support for other authentication mechanisms, such as Kerberos or Custom.
> > > > > > >
> > > > > > > I prefer option 2a primarily because it provides the flexibility of supporting multiple federated Hive catalogs while allowing Polaris to support authentication that it does not natively manage. Please let me know if you have any thoughts or feedback.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Pooja
> > > > > >
> > > > > > --
> > > > > > Robert Stupp
> > > > > > @snazy
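Tying this back to option 2 in Pooja's original message above (multiple HMS instances per deployment), here is a hypothetical sketch of what per-catalog configuration could look like if each federated connection initialized an Iceberg HiveCatalog from explicitly supplied properties rather than a shared hive-site.xml discovered on the classpath. The catalog names, thrift URIs, and warehouse locations are made up, and this is not an existing Polaris code path:

```java
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.hive.HiveCatalog;

public class PerCatalogHmsConfig {

  // Build one Iceberg HiveCatalog from explicitly supplied properties.
  // new Configuration(false) skips the default core-default.xml/core-site.xml
  // resources; keeping hive-site.xml off the classpath entirely would be a
  // separate deployment concern.
  static HiveCatalog connect(String name, String metastoreUri, String warehouse) {
    HiveCatalog catalog = new HiveCatalog();
    catalog.setConf(new Configuration(false));
    catalog.initialize(name, Map.of(
        CatalogProperties.URI, metastoreUri,
        CatalogProperties.WAREHOUSE_LOCATION, warehouse));
    return catalog;
  }

  public static void main(String[] args) {
    // Two federated HMS connections in one process, each carrying its own
    // connection properties (which Polaris could store with the catalog entity).
    HiveCatalog hms1 = connect("federated_hms_1",
        "thrift://hms-1.example.com:9083", "s3://bucket-1/warehouse");
    HiveCatalog hms2 = connect("federated_hms_2",
        "thrift://hms-2.example.com:9083", "s3://bucket-2/warehouse");
    System.out.println(hms1.name() + " / " + hms2.name());
  }
}
```

In this shape, the connection parameters could come either from a file referenced by a canonical path (option 2a) or from properties submitted with the create-catalog request (option 2b); the sketch only illustrates that each catalog carries its own configuration instead of relying on JVM-global discovery.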