I think having some integration with HMS is definitely a good idea. We've
already seen users build this in the wild on top of Polaris, which shows
that there is real demand. I'm still a strong believer that we should be
helping users get to Polaris from whatever systems they are currently using.

On Mon, Jul 7, 2025 at 12:59 PM Eric Maynard <eric.w.mayn...@gmail.com>
wrote:

> 1. We (Polaris) can provide end users a way to migrate off of these
> catalogs that the Iceberg project no longer wants to invest in.
> Implementing HMS federation is in service to the goal of removing
> non-Iceberg catalogs, not in contradiction to it.
>
> 2. This does not seem like a user-centered concern, but I'm also not sure I
> understand exactly what is being expressed here. Are you saying that the
> current HADOOP federation does not work somehow?
>
> 3. Yes, please see the other thread about the IMPLICIT authentication type
> for discussion of this topic. Note, however, that HMS federation may
> support authentication types other than IMPLICIT.
>
> 4. That depends on what you mean by "depends on" -- it could also be said
> that Iceberg itself depends on Hadoop.
>
> 5. This applies equally to HADOOP federation, which already exists, and
> it does *not* apply to HMS federation when using an authentication
> mechanism other than IMPLICIT -- again, please see the other thread for
> more discussion of this topic.
>
> On Fri, Jul 4, 2025 at 3:52 AM Robert Stupp <sn...@snazy.de> wrote:
>
> > I'd really prefer not to add "anything Hive" to Polaris itself, and I'd
> > really like to see Hadoop removed entirely from the Polaris code base.
> >
> > There are multiple reasons for this:
> >
> > 1. The Iceberg project would like to remove all catalogs except the
> > REST catalog. (That's at least what I understood from discussions
> > quite a while ago.)
> >
> > 2. Hadoop is well behind in supporting recent Java versions. It is
> > already impossible to run "anything Hadoop" with Java 24. Considering
> > how long it took Hadoop to even support Java 11, it will take a long
> > time until Hadoop is ready for Java 24+, especially since Hadoop has to
> > refactor a lot of things. Polaris requires Java 21, and we know it works
> > in CI with Java 22 and 23 (both are EOL). Hadoop only supports Java 11,
> > not 17 or 21.
> >
> > 3. Hadoop (HDFS) has a very different security model, which is why HDFS
> > is not suitable for a Polaris production configuration and is guarded
> > by explicit configuration options.
> >
> > 4. Hive depends on Hadoop, so all concerns about Hadoop also apply to
> > Hive.
> >
> > 5. Polaris is multi-tenant (realms). A _single_ instance of Hive
> > contradicts this.
> >
> >
> > My vote would be for *not* adding Hive and for removing Hadoop entirely.
> >
> > If someone comes up with an Iceberg REST catalog for Hive or HDFS and
> > Polaris can connect to it, that's fine with me, because it's outside of
> > Polaris. But I strongly object to having Hadoop, or even Hive, in Polaris.
> >
> >
> > On 7/1/25 20:48, Pooja Nilangekar wrote:
> > > Hi all,
> > >
> > > I wanted to start a discussion around support for Hive catalog
> > > federation in Polaris. In particular, there are two primary ways we can
> > > add support for Hive federation:
> > >
> > > *1. Support a single Hive instance per Polaris deployment.* The Hive
> > > workflow would be identical to the Hadoop catalog workflow. Polaris
> > > would invoke the Iceberg connection library, which would try to find the
> > > hive-site.xml file in (1) the CLASSPATH and (2) the default Hadoop
> > > locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then
> > > initialize the Hive connection using the configurations it found at
> > > these locations.
> > >
> > >     - *Drawbacks:* The primary drawback of this approach is that if
> > >       Polaris finds multiple hive-site.xml files, it would merge their
> > >       configurations, which could lead to an inconsistent connection
> > >       state. Furthermore, there is no clear documentation of the order
> > >       in which the configurations would be applied. While this is often
> > >       predictable on a given OS, it is not guaranteed across
> > >       environments. The other key drawback is that if a Polaris user
> > >       wants to federate to multiple Hive catalogs, their only option is
> > >       to deploy a separate Polaris instance for each Hive instance.
> > >
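The merge drawback of option 1 can be illustrated with a simplified model.
This is a minimal Java sketch, assuming a last-wins merge (files loaded
later override earlier ones, ignoring Hadoop's `final` properties); the
class name `HiveSiteMergeDemo` and both metastore URIs are hypothetical. It
shows that the same two hive-site.xml files, discovered in a different
order, point the connection at different metastores.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HiveSiteMergeDemo {

    // Merges property maps in the given order; later maps override earlier
    // ones. This is a simplified last-wins model that ignores "final" flags.
    static Map<String, String> merge(List<Map<String, String>> sources) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> source : sources) {
            merged.putAll(source);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two hypothetical hive-site.xml files found in different locations.
        Map<String, String> fromClasspath =
                Map.of("hive.metastore.uris", "thrift://hms-a:9083");
        Map<String, String> fromConfDir =
                Map.of("hive.metastore.uris", "thrift://hms-b:9083");

        // Same files, different discovery order, different effective target.
        System.out.println(
                merge(List.of(fromClasspath, fromConfDir)).get("hive.metastore.uris"));
        System.out.println(
                merge(List.of(fromConfDir, fromClasspath)).get("hive.metastore.uris"));
    }
}
```

Running `main` prints `thrift://hms-b:9083` and then `thrift://hms-a:9083`,
which is the environment-dependent behavior the drawback describes.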
> > > *2. Support multiple Hive instances per Polaris deployment.* The
> > > alternative (and, in my view, ideal) solution is to allow Polaris to
> > > federate with multiple Hive catalogs. To support multiple catalogs,
> > > Polaris would explicitly prevent the connection library from reading
> > > hive-site.xml files in the default paths. To pass in the
> > > configurations, Polaris can adopt one of two options:
> > >
> > >     - *Option 2a: Accept a canonical path to the target hive-site.xml.*
> > >        - *Advantages:* This guarantees that the connection
> > >          configurations are derived from a single source. It also allows
> > >          Polaris to rely on the NONE/ENVIRONMENT/PROVIDER/UNMANAGED
> > >          mechanism, making it especially useful when the Hive instance
> > >          relies on Kerberos or custom authentication that Polaris does
> > >          not natively support or manage.
> > >        - *Drawbacks:* The user needs access to the Polaris server's file
> > >          system (or some mechanism to upload files to it).
> > >     - *Option 2b: Accept all the connection-specific parameters as part
> > >       of the create-catalog request.*
> > >        - *Advantage:* Polaris can directly accept and store the
> > >          configurations in a DPO instead of relying on the user having
> > >          access to the server's file system (to create or update
> > >          hive-site.xml).
> > >        - *Drawback:* Polaris would need to manage the secrets. This is
> > >          easy to support for certain authentication types (LDAP/Simple);
> > >          however, it would preclude support for other authentication
> > >          mechanisms, such as Kerberos or Custom.
> > >
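Option 2a's "single source" guarantee can be sketched as a loader that
reads only the one file named by its absolute path and never consults the
CLASSPATH, HADOOP_PATH, or HADOOP_CONF_DIR. This is a minimal, stdlib-only
Java sketch, assuming the usual Hadoop `<configuration>`/`<property>`/
`<name>`/`<value>` layout of hive-site.xml; the class and method names
(`CanonicalHiveSiteLoader`, `load`, `parse`) are hypothetical, and a real
implementation would presumably pass the file to the Iceberg/Hadoop
configuration machinery instead of parsing it by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CanonicalHiveSiteLoader {

    // Reads exactly one hive-site.xml from an absolute, normalized path.
    // No CLASSPATH or HADOOP_CONF_DIR search happens here (hypothetical API).
    static Map<String, String> load(Path canonicalPath) {
        if (!canonicalPath.isAbsolute()
                || !canonicalPath.equals(canonicalPath.normalize())) {
            throw new IllegalArgumentException("expected a canonical path: " + canonicalPath);
        }
        try (InputStream in = Files.newInputStream(canonicalPath)) {
            return parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses <configuration><property><name/><value/></property>...</configuration>.
    static Map<String, String> parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so the file cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(in);
            NodeList properties = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() == 0) {
                    continue; // skip malformed entries that have no <name>
                }
                String value = values.getLength() == 0
                        ? ""
                        : values.item(0).getTextContent().trim();
                result.put(names.item(0).getTextContent().trim(), value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid hive-site.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>hive.metastore.uris</name><value>thrift://hms-a:9083</value>"
                + "</property></configuration>";
        Map<String, String> conf =
                parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(conf.get("hive.metastore.uris"));
    }
}
```

Here `main` parses an inline sample and prints `thrift://hms-a:9083`. Under
option 2a, `load` would presumably be called once per federated catalog with
the path supplied when that catalog is created, so each catalog's settings
come from exactly one file and never merge with another catalog's.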
> > > I prefer option 2a primarily because it provides the flexibility of
> > > supporting multiple federated Hive catalogs while allowing Polaris to
> > > support authentication that it does not natively manage. Please let me
> > > know if you have any thoughts or feedback.
> > >
> > > Thanks,
> > > Pooja
> > >
> > --
> > Robert Stupp
> > @snazy
> >
> >
>
