IMO 2a should be off the table; the person creating an EXTERNAL catalog
does not necessarily have permission to access the path where a
hive-site.xml lives (whether local to the Polaris catalog service or in
object storage), or even to know which paths the catalog has access to.
It's the same problem as with HADOOP federation.

In the NONE authentication scheme, I think we are stuck with Option 1 just
like HADOOP.

However, if we can implement other auth types that work with 2b, that
should be preferred.

In summary, if the auth scheme is NONE (and the admin has enabled this), I
think it's okay to let the HiveCatalog pick up the default hive-site.xml
that the env vars on the Polaris service point at. In a production-ready
scenario where the auth scheme is non-none, that auth scheme should tell
the Polaris service how to connect to and authenticate itself to a remote
Hive catalog.
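
To make that rule concrete, here is a rough sketch in Python. It is
illustrative only: ConnectionConfig, resolve_hive_conf, and the
none_auth_enabled flag are names made up for this example, not Polaris
APIs, and the returned dict is just a stand-in for whatever would actually
be handed to the HiveCatalog. The only real identifier is the Hive property
hive.metastore.uris.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConnectionConfig:
    """Hypothetical stand-in for the connection info stored with an EXTERNAL catalog."""
    auth_scheme: str                      # "NONE" or some non-none scheme
    metastore_uri: Optional[str] = None   # expected for non-none schemes
    properties: Dict[str, str] = field(default_factory=dict)


def resolve_hive_conf(cfg: ConnectionConfig, none_auth_enabled: bool) -> dict:
    """Decide where the Hive connection settings come from.

    NONE     -> fall back to the service's default hive-site.xml (Option 1),
                but only if the admin has enabled that behaviour.
    non-none -> use only what the create-catalog request supplied (Option 2b);
                the default hive-site.xml is not consulted.
    """
    if cfg.auth_scheme.upper() == "NONE":
        if not none_auth_enabled:
            raise PermissionError("NONE auth scheme is not enabled by the admin")
        return {"source": "default-hive-site", "overrides": {}}

    if not cfg.metastore_uri:
        raise ValueError("a non-none auth scheme must say how to reach the metastore")

    # Copy so the stored request properties are not mutated.
    overrides = dict(cfg.properties)
    overrides["hive.metastore.uris"] = cfg.metastore_uri
    return {"source": "request", "overrides": overrides}
```

The point of the split is that the catalog creator never names a file path
on the Polaris host: either the admin has opted in to the service-wide
default, or everything needed arrives in the request.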

--EM

On Tue, Jul 1, 2025 at 11:49 AM Pooja Nilangekar <nilangekar.po...@gmail.com>
wrote:

> Hi all,
>
> I wanted to start a discussion around the support for Hive Catalog
> federation in Polaris. In particular, there are two primary ways we can add
> support for Hive federation:
>
> *1. Support a single Hive instance per Polaris deployment* The Hive
> workflow would be identical to the Hadoop catalog workflow. Polaris
> would invoke the Iceberg connection library, which would try to find the
> hive-site.xml file in (1) the CLASSPATH and (2) the default Hadoop
> locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then initialize
> the Hive connection using the configurations it found at these locations.
>
>    - *Drawbacks:* The primary drawback of this approach is that if Polaris
>      finds multiple hive-site.xml files, it would merge their configurations,
>      which could lead to potentially inconsistent connection state.
>      Furthermore, there is no clear documentation of the order in which the
>      configuration would be applied. While this is often predictable on a
>      given OS, it is not guaranteed across environments. The other key
>      drawback is that if a Polaris user wants to federate to multiple Hive
>      catalogs, their only option is to deploy a separate Polaris instance
>      for each Hive instance.
>
> *2. Support multiple Hive instances per Polaris deployment* The alternate
> (and in my view, ideal) solution is to allow Polaris to federate with
> multiple Hive catalogs. To support multiple catalogs, Polaris would
> explicitly disallow the connection library from reading hive-site.xml files
> in the default paths. To pass in the configurations, Polaris can adopt one
> of two options:
>
>    - *Option 2a: Accept a canonical path to the target hive-site.xml.*
>       - *Advantages:* This guarantees that the connection configurations are
>         derived from a single source. It also allows Polaris to rely on the
>         NONE/ENVIRONMENT/PROVIDER/UNMANAGED mechanism, making it especially
>         useful in case the Hive instance relies on Kerberos or custom
>         authentication that Polaris does not natively support/manage.
>       - *Drawbacks:* The user needs to have access (or some mechanism to
>         upload files) to the Polaris server's file system.
>    - *Option 2b: Accept all the connection-specific parameters as a part of
>      the create-catalog request.*
>       - *Advantage:* Polaris can directly accept and store the configurations
>         in a DPO instead of relying on the user having access to the
>         server's file system (to create/update hive-site.xml).
>       - *Drawback:* Polaris would need to manage the secrets. This is easy to
>         support for certain authentication types (LDAP/Simple). However, it
>         would preclude support for other authentication mechanisms, such as
>         Kerberos or Custom.
>
> I prefer option 2a primarily because it provides the flexibility of
> supporting multiple federated Hive catalogs while allowing Polaris to
> support authentication that it does not natively manage. Please let me know
> if you have any thoughts or feedback.
>
> Thanks,
> Pooja
>
