I think this is orthogonal to Dmitri's note above but my understanding is
the following:

When the auth type is NONE, the config doesn't tell Polaris which (or any)
hive-site.xml to pick up. This is essentially just a test mode or a mode
for when the Polaris service is deployed as a wrapper around a specific
Hive catalog. In this case, users 2/3 above are the same person and nobody
is creating new EXTERNAL catalogs. There is accordingly no need to support
multiple hive-site.xml files.

When the auth type is non-NONE, the connection/auth objects should contain
all the information sufficient to connect to a remote Hive catalog *without*
*any* hive-site.xml. Indeed, we should try to avoid accidentally picking
up a hive-site.xml on the classpath in this case as it could be dangerous
to do so without the admin being aware. In this case, users 2/3 are
different people with different levels of trust.
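
To make the two modes concrete, here is a rough, language-agnostic sketch in Python (not Polaris code). The names `AuthType`, `REQUIRED_KEYS`, and `resolve_hive_conf` are hypothetical, and the environment's hive-site.xml is modelled as an already-parsed dict. The only real identifier is the Hive property `hive.metastore.uris`.

```python
from enum import Enum


class AuthType(Enum):
    NONE = "NONE"
    LDAP = "LDAP"  # stand-in for any non-NONE auth type (assumption)


# Hypothetical minimum that a non-NONE connection config must carry.
REQUIRED_KEYS = ("hive.metastore.uris",)


def resolve_hive_conf(auth_type, connection_props, env_hive_site):
    """Pick the properties handed to the Hive client.

    env_hive_site models the single hive-site.xml found via the service's
    environment; connection_props models what the catalog's connection/auth
    objects carry.
    """
    if auth_type is AuthType.NONE:
        # NONE: the service wraps one specific Hive catalog, so defer
        # entirely to the environment's configuration.
        return dict(env_hive_site)

    # Non-NONE: the connection config must be self-sufficient, and nothing
    # from the environment is merged in, even if a hive-site.xml exists.
    missing = [key for key in REQUIRED_KEYS if key not in connection_props]
    if missing:
        raise ValueError(f"connection config is missing: {missing}")
    return dict(connection_props)
```

The point of the non-NONE branch is that it fails loudly when the connection config is incomplete rather than silently falling back to whatever the environment happens to provide.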

--EM

On Wed, Jul 2, 2025 at 1:55 PM Pooja Nilangekar <po...@apache.org> wrote:

> Hey Eric,
>
> The part about other auth types has a few aspects we should consider:
>
> 1. We can reliably support HMS instances that use Simple (i.e. NO_AUTH)
> or LDAP authentication (with some implementation of
> LDAPAuthenticationType).
> 2. We can't support Kerberos because it requires the user, via the
> Polaris server, to register with the KDC server and store the keytab
> file at some location on the server.
> 3. With other Custom implementations, Polaris has no way of knowing the
> authentication type or the type of secrets, so we really can't implement
> support for them.
>
> Regarding Option 1, how do you propose we handle the issue of encountering
> multiple hive-site.xml files? The problem is that unfortunately we don't
> have a clear mechanism to determine all the files that were read to create
> the final configuration used for a connection.
>
> Thanks,
> Pooja
>
>
> On 2025/07/02 18:05:54 Eric Maynard wrote:
> > IMO 2a should be off the table; the person creating an EXTERNAL catalog
> > does not necessarily have permission to access the path that a
> > hive-site.xml is at (whether local to the Polaris catalog service or in
> > object storage) or to even know what paths the catalog has access to.
> > It's the same problem as with HADOOP federation.
> >
> > In the NONE authentication scheme, I think we are stuck with Option 1
> > just like HADOOP.
> >
> > However, if we can implement other auth types that work with 2b, that
> > should be preferred.
> >
> > In summary, if the auth scheme is NONE (and the admin has enabled this) I
> > think it's okay to let the HiveCatalog pick up the default hive-site.xml
> > that the env vars on the Polaris service point at. In a production-ready
> > scenario where the auth scheme is non-none, that auth scheme should tell
> > the Polaris service how to connect to and authenticate itself to a remote
> > Hive catalog.
> >
> > --EM
> >
> > On Tue, Jul 1, 2025 at 11:49 AM Pooja Nilangekar <nilangekar.po...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I wanted to start a discussion around the support for Hive Catalog
> > > federation in Polaris. In particular, there are two primary ways we
> > > can add support for Hive federation:
> > >
> > > *1. Support a single Hive instance per Polaris deployment* The Hive
> > > workflow would be identical to the Hadoop catalog workflow. Polaris
> > > would invoke the Iceberg connection library, which would try to find
> > > the hive-site.xml file in (1) the CLASSPATH and (2) the default Hadoop
> > > locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then
> > > initialize the Hive connection using the configurations it found at
> > > these locations.
> > >
> > >    - *Drawbacks:* The primary drawback of this approach is that if
> > >      Polaris finds multiple hive-site.xml files, it would merge their
> > >      configurations, which could lead to potentially inconsistent
> > >      connection state. Furthermore, there is no clear documentation of
> > >      the order in which the configuration would be applied. While this
> > >      is often predictable on a given OS, it is not guaranteed across
> > >      environments. The other key drawback is that if a Polaris user
> > >      wants to federate to multiple Hive catalogs, their only option is
> > >      to deploy a separate Polaris instance for each Hive instance.
> > >
> > > *2. Support multiple Hive instances per Polaris deployment* The
> > > alternate (and in my view, ideal) solution is to allow Polaris to
> > > federate with multiple Hive catalogs. To support multiple catalogs,
> > > Polaris would explicitly disallow the connection library from reading
> > > hive-site.xml files in the default paths. To pass in the
> > > configurations, Polaris can adopt one of two options:
> > >
> > >    - *Option 2a: Accept a canonical path to the target hive-site.xml.*
> > >       - *Advantages:* This guarantees that the connection
> > >         configurations are derived from a single source. It also allows
> > >         Polaris to rely on the NONE/ENVIRONMENT/PROVIDER/UNMANAGED
> > >         mechanism, making it especially useful in case the Hive
> > >         instance relies on Kerberos or custom authentication that
> > >         Polaris does not natively support/manage.
> > >       - *Drawbacks:* The user needs to have access (or some mechanism
> > >         to upload files) to the Polaris server's file system.
> > >    - *Option 2b: Accept all the connection-specific parameters as a
> > >      part of the create-catalog request.*
> > >       - *Advantage:* Polaris can directly accept and store the
> > >         configurations in a DPO instead of relying on the user having
> > >         access to the server's file system (to create/update
> > >         hive-site.xml).
> > >       - *Drawback:* Polaris would need to manage the secrets. This is
> > >         easy to support for certain authentication types (LDAP/Simple).
> > >         However, it would preclude support for other authentication
> > >         mechanisms, such as Kerberos or Custom.
> > >
> > > I prefer option 2a primarily because it provides the flexibility of
> > > supporting multiple federated Hive catalogs while allowing Polaris to
> > > support authentication that it does not natively manage. Please let me
> > > know if you have any thoughts or feedback.
> > >
> > > Thanks,
> > > Pooja
> > >
> >
>
