I think this is orthogonal to Dmitri's note above, but my understanding is the following:
When the auth type is NONE, the config doesn't tell Polaris which hive-site.xml (if any) to pick up. This is essentially just a test mode, or a mode for when the Polaris service is deployed as a wrapper around a specific Hive catalog. In this case, users 2/3 above are the same person and nobody is creating new EXTERNAL catalogs. There is accordingly no need to support multiple hive-site.xml files.

When the auth type is non-NONE, the connection/auth objects should contain all the information needed to connect to a remote Hive catalog *without* *any* hive-site.xml. Indeed, we should try to avoid accidentally picking up a hive-site.xml on the classpath in this case, as it could be dangerous to do so without the admin being aware. In this case, users 2/3 are different people with different levels of trust.

--EM

On Wed, Jul 2, 2025 at 1:55 PM Pooja Nilangekar <po...@apache.org> wrote:

> Hey Eric,
>
> The part about other auth types has a few aspects we should consider:
>
> 1. We can reliably support HMS instances that use Simple (i.e. NO_AUTH)
>    and LDAP (with some implementation of LDAPAuthenticationType).
> 2. We can't support Kerberos because it requires that the user, via the
>    Polaris server, registers themselves with the KDC server and stores the
>    keytab file at some location on the server.
> 3. With other Custom implementations, Polaris has no way of knowing the
>    authentication type and the type of secrets, so we really can't
>    implement support for them.
>
> Regarding Option 1, how do you propose we handle the issue of encountering
> multiple hive-site.xml files? The problem is that unfortunately we don't
> have a clear mechanism to determine all the files that were read to create
> the final configuration used for a connection.
>
> Thanks,
> Pooja
>
>
> On 2025/07/02 18:05:54 Eric Maynard wrote:
> > IMO 2a should be off the table; the person creating an EXTERNAL catalog
> > does not necessarily have permission to access the path that a
> > hive-site.xml is at (whether local to the Polaris catalog service or in
> > object storage) or to even know what paths the catalog has access to.
> > It's the same problem as with HADOOP federation.
> >
> > In the NONE authentication scheme, I think we are stuck with Option 1,
> > just like HADOOP.
> >
> > However, if we can implement other auth types that work with 2b, that
> > should be preferred.
> >
> > In summary, if the auth scheme is NONE (and the admin has enabled this) I
> > think it's okay to let the HiveCatalog pick up the default hive-site.xml
> > that the env vars on the Polaris service point at. In a production-ready
> > scenario where the auth scheme is non-NONE, that auth scheme should tell
> > the Polaris service how to connect to and authenticate itself to a remote
> > Hive catalog.
> >
> > --EM
> >
> > On Tue, Jul 1, 2025 at 11:49 AM Pooja Nilangekar
> > <nilangekar.po...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I wanted to start a discussion around the support for Hive Catalog
> > > federation in Polaris. In particular, there are two primary ways we can
> > > add support for Hive federation:
> > >
> > > *1. Support a single Hive instance per Polaris deployment* The Hive
> > > workflow would be identical to the Hadoop catalog workflow. Polaris
> > > would invoke the Iceberg connection library, which would try to find the
> > > hive-site.xml file in (1) the CLASSPATH and (2) the default Hadoop
> > > locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then
> > > initialize the Hive connection using the configurations it found at
> > > these locations.
> > >
> > > - *Drawbacks:* The primary drawback of this approach is that if Polaris
> > >   finds multiple hive-site.xml files, it would merge their
> > >   configurations, which could lead to potentially inconsistent
> > >   connection state. Furthermore, there is no clear documentation of the
> > >   order in which the configuration would be applied. While this is
> > >   often predictable on a given OS, it is not guaranteed across
> > >   environments. The other key drawback is that if a Polaris user wants
> > >   to federate to multiple Hive catalogs, their only option is to deploy
> > >   a separate Polaris instance for each Hive instance.
> > >
> > > *2. Support multiple Hive instances per Polaris deployment* The
> > > alternate (and in my view, ideal) solution is to allow Polaris to
> > > federate with multiple Hive catalogs. To support multiple catalogs,
> > > Polaris would explicitly disallow the connection library from reading
> > > hive-site.xml files in the default paths. To pass in the
> > > configurations, Polaris can adopt one of two options:
> > >
> > > - *Option 2a: Accept a canonical path to the target hive-site.xml.*
> > >   - *Advantages:* This guarantees that the connection configurations
> > >     are derived from a single source. It also allows Polaris to rely on
> > >     the NONE/ENVIRONMENT/PROVIDER/UNMANAGED mechanism, making it
> > >     especially useful in case the Hive instance relies on Kerberos or
> > >     custom authentication that Polaris does not natively
> > >     support/manage.
> > >   - *Drawbacks:* The user needs to have access (or some mechanism to
> > >     upload files) to the Polaris server's file system.
> > > - *Option 2b: Accept all the connection-specific parameters as a part
> > >   of the create-catalog request.*
> > >   - *Advantage:* Polaris can directly accept and store the
> > >     configurations in a DPO instead of relying on the user having
> > >     access to the server's file system (to create/update
> > >     hive-site.xml).
> > >   - *Drawback:* Polaris would need to manage the secrets. This is easy
> > >     to support for certain authentication types (LDAP/Simple); however,
> > >     it would preclude support for other authentication mechanisms,
> > >     such as Kerberos or Custom.
> > >
> > > I prefer option 2a primarily because it provides the flexibility of
> > > supporting multiple federated Hive catalogs while allowing Polaris to
> > > support authentication that it does not natively manage. Please let me
> > > know if you have any thoughts or feedback.
> > >
> > > Thanks,
> > > Pooja
> > >
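
P.S. To make the NONE vs. non-NONE split at the top of this message concrete, here is a rough Java sketch of the decision I have in mind. Everything in it is hypothetical: `HiveFederationSketch`, `AuthType`, `Resolved`, and `resolve` are illustrative names, not existing Polaris or Iceberg APIs, and `LDAP` just stands in for any non-NONE scheme. The only real identifier is the Hive property key `hive.metastore.uris`. Treating NONE as "no explicit properties at all" is also an assumption on my part.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types are real Polaris or Iceberg APIs.
class HiveFederationSketch {

    // Hypothetical auth types; LDAP stands in for any non-NONE scheme.
    enum AuthType { NONE, LDAP }

    // Outcome of deciding where a Hive connection gets its configuration.
    static final class Resolved {
        final boolean useDefaultHiveSite;      // may the library read a default hive-site.xml?
        final Map<String, String> properties;  // explicit connection properties, if any

        Resolved(boolean useDefaultHiveSite, Map<String, String> properties) {
            this.useDefaultHiveSite = useDefaultHiveSite;
            // Defensive copy so later changes to the caller's map do not leak in.
            this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
        }
    }

    static Resolved resolve(AuthType authType, boolean adminEnabledNone,
                            Map<String, String> connectionConfig) {
        if (authType == AuthType.NONE) {
            if (!adminEnabledNone) {
                throw new IllegalStateException("NONE auth must be enabled by the admin");
            }
            // Test/wrapper mode: rely on whatever hive-site.xml the service environment points at.
            return new Resolved(true, Collections.<String, String>emptyMap());
        }
        // Non-NONE: everything must come from the connection/auth objects,
        // and no hive-site.xml should be picked up.
        if (connectionConfig == null || !connectionConfig.containsKey("hive.metastore.uris")) {
            throw new IllegalArgumentException("hive.metastore.uris is required for non-NONE auth");
        }
        return new Resolved(false, connectionConfig);
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("hive.metastore.uris", "thrift://hms.example.com:9083"); // placeholder URI
        Resolved r = resolve(AuthType.LDAP, false, cfg);
        System.out.println(r.useDefaultHiveSite + " " + r.properties);
    }
}
```

The point of returning an explicit `useDefaultHiveSite` flag is that the caller can refuse to load classpath defaults whenever it is false, rather than relying on no hive-site.xml happening to be present.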