I think having some integration with HMS is definitely a good idea. We've already seen users build this in the wild on top of Polaris, which shows that there is real demand. I'm still a strong believer that we should be helping users get to Polaris from whatever systems they are currently using.

On Mon, Jul 7, 2025 at 12:59 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:

> 1. We (Polaris) can provide end users a way to migrate off of these
> catalogs that the Iceberg project no longer wants to invest into.
> Implementing HMS federation is in service to the goal of removing
> non-Iceberg catalogs, not in contradiction to it.
>
> 2. This does not seem like a user-centered concern, but I'm also not sure I
> understand exactly what is being expressed here. Are you saying that the
> current HADOOP federation does not work somehow?
>
> 3. Yes, please see the other thread about the IMPLICIT authentication type
> for discussion of this topic. Note, however, that HMS federation may
> support authentication types other than IMPLICIT.
>
> 4. That depends on what you mean by "depends on" -- it could also be said
> that Iceberg itself depends on Hadoop.
>
> 5. This not only also applies to HADOOP federation, which already exists,
> but also does *not* apply to HMS federation when using an authentication
> mechanism other than IMPLICIT -- again, please see the other thread for
> more discussion of this topic.
>
> On Fri, Jul 4, 2025 at 3:52 AM Robert Stupp <sn...@snazy.de> wrote:
>
> > I'd really prefer to not add "anything Hive" to Polaris itself, and I'd
> > really like to see Hadoop being removed entirely from the Polaris code
> > base.
> >
> > There are multiple reasons for this:
> >
> > 1. The Iceberg project would rather like to remove all catalogs except
> > the REST catalog. (That's at least what I understood from discussions
> > quite a while ago.)
> >
> > 2. Hadoop is quite behind supporting recent Java versions. It is already
> > impossible to run "anything Hadoop" with Java 24. Considering how long
> > it took Hadoop to even support Java 11, it will take a long time until
> > Hadoop is ready for Java 24+, especially since Hadoop has to refactor a
> > lot of things. Polaris requires Java 21 and we know it works in CI with
> > Java 22+23 (both are EOL).
> > Hadoop does only support Java 11, not 17, not 21.
> >
> > 3. Hadoop (HDFS) has a very different security model, which is the
> > reason why HDFS is not suitable for Polaris production configuration,
> > guarded by explicit configuration options.
> >
> > 4. Hive depends on Hadoop, so all concerns about Hadoop also apply to
> > Hive.
> >
> > 5. Polaris is multi-tenant (realms). A _single_ instance of Hive
> > contradicts this.
> >
> > My vote would be on *not* adding Hive and also on removing Hadoop
> > entirely.
> >
> > If someone comes up with an Iceberg REST catalog for Hive or HDFS and
> > Polaris can connect to it, that's fine for me, because it's outside of
> > Polaris. But I strongly object to having Hadoop or even Hive in Polaris.
> >
> > On 7/1/25 20:48, Pooja Nilangekar wrote:
> > > Hi all,
> > >
> > > I wanted to start a discussion around the support for Hive Catalog
> > > federation in Polaris. In particular, there are two primary ways we can
> > > add support for Hive federation:
> > >
> > > *1. Support a single Hive instance per Polaris deployment* The Hive
> > > workflow would be identical to the Hadoop catalog workflow. Polaris
> > > would invoke the Iceberg connection library, that would try to find the
> > > hive-site.xml file in (1) the CLASSPATH and (2) the default Hadoop
> > > locations: HADOOP_PATH and HADOOP_CONF_DIR. Polaris would then
> > > initialize the Hive connection using the configurations it found at
> > > these locations.
> > >
> > > - *Drawbacks:* The primary drawback of this approach is that if Polaris
> > >   finds multiple hive-site.xml files, it would merge their
> > >   configurations, which could lead to potentially inconsistent
> > >   connection state. Furthermore, there is no clear documentation of the
> > >   order in which the configuration would be applied. While this is
> > >   often predictable on a given OS, it is not guaranteed across
> > >   environments.
> > >   The other key drawback is that if a Polaris user wants to federate to
> > >   multiple Hive catalogs, their only option is to deploy a separate
> > >   Polaris instance for each Hive instance.
> > >
> > > *2. Support multiple Hive instances per Polaris deployment* The
> > > alternate (and in my view, ideal) solution is to allow Polaris to
> > > federate with multiple Hive catalogs. To support multiple catalogs,
> > > Polaris would explicitly disallow the connection library from reading
> > > hive-site.xml files in the default paths. To pass in the
> > > configurations, Polaris can adopt one of two options:
> > >
> > > - *Option 2a: Accept a canonical path to the target hive-site.xml.*
> > >   - *Advantages:* This guarantees that the connection configurations
> > >     are derived from a single source. It also allows Polaris to rely on
> > >     the NONE/ENVIRONMENT/PROVIDER/UNMANAGED mechanism, making it
> > >     especially useful in case the Hive instance relies on Kerberos or
> > >     custom authentication that Polaris does not natively
> > >     support/manage.
> > >   - *Drawbacks:* The user needs to have access (or some mechanism to
> > >     upload files) to the Polaris server's file system.
> > > - *Option 2b: Accept all the connection-specific parameters as a part
> > >   of the create-catalog request.*
> > >   - *Advantage:* Polaris can directly accept and store the
> > >     configurations in a DPO instead of relying on the user having
> > >     access to the server's file system (to create/update
> > >     hive-site.xml).
> > >   - *Drawback:* Polaris would need to manage the secrets. This is easy
> > >     to support for certain authentication types (LDAP/Simple). However,
> > >     it would preclude the support for other authentication mechanisms,
> > >     such as Kerberos or Custom.
> > >
> > > I prefer option 2a primarily because it provides the flexibility of
> > > supporting multiple federated Hive catalogs while allowing Polaris to
> > > support authentication that it does not natively manage. Please let me
> > > know if you have any thoughts or feedback.
> > >
> > > Thanks,
> > > Pooja
> >
> > --
> > Robert Stupp
> > @snazy