wypoon commented on pull request #4017: URL: https://github.com/apache/iceberg/pull/4017#issuecomment-1032165351
@rdblue I realize now that you view this issue differently than I do. You view engine.hive.enabled as a property of the Hive environment, related to the Iceberg classes being available on the classpath. I view engine.hive.enabled as a property of the table: whether or not a table is Hive-enabled is a trait of the table rather than of the Hive environment. So I do not see having the property in the table as "leaking Hive environment", hence my proposal to write engine.hive.enabled into the table properties when the table is created (not "from some point in time") if it is created as Hive-enabled.

From what I observe as a Spark user, I have been able to create, read and write Hive-enabled Iceberg tables without having the Hive Iceberg StorageHandler, SerDe, InputFormat and OutputFormat classes on my classpath; I only need the Iceberg Spark runtime jar. So I don't think of whether a table is Hive-enabled as tied to my environment and classpath.

The way we encountered this issue is in a setting with a shared HMS, where I create an Iceberg table using Spark (with iceberg.engine.hive.enabled=true in my conf, as we want Hive-enabled tables for interop with Hive). There is no Hive on this cluster. In a different cluster, we have Hive; iceberg.engine.hive.enabled is not set in the environment of that Hive cluster (so it defaults to false), as there did not seem to be a need for it. @pvary or @mbod can speak to the Hive side, since I'm not knowledgeable about it, but I believe that when you create an Iceberg table using Hive QL, Hive creates it with engine.hive.enabled=true in the properties. It's just that when Spark creates the table, it doesn't write engine.hive.enabled=true into the properties (which is what this change fixes).
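The create-path behavior I'm proposing can be sketched roughly as follows. The two property keys are the real Iceberg ones (iceberg.engine.hive.enabled in the engine conf, engine.hive.enabled in table properties), but `propagateHiveEnabled` is a hypothetical helper for illustration, not the actual code in this PR:

```java
import java.util.HashMap;
import java.util.Map;

public class CreateSketch {
    // Real Iceberg property keys (engine conf key vs. table property key).
    static final String CONF_KEY = "iceberg.engine.hive.enabled";
    static final String TABLE_KEY = "engine.hive.enabled";

    // Hypothetical helper: when the engine conf asks for a Hive-enabled table,
    // record that choice in the table's own properties at create time, so the
    // trait travels with the table rather than with the environment.
    static Map<String, String> propagateHiveEnabled(Map<String, String> conf,
                                                    Map<String, String> tableProps) {
        Map<String, String> props = new HashMap<>(tableProps);
        if (Boolean.parseBoolean(conf.getOrDefault(CONF_KEY, "false"))) {
            props.putIfAbsent(TABLE_KEY, "true");
        }
        return props;
    }

    public static void main(String[] args) {
        // Spark cluster with iceberg.engine.hive.enabled=true in its conf.
        Map<String, String> conf = Map.of(CONF_KEY, "true");
        Map<String, String> props = propagateHiveEnabled(conf, new HashMap<>());
        System.out.println(props.get(TABLE_KEY)); // prints "true"
    }
}
```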
Hive can read the Iceberg table Spark created (because HMS says it has the Iceberg StorageHandler, SerDe, and Input/OutputFormat), but if it commits an update, `hiveEngineEnabled` will be false in `HiveTableOperations#doCommit`, so the table is turned into a non-Hive-enabled table (it will no longer have the Iceberg StorageHandler, SerDe, or Input/OutputFormat), and Hive won't be able to read it after that. We could also have a situation where Spark (in another cluster where iceberg.engine.hive.enabled is not set in the conf) updates the table, which would likewise turn a Hive-enabled table into a non-Hive-enabled table if engine.hive.enabled=true is not in the table properties.
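The commit-time failure mode, and how the table property avoids it, can be illustrated with a small sketch. `resolveHiveEngineEnabled` is a hypothetical resolution function, not the actual Iceberg code; the point is that giving the table property precedence over the session conf keeps a commit from an engine without the conf set from silently stripping the Hive metadata:

```java
import java.util.Map;

public class CommitSketch {
    static final String CONF_KEY = "iceberg.engine.hive.enabled";
    static final String TABLE_KEY = "engine.hive.enabled";

    // Hypothetical resolution: the table property, if present, wins over the
    // session conf; only when the table is silent do we fall back to the conf
    // (defaulting to false, as described above).
    static boolean resolveHiveEngineEnabled(Map<String, String> tableProps,
                                            Map<String, String> conf) {
        if (tableProps.containsKey(TABLE_KEY)) {
            return Boolean.parseBoolean(tableProps.get(TABLE_KEY));
        }
        return Boolean.parseBoolean(conf.getOrDefault(CONF_KEY, "false"));
    }

    public static void main(String[] args) {
        // Table created as Hive-enabled by Spark; the commit comes from a
        // cluster whose conf does not set iceberg.engine.hive.enabled.
        Map<String, String> tableProps = Map.of(TABLE_KEY, "true");
        Map<String, String> conf = Map.of();
        // With the property in the table, the commit keeps it Hive-enabled.
        System.out.println(resolveHiveEngineEnabled(tableProps, conf)); // prints "true"
    }
}
```

Without the table property (today's behavior when Spark created the table), the same commit would resolve to false and drop the StorageHandler/SerDe from the HMS entry.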