rdblue commented on pull request #1783: URL: https://github.com/apache/iceberg/pull/1783#issuecomment-741929560
I think that the catalog registration here is good. I don't see a way around registering a catalog, so I think we should do it and also use that to maintain compatibility. So here's what I think behavior should be:

1. Always register a default Iceberg catalog that is a HiveCatalog using the HMS URI from hive-site.xml, like what you've done.
2. Use the `/` check for paths. If the table ref is a path, then load it from the default Iceberg catalog (any catalog works).
3. If the table ref has a catalog, use that catalog (if it doesn't support Iceberg, that's fine because we can't guess or replace it).
4. If the table ref does not have a catalog, use the current catalog.
5. If the current catalog is the session catalog and is not an `IcebergSessionCatalog`, then replace it with the default Iceberg catalog.
6. If the current catalog is not the session catalog, use it even if it doesn't support Iceberg.

That keeps behavior identical and also follows Spark's rules for identifiers. The only time that we use the default Iceberg catalog for a Hive identifier is if there is nothing new in Spark 3 that overrides the behavior to use a different catalog, and if the session catalog (equivalent to the default in 2.4) can't handle Iceberg.

Optionally, paths could be loaded from the session catalog if it is an `IcebergSessionCatalog`.

What do you think?
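
To make the ordering of rules 2 through 6 concrete, here is a rough sketch of them as a single decision function. It is not code from the PR and it calls no Spark or Iceberg APIs: it only returns the name of the catalog that would be chosen. The class name, the `resolve` method, the `default_iceberg` catalog name, and the `sessionCatalogIsIceberg` flag are all hypothetical stand-ins; `spark_catalog` is the name Spark 3 uses for its session catalog. It also assumes that a table ref "has a catalog" when its first dotted part matches a registered catalog name.

```java
import java.util.Set;

// Illustrative only: models the proposed resolution order with plain strings.
final class CatalogResolutionSketch {
  // Spark 3 registers its built-in session catalog under this name.
  static final String SESSION_CATALOG = "spark_catalog";
  // Hypothetical name for the always-registered default Iceberg catalog (rule 1).
  static final String DEFAULT_ICEBERG = "default_iceberg";

  static String resolve(String tableRef,
                        Set<String> registeredCatalogs,
                        String currentCatalog,
                        boolean sessionCatalogIsIceberg) {
    // Rule 2: anything containing '/' is treated as a path and is loaded
    // from the default Iceberg catalog.
    if (tableRef.contains("/")) {
      return DEFAULT_ICEBERG;
    }

    // Rule 3: an explicit catalog in the ref wins, even if it cannot load
    // Iceberg tables (assumption: first dotted part names a registered catalog).
    int dot = tableRef.indexOf('.');
    if (dot > 0) {
      String firstPart = tableRef.substring(0, dot);
      if (registeredCatalogs.contains(firstPart)) {
        return firstPart;
      }
    }

    // Rules 4 and 5: otherwise fall back to the current catalog, but swap in
    // the default Iceberg catalog when the current catalog is a session
    // catalog that is not Iceberg-aware.
    if (SESSION_CATALOG.equals(currentCatalog) && !sessionCatalogIsIceberg) {
      return DEFAULT_ICEBERG;
    }

    // Rule 6 (and an Iceberg-aware session catalog): use the current catalog as-is.
    return currentCatalog;
  }

  public static void main(String[] args) {
    Set<String> catalogs = Set.of(SESSION_CATALOG, DEFAULT_ICEBERG, "prod");
    System.out.println(resolve("db.table", catalogs, SESSION_CATALOG, false));
    System.out.println(resolve("prod.db.table", catalogs, SESSION_CATALOG, false));
  }
}
```

The optional variant mentioned above (loading paths from the session catalog when it is an `IcebergSessionCatalog`) would only change the first branch.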
