kbendick commented on a change in pull request #3126:
URL: https://github.com/apache/iceberg/pull/3126#discussion_r712361830
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -240,8 +245,38 @@ public void renameTable(Identifier from, Identifier to) throws NoSuchTableExcept
     }
   }
+  public boolean isHiveCatalogConfigValid(CaseInsensitiveStringMap options, StringBuilder errorMsg) {
+    String hadoopConfUri = SparkSession.active().sparkContext().conf().contains("spark.hadoop.hive.metastore.uris")
+        ? SparkSession.active().sparkContext().conf().get("spark.hadoop.hive.metastore.uris")
+        : null;
+    String catalogConfUri = options.get("uri");
+
+    if (hadoopConfUri == null) {
+      errorMsg.append("Hive uri config is missing");
Review comment:
Question: So we're saying that if `spark.hadoop.hive.metastore.uris` is
not set, even if the user has set the URI on
`spark.sql.catalog.spark_catalog.uri` (and only there, for instance), we're
considering that an error?
I personally don't see that as an error case, if only so we don't break
potential existing jobs out there that have the metastore URI set only via the
Iceberg SparkSessionCatalog.
I think the only error case should be if the two values don't match, and
that otherwise we should allow:
1) Metastore URI is not set on the SparkConf and is set on the catalog `uri`
field -> Log a warning, but allow it (as I do believe this works presently).
2) Metastore URI is set on both the SparkConf and on the SparkSessionCatalog
`uri` field -> Either log a warning if both values are the same, or error out
if the values are not the same.
3) Metastore URI is set on the SparkConf, but not set in the
SparkSessionCatalog `uri` field -> Continue as normal (happy path). No
additional logs.
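The three cases above could be sketched roughly as follows. This is a
hypothetical standalone sketch, not the PR's actual code: plain String
parameters stand in for the SparkConf / CaseInsensitiveStringMap lookups, and
the class, method name, and message wording are illustrative only.

```java
// Hypothetical sketch of the suggested validation; not the PR's actual code.
public class HiveUriCheck {

  // hadoopConfUri: value of spark.hadoop.hive.metastore.uris, or null if unset
  // catalogUri: value of the catalog's "uri" option, or null if unset
  public static boolean isHiveUriConfigValid(String hadoopConfUri, String catalogUri, StringBuilder errorMsg) {
    if (hadoopConfUri == null) {
      // Case 1: URI set only on the catalog (or nowhere) -> warn, but allow
      if (catalogUri != null) {
        System.err.println("WARN: Hive metastore URI set only via the catalog 'uri' option");
      }
      return true;
    }
    if (catalogUri == null) {
      // Case 3: URI set only on the SparkConf -> happy path, no extra logging
      return true;
    }
    if (hadoopConfUri.equalsIgnoreCase(catalogUri)) {
      // Case 2a: both set and equal -> warn, but allow
      System.err.println("WARN: Hive metastore URI set in both SparkConf and catalog options");
      return true;
    }
    // Case 2b: both set and conflicting -> error, reporting both values
    errorMsg.append(String.format(
        "Conflicting Hive metastore URIs: catalog uri=%s, spark.hadoop.hive.metastore.uris=%s",
        catalogUri, hadoopConfUri));
    return false;
  }
}
```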
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -240,8 +245,38 @@ public void renameTable(Identifier from, Identifier to) throws NoSuchTableExcept
     }
   }
+  public boolean isHiveCatalogConfigValid(CaseInsensitiveStringMap options, StringBuilder errorMsg) {
+    String hadoopConfUri = SparkSession.active().sparkContext().conf().contains("spark.hadoop.hive.metastore.uris")
+        ? SparkSession.active().sparkContext().conf().get("spark.hadoop.hive.metastore.uris")
+        : null;
+    String catalogConfUri = options.get("uri");
+
+    if (hadoopConfUri == null) {
+      errorMsg.append("Hive uri config is missing");
+    } else if (catalogConfUri != null) {
+      LOG.warn("Don't set uri for SparkSessionCatalog" +
+          "set it only in hive conf");
Review comment:
Nit: This log message is going to be missing a space between
`SparkSessionCatalog` and `set`.
##########
File path: spark3/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction3.java
##########
@@ -134,6 +135,24 @@ public void testSparkSessionCatalogHadoopTable() throws Exception {
         results.contains("file:" + location + "/data/trashfile"));
   }
+  @Test
+  public void testSparkSessionCatalogHiveWrongConfig() {
+    spark.conf().set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog");
+    spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
+    spark.conf().set("spark.sql.catalog.spark_catalog.uri", "thrift://localhost:9083");
+    if (spark.version().equalsIgnoreCase("3.0.3")) {
Review comment:
I think that this can just check `spark.version().startsWith("3.0")`.
If I'm not mistaken, all of Spark 3.0.x (and not just 3.0.3) will have this
issue.
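The suggested loosening amounts to a prefix match on the version string.
A hypothetical helper (not part of the PR) to illustrate:

```java
// Hypothetical helper illustrating the suggested check: match any 3.0.x release
public class SparkVersionCheck {
  public static boolean isSpark30(String sparkVersion) {
    return sparkVersion.startsWith("3.0");
  }
}
```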
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -240,8 +245,38 @@ public void renameTable(Identifier from, Identifier to) throws NoSuchTableExcept
     }
  }
+  public boolean isHiveCatalogConfigValid(CaseInsensitiveStringMap options, StringBuilder errorMsg) {
+    String hadoopConfUri = SparkSession.active().sparkContext().conf().contains("spark.hadoop.hive.metastore.uris")
+        ? SparkSession.active().sparkContext().conf().get("spark.hadoop.hive.metastore.uris")
+        : null;
+    String catalogConfUri = options.get("uri");
+
+    if (hadoopConfUri == null) {
+      errorMsg.append("Hive uri config is missing");
+    } else if (catalogConfUri != null) {
+      LOG.warn("Don't set uri for SparkSessionCatalog" +
+          "set it only in hive conf");
+      if (!hadoopConfUri.equalsIgnoreCase(catalogConfUri)) {
+        errorMsg.append("Cannot set uri for SparkSessionCatalog: " +
+            "conflicts with Hive conf URI");
Review comment:
Nit: I think it would be more beneficial to the user if the error
message contains the two conflicting values, indicating what value was set for
the catalog's `uri` field and what value was set via the traditional mechanisms
available to spark.
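For example, something along these lines; the class name and message wording
are only a suggestion, not the PR's code:

```java
// Hypothetical error-message builder that surfaces both conflicting values
public class ConflictMessage {
  public static String build(String catalogUri, String sparkConfUri) {
    return String.format(
        "Cannot set uri for SparkSessionCatalog: catalog option uri='%s' "
            + "conflicts with spark.hadoop.hive.metastore.uris='%s'",
        catalogUri, sparkConfUri);
  }
}
```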
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java
##########
@@ -240,8 +245,38 @@ public void renameTable(Identifier from, Identifier to) throws NoSuchTableExcept
     }
   }
+  public boolean isHiveCatalogConfigValid(CaseInsensitiveStringMap options, StringBuilder errorMsg) {
+    String hadoopConfUri = SparkSession.active().sparkContext().conf().contains("spark.hadoop.hive.metastore.uris")
+        ? SparkSession.active().sparkContext().conf().get("spark.hadoop.hive.metastore.uris")
+        : null;
+    String catalogConfUri = options.get("uri");
+
+    if (hadoopConfUri == null) {
+      errorMsg.append("Hive uri config is missing");
+    } else if (catalogConfUri != null) {
+      LOG.warn("Don't set uri for SparkSessionCatalog" +
+          "set it only in hive conf");
Review comment:
Nit: Hive should be capitalized. Please be sure to be consistent when
capitalizing `Hive` 🙂
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]