sumeetgajjar commented on code in PR #4942:
URL: https://github.com/apache/iceberg/pull/4942#discussion_r890266393
##########
spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBaseWithCatalog.java:
##########
@@ -86,7 +86,7 @@ public SparkTestBaseWithCatalog(String catalogName, String
implementation, Map<S
this.tableName = (catalogName.equals("spark_catalog") ? "" : catalogName +
".") + "default.table";
- sql("CREATE NAMESPACE IF NOT EXISTS default");
+ createNamespace(validationNamespaceCatalog, Namespace.of("default"));
Review Comment:
Hi Ryan - thank you for your comment.
> Why is this needed? IF NOT EXISTS should check existence first, right?
Yes - in a way it does perform the check.
When "CREATE NAMESPACE IF NOT EXISTS default" SQL command is executed in
Spark, Spark invokes hive.createDatabase command.
https://github.com/apache/spark/blob/89fdb8a6fb6a669c458891b3abeba236e64b1e89/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L574
Hive client invokes internally invokes HMS API to create the database. If
the DB already exists HMS throws AlreadyExistsException. When the ifNotExist
flag is set to true, the Hive client simply ignores the exception.
https://github.com/apache/hive/blob/63326ff775206e59547b6b1332e25279e90ef5ee/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L608-L619
The HMS logs this exception to STDERR and for iceberg tests since a
standalone HMS is running in the same JVM as that of the test, these logs are
part of the info output of the tests.
This generates a lot of noise in the logs and might overshadow an actual
exception.
Thus in order to avoid the `AlreadyExistsException` in the info logs, we are
performing an explicit check and using the Catalog APIs instead of the Spark
SQL to create a namespace.
##########
spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBaseWithCatalog.java:
##########
@@ -86,7 +86,7 @@ public SparkTestBaseWithCatalog(String catalogName, String
implementation, Map<S
this.tableName = (catalogName.equals("spark_catalog") ? "" : catalogName +
".") + "default.table";
- sql("CREATE NAMESPACE IF NOT EXISTS default");
+ createNamespace(validationNamespaceCatalog, Namespace.of("default"));
Review Comment:
Hi @rdblue - thank you for your comment.
> Why is this needed? IF NOT EXISTS should check existence first, right?
Yes - in a way it does perform the check.
When "CREATE NAMESPACE IF NOT EXISTS default" SQL command is executed in
Spark, Spark invokes hive.createDatabase command.
https://github.com/apache/spark/blob/89fdb8a6fb6a669c458891b3abeba236e64b1e89/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L574
Hive client invokes internally invokes HMS API to create the database. If
the DB already exists HMS throws AlreadyExistsException. When the ifNotExist
flag is set to true, the Hive client simply ignores the exception.
https://github.com/apache/hive/blob/63326ff775206e59547b6b1332e25279e90ef5ee/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L608-L619
The HMS logs this exception to STDERR and for iceberg tests since a
standalone HMS is running in the same JVM as that of the test, these logs are
part of the info output of the tests.
This generates a lot of noise in the logs and might overshadow an actual
exception.
Thus in order to avoid the `AlreadyExistsException` in the info logs, we are
performing an explicit check and using the Catalog APIs instead of the Spark
SQL to create a namespace.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]