[GitHub] [iceberg] sumeetgajjar commented on a diff in pull request #4942: [Spark][Test]: Check before creating default namespace to avoid noisy AlreadyExistsExceptions in test logs

GitBox Mon, 06 Jun 2022 08:24:06 -0700


sumeetgajjar commented on code in PR #4942:
URL: https://github.com/apache/iceberg/pull/4942#discussion_r890266393



##########
spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/SparkTestBaseWithCatalog.java:
##########
@@ -86,7 +86,7 @@ public SparkTestBaseWithCatalog(String catalogName, String implementation, Map<S
 
     this.tableName = (catalogName.equals("spark_catalog") ? "" : catalogName + ".") + "default.table";
 
-    sql("CREATE NAMESPACE IF NOT EXISTS default");
+    createNamespace(validationNamespaceCatalog, Namespace.of("default"));

Review Comment:
   Hi Ryan - thank you for your comment.
   
   > Why is this needed? IF NOT EXISTS should check existence first, right?
   
   Yes, in a way it does perform the check.
   
   When the `CREATE NAMESPACE IF NOT EXISTS default` SQL command is executed in Spark, Spark invokes `hive.createDatabase`:
   
https://github.com/apache/spark/blob/89fdb8a6fb6a669c458891b3abeba236e64b1e89/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L574
   
   Internally, the Hive client calls the HMS API to create the database. If the database already exists, HMS throws an `AlreadyExistsException`. When the `ifNotExist` flag is set to true, the Hive client simply ignores that exception:
   
https://github.com/apache/hive/blob/63326ff775206e59547b6b1332e25279e90ef5ee/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L608-L619
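The key point is that "ignoring" happens on the client, after the server has already logged the failure. The sketch below models that split with hypothetical in-memory classes (`FakeMetastore`, `FakeHiveClient` and this local `AlreadyExistsException` are illustrative stand-ins, not the real HMS or Hive client APIs): the second `IF NOT EXISTS` create succeeds from the caller's point of view, yet a server-side error entry is still recorded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the metastore exception; NOT the real HMS class.
class AlreadyExistsException extends RuntimeException {
  AlreadyExistsException(String message) {
    super(message);
  }
}

// Hypothetical in-memory metastore that logs failures before throwing them.
class FakeMetastore {
  final Set<String> databases = new HashSet<>();
  // Collects what the server would write to STDERR.
  final List<String> serverLog = new ArrayList<>();

  void createDatabase(String name) {
    if (!databases.add(name)) {
      AlreadyExistsException e =
          new AlreadyExistsException("Database " + name + " already exists");
      // The server records the error before the client ever sees it.
      serverLog.add("ERROR " + e.getMessage());
      throw e;
    }
  }
}

// Hypothetical client mirroring the ifNotExist handling described above.
class FakeHiveClient {
  private final FakeMetastore metastore;

  FakeHiveClient(FakeMetastore metastore) {
    this.metastore = metastore;
  }

  void createDatabase(String name, boolean ifNotExist) {
    try {
      metastore.createDatabase(name);
    } catch (AlreadyExistsException e) {
      // Swallowed on the client, but the server log entry already exists.
      if (!ifNotExist) {
        throw e;
      }
    }
  }
}
```

Calling `createDatabase("default", true)` twice leaves exactly one error entry in `serverLog`, which is the kind of noise that shows up in the test output when HMS runs in the same JVM.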
   
   HMS logs this exception to STDERR. Because the Iceberg tests run a standalone HMS in the same JVM as the tests themselves, these log lines become part of the tests' info output.
   
   This adds a lot of noise to the logs and can obscure a genuine exception.
   
   To keep the `AlreadyExistsException` out of the info logs, we explicitly check whether the namespace exists and create it through the Catalog APIs instead of Spark SQL.
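The body of the `createNamespace(validationNamespaceCatalog, ...)` helper is not shown in this hunk, so the following is only a sketch of the check-before-create idea. `InMemoryNamespaceCatalog` and `createNamespaceIfMissing` are hypothetical; the catalog is loosely shaped like Iceberg's `SupportsNamespaces` (`namespaceExists` / `createNamespace`) but uses plain strings instead of `Namespace` and is not the real API. Because the existence check runs first, the catalog's "already exists" path is never reached, so nothing is logged.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory catalog; NOT the real Iceberg SupportsNamespaces.
class InMemoryNamespaceCatalog {
  private final Set<String> namespaces = new HashSet<>();
  // Stands in for the metastore's STDERR output.
  final List<String> errorLog = new ArrayList<>();

  boolean namespaceExists(String namespace) {
    return namespaces.contains(namespace);
  }

  void createNamespace(String namespace) {
    if (!namespaces.add(namespace)) {
      // Duplicate create: log and throw, like the metastore does.
      errorLog.add("AlreadyExistsException: " + namespace);
      throw new IllegalStateException("Namespace already exists: " + namespace);
    }
  }
}

class NamespaceHelper {
  // Hypothetical helper: create the namespace only if it is missing,
  // so the catalog never hits its "already exists" error path.
  static void createNamespaceIfMissing(InMemoryNamespaceCatalog catalog, String namespace) {
    if (!catalog.namespaceExists(namespace)) {
      catalog.createNamespace(namespace);
    }
  }
}
```

Note that check-then-create is not atomic: a concurrent creator could still trigger the exception between the two calls. That trade-off seems acceptable for single-threaded test setup, where the goal is quieter logs rather than strict atomicity.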


