[GitHub] [iceberg] rdblue commented on a change in pull request #3701: [SPARK] Make drop namespaces call respect CASCADE and IF EXISTS

GitBox Sat, 11 Dec 2021 21:59:53 -0800


rdblue commented on a change in pull request #3701:
URL: https://github.com/apache/iceberg/pull/3701#discussion_r767064388




##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.

Review comment:
       I don't think this is a good place for a multi-line comment. Placing it 
between `@Override` and the method signature seems odd. I would put it closer 
to the actual behavior: if we have a namespace catalog, then attempt to 
recursively delete.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);

Review comment:
       Couldn't this call the Spark version of `listNamespaces` and avoid 
wrapping the exception? Plus, this method calls `dropNamespace` recursively, 
and that takes `String[]`, so you wouldn't need to translate from 
`Namespace`.
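
To illustrate the suggestion above, here is a minimal, self-contained sketch. 
Everything in it is a hypothetical stand-in rather than Spark's or Iceberg's 
API: `StringArrayListingSketch`, `register`, `countTree`, and the nested 
`NoSuchNamespaceException` are invented for illustration. It only shows that 
when listing already works on `String[]` and throws the caller's own exception 
type, a recursive walk needs neither a conversion nor a try/catch wrapper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; not the real SparkCatalog.
class StringArrayListingSketch {
  // Stand-in for a checked "no such namespace" exception (assumption).
  static class NoSuchNamespaceException extends Exception {
    NoSuchNamespaceException(String[] namespace) {
      super("Namespace not found: " + String.join(".", namespace));
    }
  }

  // Maps a dotted namespace name to its direct children (each a String[] of levels).
  private final Map<String, String[][]> children = new HashMap<>();

  void register(String[][] subNamespaces, String... namespace) {
    children.put(String.join(".", namespace), subNamespaces);
  }

  // Lists direct children as String[] and throws the caller-facing exception directly.
  String[][] listNamespaces(String[] namespace) throws NoSuchNamespaceException {
    String[][] result = children.get(String.join(".", namespace));
    if (result == null) {
      throw new NoSuchNamespaceException(namespace);
    }
    return result;
  }

  // Recursive walk: counts the namespace itself plus all of its descendants.
  // Each child is already a String[], so it is passed straight back in.
  int countTree(String[] namespace) throws NoSuchNamespaceException {
    int count = 1;
    for (String[] child : listNamespaces(namespace)) {
      count += countTree(child);
    }
    return count;
  }
}
```

The recursion passes each child array back unchanged, which is the translation 
step the comment suggests avoiding.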

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {

Review comment:
       Is this possible? If the namespace doesn't exist, how would 
`subNamespaces` be populated? Wouldn't it throw `NoSuchNamespaceException` 
above? If so, what is the value of `exists`?

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {
+        return false;
+      }
+
+      // Recursively drop namespaces under the requested `namespace`
+      // so that the base case will delete the tables and then the namespace of those tables
+      // if the user used CASCADE. If the user did not use CASCADE, Spark will return false
+      // as soon as it encounters a non-empty namespace.
+      for (Namespace ns : subNamespaces) {
+        try {
+          boolean didDrop = dropNamespace(ns.levels());
+          if (!didDrop) {
+            return false;

Review comment:
       I don't think this logic is quite correct. This should return `false` if 
this namespace didn't exist. If another inner namespace no longer exists, then 
that's okay. We can simply skip it.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {
+        return false;
+      }
+
+      // Recursively drop namespaces under the requested `namespace`
+      // so that the base case will delete the tables and then the namespace of those tables
+      // if the user used CASCADE. If the user did not use CASCADE, Spark will return false
+      // as soon as it encounters a non-empty namespace.
+      for (Namespace ns : subNamespaces) {
+        try {
+          boolean didDrop = dropNamespace(ns.levels());
+          if (!didDrop) {
+            return false;
+          }
+        } catch (NoSuchNamespaceException e) {
+          // Spark says this sub-namespace doesn't exist. This is unlikely to happen as we just
+          // got it from a listing, but it could have been concurrently removed.
+          // In either case, the result is the same.
+        }
+      }
+
+      // Base case
+      Arrays.stream(listTables(namespace)).forEach(this::dropTable);
+      return asNamespaceCatalog.dropNamespace(asNamespace);

Review comment:
       This shouldn't return the return value of `dropNamespace`. It should 
return `true` if any sub-namespace drop returned true or if any table drop 
returned true. It should return `false` only if this namespace already doesn't 
exist.
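
The return-value rule described above can be sketched with a small in-memory 
model. This is not the PR's code and not Iceberg's or Spark's API: 
`InMemoryCatalogSketch` and all of its methods are hypothetical names made up 
for illustration, and the model ignores `CASCADE` and concurrency. It returns 
`false` only when the requested namespace does not exist, and otherwise 
accumulates whether anything was dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a catalog (assumption, not a real API).
class InMemoryCatalogSketch {
  private final Set<List<String>> namespaces = new HashSet<>();
  private final Map<List<String>, Set<String>> tables = new HashMap<>();

  void createNamespace(String... levels) {
    namespaces.add(Arrays.asList(levels));
  }

  void createTable(String name, String... levels) {
    tables.computeIfAbsent(Arrays.asList(levels), k -> new HashSet<>()).add(name);
  }

  boolean namespaceExists(String... levels) {
    return namespaces.contains(Arrays.asList(levels));
  }

  // Direct children only: exactly one level deeper and sharing the parent's prefix.
  private List<List<String>> listChildren(List<String> parent) {
    List<List<String>> children = new ArrayList<>();
    for (List<String> ns : namespaces) {
      if (ns.size() == parent.size() + 1 && ns.subList(0, parent.size()).equals(parent)) {
        children.add(ns);
      }
    }
    return children;
  }

  boolean dropNamespace(String... levels) {
    List<String> ns = Arrays.asList(levels);
    if (!namespaces.contains(ns)) {
      // The only case that reports false: the namespace is already gone.
      return false;
    }

    boolean dropped = false;
    for (List<String> child : listChildren(ns)) {
      // A child that vanished in the meantime returns false and is simply skipped.
      dropped |= dropNamespace(child.toArray(new String[0]));
    }

    // Base case: drop this namespace's tables, then the namespace itself.
    Set<String> removedTables = tables.remove(ns);
    dropped |= removedTables != null && !removedTables.isEmpty();
    dropped |= namespaces.remove(ns);
    return dropped;
  }
}
```

In this model a missing inner namespace does not abort the outer drop; it only 
fails to contribute to the accumulated result.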




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



