[GitHub] [iceberg] kbendick commented on a change in pull request #3701: [SPARK] Make drop namespaces call respect CASCADE and IF EXISTS

GitBox Sat, 11 Dec 2021 22:04:47 -0800


kbendick commented on a change in pull request #3701:
URL: https://github.com/apache/iceberg/pull/3701#discussion_r767079196




##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.

Review comment:
       Ok. Will move that.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);

Review comment:
       listNamespaces also calls the `asNamespaceCatalog`. Happy to update it 
but it's not a huge win.
   
   ```java
     @Override
     public String[][] listNamespaces() {
       if (asNamespaceCatalog != null) {
         return asNamespaceCatalog.listNamespaces().stream()
             .map(Namespace::levels)
             .toArray(String[][]::new);
       }
   
       return new String[0][];
     }
   ```

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {

Review comment:
       If a user calls `DROP NAMESPACE IF EXISTS ns` on a namespace that does 
not exist, it will not throw `NoSuchNamespaceException`.
   
   That's why I check whether it exists first. In that case, subNamespaces 
will be `new String[0][]` with the new setup.
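   
   For illustration only, the guard can be isolated as a small standalone 
sketch. `IfExistsGuardSketch` and `canReturnFalseEarly` are hypothetical names 
that are not part of `SparkCatalog`, and the sketch assumes the listing comes 
back as a `String[][]` that is empty for a missing namespace.
   
   ```java
   // Hypothetical helper, not part of SparkCatalog: it isolates the
   // `!exists && subNamespaces.size() == 0` check from the diff above.
   class IfExistsGuardSketch {
     // Assumption: an empty listing for a namespace that does not exist means
     // no exception was raised, which is the `IF EXISTS` path.
     static boolean canReturnFalseEarly(boolean exists, String[][] subNamespaces) {
       return !exists && subNamespaces.length == 0;
     }
   
     public static void main(String[] args) {
       String[][] none = new String[0][];
       String[][] oneChild = {{"db", "nested"}};
       System.out.println(canReturnFalseEarly(false, none));    // prints "true"
       System.out.println(canReturnFalseEarly(true, none));     // prints "false"
       System.out.println(canReturnFalseEarly(true, oneChild)); // prints "false"
     }
   }
   ```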

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {
+        return false;
+      }
+
+      // Recursively drop namespaces under the requested `namespace`
+      // so that the base case will delete the tables and then the namespace 
of those tables
+      // if the user used CASCADE. If the user did not use CASCADE, Spark will 
return false
+      // as soon as it encounters a non-empty namespace.
+      for (Namespace ns : subNamespaces) {
+        try {
+          boolean didDrop = dropNamespace(ns.levels());
+          if (!didDrop) {
+            return false;

Review comment:
       This is just to exit early (in case there are a large number of 
entries). I can remove it though.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);

Review comment:
       Updated.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {
+        return false;
+      }
+
+      // Recursively drop namespaces under the requested `namespace`
+      // so that the base case will delete the tables and then the namespace 
of those tables
+      // if the user used CASCADE. If the user did not use CASCADE, Spark will 
return false
+      // as soon as it encounters a non-empty namespace.
+      for (Namespace ns : subNamespaces) {
+        try {
+          boolean didDrop = dropNamespace(ns.levels());
+          if (!didDrop) {
+            return false;

Review comment:
       Basically, if this returns false, it means that the user called with `IF 
EXISTS`.  I have a few tests for it.
   
   Once it happens once, we know that they'll all be that way (because this is 
the only circumstance in which it returns false instead of throwing: when 
the user specified `IF EXISTS`).
   
   I can remove the early return and allow it to iterate over all of them if 
we'd like.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {
+        return false;
+      }
+
+      // Recursively drop namespaces under the requested `namespace`
+      // so that the base case will delete the tables and then the namespace 
of those tables
+      // if the user used CASCADE. If the user did not use CASCADE, Spark will 
return false
+      // as soon as it encounters a non-empty namespace.
+      for (Namespace ns : subNamespaces) {
+        try {
+          boolean didDrop = dropNamespace(ns.levels());
+          if (!didDrop) {
+            return false;
+          }
+        } catch (NoSuchNamespaceException e) {
+          // Spark says this sub-namespace doesn't exist. This is unlikely to 
happen as we just
+          // got it from a listing, but it could have been concurrently 
removed.
+          // In either case, the result is the same.
+        }
+      }
+
+      // Base case
+      Arrays.stream(listTables(namespace)).forEach(this::dropTable);
+      return asNamespaceCatalog.dropNamespace(asNamespace);

Review comment:
       That's somewhat equivalent in this case, based on the way Spark 
implements it.
   
   There's no situation in which some tables are dropped and not others. We 
either:
   - dropped them all via `CASCADE`, in which case this final `dropNamespace` 
result will be true
   - dropped nothing, and we got an exception which already bubbled up because 
of a `NoSuchNamespaceException`
   - dropped nothing, and this returns false because the user used `IF EXISTS` 
and the namespace didn't exist.

##########
File path: 
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java
##########
@@ -369,15 +370,53 @@ public void alterNamespace(String[] namespace, 
NamespaceChange... changes) throw
   }
 
   @Override
+  // Spark assumes that catalogs CASCADE by default. So we have to eagerly
+  // attempt to drop namespaces and tables, but the CASCADE keyword is still
+  // required to actually drop tables and namespaces as Spark will error out
+  // if any of the recursive deletes are non-empty and the user didn't specify
+  // cascades in their query.
   public boolean dropNamespace(String[] namespace) throws 
NoSuchNamespaceException {
     if (asNamespaceCatalog != null) {
+      Namespace asNamespace = Namespace.of(namespace);
+      boolean exists = namespaceExists(namespace);
+
+      // Spark only throws the catalyst version of `NoSuchNamespaceException` 
if the namespace
+      // does not exist AND the user did not specify `IF EXISTS` in their 
query.
+      //
+      // If the namespace does not exist, but listNamespaces didn't throw an 
exception,
+      // we know the user used IF EXISTS and can return false early.
+      List<Namespace> subNamespaces;
       try {
-        return asNamespaceCatalog.dropNamespace(Namespace.of(namespace));
+        subNamespaces = asNamespaceCatalog.listNamespaces(asNamespace);
       } catch (org.apache.iceberg.exceptions.NoSuchNamespaceException e) {
         throw new NoSuchNamespaceException(namespace);
       }
-    }
 
+      if (!exists && subNamespaces.size() == 0) {
+        return false;
+      }
+
+      // Recursively drop namespaces under the requested `namespace`
+      // so that the base case will delete the tables and then the namespace 
of those tables
+      // if the user used CASCADE. If the user did not use CASCADE, Spark will 
return false
+      // as soon as it encounters a non-empty namespace.
+      for (Namespace ns : subNamespaces) {
+        try {
+          boolean didDrop = dropNamespace(ns.levels());
+          if (!didDrop) {
+            return false;
+          }
+        } catch (NoSuchNamespaceException e) {
+          // Spark says this sub-namespace doesn't exist. This is unlikely to 
happen as we just
+          // got it from a listing, but it could have been concurrently 
removed.
+          // In either case, the result is the same.
+        }
+      }
+
+      // Base case
+      Arrays.stream(listTables(namespace)).forEach(this::dropTable);
+      return asNamespaceCatalog.dropNamespace(asNamespace);

Review comment:
       That's somewhat equivalent in this case, based on the way Spark 
implements it.
   
   There's no situation in which some tables are dropped and not others. We 
either:
   - dropped them all via `CASCADE`, in which case this final `dropNamespace` 
result will be true
   - dropped nothing, and we got an exception which already bubbled up because 
of a `NoSuchNamespaceException` or an exception from it being non-empty (which 
we're handling via recursive dropping as Spark will guard based on the presence 
of CASCADE or not).
   - dropped nothing, and this returns false because the user used `IF EXISTS` 
and the namespace didn't exist.
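   
   As a rough sketch of the first and last outcomes, here is an in-memory model 
of the recursive drop. `CascadeDropSketch` and every method on it are 
hypothetical stand-ins rather than the `SparkCatalog` API, namespaces are 
modelled as dotted strings, and the exception path is deliberately left out.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   
   // Hypothetical in-memory model; it does not call Spark or Iceberg.
   class CascadeDropSketch {
     // Dotted namespace name -> names of the tables directly inside it.
     private final Map<String, List<String>> tablesByNamespace = new TreeMap<>();
   
     void createNamespace(String ns) {
       tablesByNamespace.putIfAbsent(ns, new ArrayList<>());
     }
   
     void createTable(String ns, String table) {
       createNamespace(ns);
       tablesByNamespace.get(ns).add(table);
     }
   
     boolean namespaceExists(String ns) {
       return tablesByNamespace.containsKey(ns);
     }
   
     // Collects direct children into a fresh list so that the recursion can
     // remove entries without disturbing an active iteration.
     List<String> directChildren(String ns) {
       List<String> children = new ArrayList<>();
       String prefix = ns + ".";
       for (String candidate : tablesByNamespace.keySet()) {
         if (candidate.startsWith(prefix) && candidate.indexOf('.', prefix.length()) < 0) {
           children.add(candidate);
         }
       }
       return children;
     }
   
     boolean dropNamespace(String ns) {
       if (!namespaceExists(ns)) {
         return false; // models the `IF EXISTS` outcome: nothing was dropped
       }
       for (String child : directChildren(ns)) {
         if (!dropNamespace(child)) {
           return false; // early exit, as in the loop under review
         }
       }
       // Base case: drop the tables, then the namespace itself.
       tablesByNamespace.get(ns).clear();
       return tablesByNamespace.remove(ns) != null;
     }
   
     public static void main(String[] args) {
       CascadeDropSketch catalog = new CascadeDropSketch();
       catalog.createTable("db", "t1");
       catalog.createTable("db.nested", "t2");
       System.out.println(catalog.dropNamespace("db"));      // prints "true"
       System.out.println(catalog.dropNamespace("missing")); // prints "false"
     }
   }
   ```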




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

