jerryshao commented on code in PR #5521:
URL: https://github.com/apache/gravitino/pull/5521#discussion_r1836277928
##########
catalogs/catalog-hadoop/src/main/java/org/apache/gravitino/catalog/hadoop/HadoopCatalogOperations.java:
##########
@@ -581,31 +583,71 @@ public Schema alterSchema(NameIdentifier ident, SchemaChange... changes)
@Override
public boolean dropSchema(NameIdentifier ident, boolean cascade) throws NonEmptySchemaException {
try {
+ Namespace filesetNs =
+ NamespaceUtil.ofFileset(
+ ident.namespace().level(0), // metalake name
+ ident.namespace().level(1), // catalog name
+ ident.name() // schema name
+ );
+
+ List<FilesetEntity> filesets =
+ store.list(filesetNs, FilesetEntity.class, Entity.EntityType.FILESET);
+ if (!filesets.isEmpty() && !cascade) {
+ throw new NonEmptySchemaException("Schema %s is not empty", ident);
+ }
+
+ // Delete all the managed filesets no matter whether the storage location is under the
+ // schema path or not.
+ // The reason why we delete the managed fileset's storage location one by one is that we
+ // may mis-delete the storage location of an external fileset if it happens to be under
+ // the schema path.
+ filesets.stream()
+ .filter(f -> f.filesetType() == Fileset.Type.MANAGED)
+ .forEach(
+ f -> {
+ try {
+ Path filesetPath = new Path(f.storageLocation());
Review Comment:
Please see the comments. If we blindly delete the schema path first, we could
delete unmanaged files/directories under that path. For example, if an external
fileset's storage location happens to be under the schema path, we would delete
it by mistake. Likewise, any other files under the schema path, such as table
data, would also be deleted. That's why I delete the fileset paths one by one.
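
The per-fileset deletion described above can be sketched with a minimal, self-contained Java example. Note this is an illustration only: the `Fileset`/`Type` records below and the `java.nio.file` calls are stand-ins, not the actual Gravitino entities or the Hadoop `FileSystem` API used in the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the selective-drop strategy: delete only the
// storage locations of MANAGED filesets, one by one, instead of
// recursively removing the whole schema path. An EXTERNAL fileset whose
// location happens to live under the schema path is left intact.
public class SelectiveDrop {

  enum Type { MANAGED, EXTERNAL }

  record Fileset(Type type, Path storageLocation) {}

  static void dropManagedLocations(List<Fileset> filesets) throws IOException {
    for (Fileset f : filesets) {
      if (f.type() == Type.MANAGED) {
        // Delete only this fileset's own location; siblings under the
        // same schema path are untouched.
        deleteRecursively(f.storageLocation());
      }
    }
  }

  static void deleteRecursively(Path p) throws IOException {
    if (Files.isDirectory(p)) {
      try (var entries = Files.list(p)) {
        for (Path child : entries.toList()) {
          deleteRecursively(child);
        }
      }
    }
    Files.deleteIfExists(p);
  }
}
```

The key design point is the same as in the diff: the schema directory itself is never deleted wholesale, so external fileset locations (or unrelated files such as table data) nested under it survive the drop.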
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]