aokolnychyi commented on a change in pull request #2564:
URL: https://github.com/apache/iceberg/pull/2564#discussion_r629841356



##########
File path: core/src/main/java/org/apache/iceberg/MetadataLocationUtils.java
##########
@@ -58,17 +53,22 @@ private static void miscMetadataFiles(String 
metadataFileLocation, Set<String> m
     if (metadataFileLocation == null) {
       return;
     }
-    try {
-      TableMetadata metadata = TableMetadataParser.read(io, 
metadataFileLocation);
-      List<TableMetadata.MetadataLogEntry> metadataLogEntries = 
metadata.previousFiles();
-      List<String> previousMetadataFiles =
-          
metadataLogEntries.stream().map(TableMetadata.MetadataLogEntry::file).collect(Collectors.toList());
+    TableMetadata metadata = TableMetadataParser.read(io, 
metadataFileLocation);
+    List<TableMetadata.MetadataLogEntry> metadataLogEntries = 
metadata.previousFiles();
+    List<String> previousMetadataFiles =
+        metadataLogEntries.stream().map(TableMetadata.MetadataLogEntry::file)
+            .collect(Collectors.toList());
+    if (previousMetadataFiles.size() > 0) {
       metaFiles.addAll(previousMetadataFiles);
-      if (isRecursive && previousMetadataFiles.size() > 0) {
-        miscMetadataFiles(previousMetadataFiles.get(0), metaFiles, io, 
isRecursive);
+      if (isRecursive) {

Review comment:
       I think we can try to load previous metadata files but give up if we 
cannot do that. It should be okay to leave those files too. In case if Iceberg 
owns the location, we will drop the entire location after we clean up whatever 
we could reach.
   
   Listing may not be an option as the same location may contain files for 
multiple tables. In HiveCatalog, we have table UUID as part of the file name 
while we don't have that in HadoopCatalog for example. In addition, there may 
be quite some metadata files and listing can be expensive.
   
   If it is too complicated, we can drop the recursion too and I say we only 
clean up files currently referenced. Shouldn't be too bad.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to