karuppayya commented on a change in pull request #2564:
URL: https://github.com/apache/iceberg/pull/2564#discussion_r629751306
##########
File path: core/src/main/java/org/apache/iceberg/MetadataLocationUtils.java
##########
@@ -58,17 +53,22 @@ private static void miscMetadataFiles(String
metadataFileLocation, Set<String> m
if (metadataFileLocation == null) {
return;
}
- try {
- TableMetadata metadata = TableMetadataParser.read(io,
metadataFileLocation);
- List<TableMetadata.MetadataLogEntry> metadataLogEntries =
metadata.previousFiles();
- List<String> previousMetadataFiles =
-
metadataLogEntries.stream().map(TableMetadata.MetadataLogEntry::file).collect(Collectors.toList());
+ TableMetadata metadata = TableMetadataParser.read(io,
metadataFileLocation);
+ List<TableMetadata.MetadataLogEntry> metadataLogEntries =
metadata.previousFiles();
+ List<String> previousMetadataFiles =
+ metadataLogEntries.stream().map(TableMetadata.MetadataLogEntry::file)
+ .collect(Collectors.toList());
+ if (previousMetadataFiles.size() > 0) {
metaFiles.addAll(previousMetadataFiles);
- if (isRecursive && previousMetadataFiles.size() > 0) {
- miscMetadataFiles(previousMetadataFiles.get(0), metaFiles, io,
isRecursive);
+ if (isRecursive) {
Review comment:
We need to find the first existent file, like in my previous logic.
But there will be issues with that as well. There can be unreachable files
like in this test case.
https://github.com/apache/iceberg/pull/2564/files#diff-efb7c769faae3f19c11382d2dc4c53a69074265a87f7291f8430ad91410d48e2R141
Another case which might not work is updating
`org.apache.iceberg.TableProperties#METADATA_PREVIOUS_VERSIONS_MAX` for a
table,
```
say v101 has reference to 1-100
Update org.apache.iceberg.TableProperties#METADATA_PREVIOUS_VERSIONS_MAX to 1
v102 will have reference to v101
v103 will have reference to v102(Let this be the latest version)
```
If v 102 is removed(say as part of RemoveOrphanFiles), we would have lost
all reference between 1 and 101.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]