rdblue commented on a change in pull request #631: Add mechanism to expire old metadata versions
URL: https://github.com/apache/incubator-iceberg/pull/631#discussion_r346013219
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/TableMetadataParser.java
 ##########
 @@ -266,8 +278,20 @@ static TableMetadata fromJson(TableOperations ops, InputFile file, JsonNode node
       }
     }
 
+    SortedSet<MetadataLogEntry> metadataEntries =
+            Sets.newTreeSet(Comparator.comparingLong(MetadataLogEntry::timestampMillis));
+    if (node.has(METADATA_LOG)) {
+      Iterator<JsonNode> logIterator = node.get(METADATA_LOG).elements();
+      while (logIterator.hasNext()) {
+        JsonNode entryNode = logIterator.next();
+        metadataEntries.add(new MetadataLogEntry(
+                JsonUtil.getLong(TIMESTAMP_MS, entryNode), JsonUtil.getString(METADATA_FILE, entryNode)));
+      }
+    }
+
     return new TableMetadata(ops, file, uuid, location,
         lastUpdatedMillis, lastAssignedColumnId, schema, defaultSpecId, specs, properties,
-        currentVersionId, snapshots, ImmutableList.copyOf(entries.iterator()));
+        currentVersionId, snapshots, ImmutableList.copyOf(entries.iterator()),
+        ImmutableList.copyOf(metadataEntries.iterator()), null);
 
 Review comment:
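   I don't think that `TableMetadata` should change when it is serialized and
   deserialized. That is what happens if `TableMetadata` is also used to track
   the old metadata locations that were removed.

   I'd prefer to change this so that only the previous metadata entries are
   tracked on `TableMetadata`. The `commit` method should then delete the
   entries that are in `baseMetadata.previousMetadataFiles()` but no longer in
   `newMetadata.previousMetadataFiles()`, that is, whatever is left in
   `Sets.newHashSet(baseMetadata.previousMetadataFiles())` after calling
   `removeAll(newMetadata.previousMetadataFiles())` on it. Note that
   `removeAll` returns a `boolean`, so the set has to be held in a variable
   first. Does that make sense?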
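
A minimal sketch of the set difference suggested above, using only `java.util`. It assumes the previous metadata files can be represented as lists of location strings. The class name `MetadataCleanupSketch`, the helper `removedMetadataFiles`, and the file names are hypothetical and are not part of the pull request or the Iceberg API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MetadataCleanupSketch {
  // Hypothetical helper: returns the locations that were tracked by the base
  // metadata but are no longer tracked by the new metadata. These are the
  // candidates a commit could delete.
  static Set<String> removedMetadataFiles(List<String> basePrevious, List<String> newPrevious) {
    // Copy first: removeAll mutates the set and returns a boolean, not the set.
    Set<String> removed = new HashSet<>(basePrevious);
    removed.removeAll(newPrevious);
    return removed;
  }

  public static void main(String[] args) {
    // Illustrative file names only.
    List<String> base = Arrays.asList("v1.metadata.json", "v2.metadata.json", "v3.metadata.json");
    List<String> updated = Arrays.asList("v2.metadata.json", "v3.metadata.json", "v4.metadata.json");
    System.out.println(removedMetadataFiles(base, updated)); // prints [v1.metadata.json]
  }
}
```

With this shape, `TableMetadata` only carries the entries that are written to and read from JSON, so a serialization round trip leaves it unchanged. The list of files to delete is derived at commit time from the base and new metadata.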
